Programming Hive: Study Notes 01

阿新 • Published: 2017-06-22


Chapter 4: HiveQL Data Definition
1: Create a database
create database financials;
create database if not exists financials;

2: List databases
show databases;
List databases whose names match a pattern
show databases like 'h.*';

3: Create a database and override its default location
create database financials location '/my/preferred/directory';

4: Add a description to a database
create database financials comment 'holds all financials tables';
5: Show a database's description
describe database financials;
6: Attach key-value properties to a database
create database financials
with dbproperties ('creator'='Mark Moneybags', 'date'='2012-12-12');
describe database extended financials;

7: No command shows which database is currently in use. It is safe to issue use repeatedly:
use financials;
A property can be set so that the prompt displays the current database:
set hive.cli.print.current.db=true;
set hive.cli.print.current.db=false;

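8: Drop a database
drop database if exists financials;
Hive does not allow dropping a database that still contains tables.
With the cascade keyword added, however, Hive drops the tables in the database automatically before dropping the database:
drop database if exists financials cascade;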

9: Alter a database by setting dbproperties key-value pairs
alter database financials set dbproperties('edited-by'='joe dba');

10: Create a table:
create table if not exists employees (
name string comment 'employee name',
salary float comment 'employee salary',
subordinates array<string> comment 'employee name of subordinates',
deductions map<string,float>,
address struct<street:string,city:string,state:string,zip:int>
)
comment 'description of the table'
location '/user/hive/warehouse/mydb.db/employees'
tblproperties ('creator'='me', 'created_at'='2012-12-12');

-- The main purpose of tblproperties is to attach extra documentation to a table as key-value pairs

11: List a table's tblproperties
show tblproperties employees;

12: Copy a table (schema only)
create table if not exists mydb.employees2 like mydb.employees;

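13: Select a database
use mydb;
List tables

show tables;
show tables in mydb;
14: View detailed structure information for a table
describe extended mydb.employees;
Use the formatted keyword in place of extended for more readable output
describe formatted mydb.employees;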

15: Managed (internal) tables: dropping the table also deletes its data
Create an external table that can read all comma-delimited data under the /data/stocks directory
create external table if not exists stocks(
exchange string,
symbol string,
ymd string,
price_open float,
price_high float,
price_low float,
price_close float,
volume int,
price_adj_close float)
row format delimited fields terminated by ','
location '/data/stocks';

16: Check whether a table is managed or external
describe extended tablename;
The output includes one of:
tableType:MANAGED_TABLE -- managed table
tableType:EXTERNAL_TABLE -- external table
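
The difference matters most when a table is dropped. A minimal sketch, assuming the external stocks table from item 15 exists:

```sql
-- Shows the table type along with the table's storage location.
describe formatted stocks;

-- For an external table this removes only the metadata;
-- the files under /data/stocks are left in place.
drop table if exists stocks;
```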

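-- Copy a table's schema without copying its data
-- (mydb.employees3 is the new table, mydb.employees2 is the source table)
create table if not exists mydb.employees3
like mydb.employees2 location '/data/stocks';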

17: Create a partitioned table
create table employees (
name string,
salary float,
subordinates array<string>,
deductions map<string,float>,
address struct<street:string,city:string,state:string,zip:int>
)
partitioned by (country string, state string);

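Partition columns:
country string, state string can be used in queries just like ordinary columns, and they act somewhat like index columns.
Filtering on the partition columns improves query performance, because Hive only has to read the matching partitions.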
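
As a minimal sketch of such a query, assuming the partitioned employees table from item 17 holds data for the partition values shown (the values 'US' and 'CA' are illustrative):

```sql
-- Both predicates are on partition columns, so Hive only needs to read
-- the country=US/state=CA directory instead of scanning the whole table.
select name, salary
from employees
where country = 'US' and state = 'CA';
```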

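18: set hive.mapred.mode=strict;
In strict mode, if a query against a partitioned table has no partition filter in its where clause,
Hive refuses to submit the job.
The mode can be switched back with: set hive.mapred.mode=nonstrict;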
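
The behaviour can be sketched as follows, assuming the partitioned employees table from item 17; the alias e and the limit value are arbitrary choices for illustration:

```sql
set hive.mapred.mode=strict;

-- Rejected in strict mode: no partition column appears in the where clause.
select e.name, e.salary from employees e limit 100;

-- Accepted: the where clause filters on the partition column country.
select e.name, e.salary from employees e where e.country = 'US' limit 100;

-- Switch back to the permissive mode.
set hive.mapred.mode=nonstrict;
```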

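19: List all partitions of a table
show partitions employees;

20: Check whether partitions exist for a specific partition key value
show partitions employees partition(country='US');
The describe extended employees command also shows the partition keys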

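The most common setup for managing large production datasets is an external partitioned table.
21: In a managed table, partitions can be created by loading data:
load data local inpath '/home/hive/California-employees'
into table employees
partition(country='US', state='CA');

Hive creates the directory for this partition: ..../employees/country=US/state=CA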

22: Create an external partitioned table

create external table if not exists log_messages (
hms int,
severity string,
server string,
process_id int,
message string
)
partitioned by (year int, month int, day int)
row format delimited fields terminated by '\t';
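
One way to attach data to a table like this is to add each partition explicitly. The sketch below assumes the log files for one day already sit in their own directory; the path and the date values are hypothetical:

```sql
-- Register one day's directory as a partition of log_messages.
alter table log_messages add if not exists
partition (year = 2012, month = 1, day = 2)
location '/data/log_messages/2012/01/02';

-- Filtering on all three partition columns reads only that directory.
select hms, severity, message
from log_messages
where year = 2012 and month = 1 and day = 2
limit 10;
```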

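1: order by performs a global sort over the whole input.

2: sort by guarantees that the file produced by each reducer is sorted. A merge sort over those sorted files then yields a total order.

Characteristics of sort by:
1) sort by is largely unaffected by whether hive.mapred.mode is strict or nonstrict, but if the table is partitioned, a partition must still be specified.
2) Within a single reducer, the rows are sorted by the specified columns.
3) The number of reducers used by sort by can be set, e.g. set mapred.reduce.tasks=5; merge-sorting the reducer outputs then gives the complete sorted result.

Result: in strict mode, sort by runs normally even without a limit clause.

sort by is only slightly affected by hive.mapred.mode=strict.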
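
A minimal sketch contrasting the two clauses, assuming the stocks table from item 15; the choice of columns and sort directions is illustrative only:

```sql
-- Global ordering: every row is sorted relative to every other row.
-- In strict mode an order by query also needs a limit clause.
select s.ymd, s.symbol, s.price_close
from stocks s
order by s.ymd asc, s.symbol desc
limit 1000;

-- Per-reducer ordering: each of the 5 reducers writes its own sorted output,
-- so rows are ordered within a file but not across files.
set mapred.reduce.tasks=5;
select s.ymd, s.symbol, s.price_close
from stocks s
sort by s.ymd asc, s.symbol desc;
```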

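3: distribute by
distribute by controls how map output is divided among the reducers.

Rows are distributed according to the columns listed after distribute by and the number of reducers; hashing is used by default. distribute by can also take an expression such as length(...), in which case rows are sent to different reducers according to the length of a string value and end up in different output files. length is a built-in function; other functions, including user-defined ones, can be used as well.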
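
Both forms can be sketched as follows, assuming the stocks table from item 15; the reducer count of 5 is arbitrary:

```sql
set mapred.reduce.tasks=5;

-- Rows with the same symbol are hashed to the same reducer,
-- and each reducer sorts its rows by symbol and then by date.
select s.ymd, s.symbol, s.price_close
from stocks s
distribute by s.symbol
sort by s.symbol asc, s.ymd asc;

-- Distribution key computed by a built-in function: rows whose symbol
-- strings have the same length go to the same reducer.
select s.ymd, s.symbol, s.price_close
from stocks s
distribute by length(s.symbol);
```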

4: cluster by

cluster by does what distribute by does and additionally sorts by the same column, so cluster by = distribute by + sort by on that column.
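
As a short sketch of that equivalence, assuming the stocks table from item 15, the two queries below distribute and sort rows in the same way:

```sql
-- Explicit form: distribute and sort on the same column.
select s.ymd, s.symbol, s.price_close
from stocks s
distribute by s.symbol
sort by s.symbol;

-- Shorthand form with the same effect.
select s.ymd, s.symbol, s.price_close
from stocks s
cluster by s.symbol;
```

Note that cluster by sorts in ascending order only; a descending sort needs the explicit distribute by ... sort by form.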
