HIVE Basic Operations (commands, tables, data import/export, etc.) -- updated continuously

1. show databases;
2. show tables;
3. show tables in <database_name>;   (lists the tables in a database without having to switch to it)
4. show tables in hive 'tom*';   (lists the tables in the hive database whose names start with tom)
5. desc extended tablename;
   Shows detailed table information; tableType=MANAGED_TABLE or EXTERNAL_TABLE tells you whether the table is managed (internal) or external.
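A quick sketch of checking the table type (person is an assumed table name); describe formatted prints the same metadata in a more readable block:

desc extended person;        -- look for tableType:MANAGED_TABLE or EXTERNAL_TABLE in the output
describe formatted person;   -- includes a "Table Type:" line among the formatted metadata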
6. Creating, dropping, and altering databases
(1) Create
create database if not exists hive;
Using if not exists avoids an error being thrown if the database already exists.
(2) Drop a database
drop database if exists hive;
if exists is optional; adding it prevents Hive from throwing an exception when the database does not exist.

By default, Hive does not allow dropping a database that still contains tables. Either drop the tables first and then the database, or use the keyword cascade:
drop database if exists hive cascade;

(3) Alter a database
You can modify a database's dbproperties (key-value properties), but its other metadata cannot be changed. For example:
alter database hive set dbproperties('edited-by'='wang');
alter database mytest set dbproperties('creator'='wangdd');
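To confirm the change, the properties can be read back (mytest is the database altered above):

describe database extended mytest;   -- the extended output includes the dbproperties key-value pairs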

7. Creating tables
(1) Basic syntax
CREATE [TEMPORARY(temporary table)] [EXTERNAL(external table; without this keyword a managed/internal table is created)] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment](column comment), ... [constraint_specification])]
[COMMENT table_comment](table comment)

  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)](partitions)
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS](bucketing)
  [SKEWED BY (col_name, col_name, ...)   -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format]               -- delimiter specification
   [STORED AS file_format]               -- data storage format
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]                   -- where the actual data is stored
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)


create table hive.person(
id int,
name string,
likes array<string>,
info map<string,string>,
address struct<city:string,area:string,streetID:int>
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
NULL DEFINED AS '@'   -- '@' in the data file is read back as NULL
STORED AS TEXTFILE;
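A sketch of a matching data file and load; the path and row below are invented for illustration and follow the delimiters declared above (',' between fields, '-' between collection items, ':' between map keys and values):

-- hypothetical file /home/wangfutai/a/person.txt, one row:
-- 1,tom,reading-music,height:180-weight:70,beijing-chaoyang-12
load data local inpath '/home/wangfutai/a/person.txt' INTO TABLE hive.person;
select * from hive.person;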

(2) Copying a table's structure (no data is copied)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
  CREATE TABLE like_student LIKE student;
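The optional EXTERNAL and LOCATION clauses combine with LIKE as well; a minimal sketch (the HDFS path is made up) that creates an external copy of student's schema over an existing directory:

CREATE EXTERNAL TABLE ext_student LIKE student LOCATION '/user/wangfutai/hive/ext_student';
-- dropping ext_student later removes only the metadata; the files under the location are kept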
(3) Creating a table from a query (CTAS)
CREATE TABLE new_key_value_store
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
   STORED AS RCFile
   AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

  CREATE TABLE s_person1
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '#'
  COLLECTION ITEMS TERMINATED BY '^'
  MAP KEYS TERMINATED BY '_'
  NULL DEFINED AS '@'
  AS SELECT * FROM person1;

(4) Partitioned tables
create table hive.tomcat_log(
id string,
page string,
status int,
traffic int
)
partitioned by (year string,month string,day string)
ROW FORMAT  DELIMITED FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE; 
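After data is loaded (see section 8), the partitions can be listed and used to prune queries; the year/month values below are only examples:

show partitions hive.tomcat_log;
select * from hive.tomcat_log where year='2017' and month='11';   -- only the matching partitions are scanned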

(5) Bucketed tables
create table clus2(
cc int
)
CLUSTERED BY (cc)
SORTED BY (cc)
INTO 3 BUCKETS;
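Bucketed files are produced by inserting from another table rather than by LOAD DATA; on Hive versions before 2.0 the enforcement flag must be set first. A sketch, where clus_src is an assumed source table with an int column cc:

set hive.enforce.bucketing=true;   -- required before Hive 2.0; bucketing is always enforced from 2.0 on
insert overwrite table clus2 select cc from clus_src;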

8. Data import
(1) LOAD DATA
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]   -- OVERWRITE replaces the existing data; without it the load appends

Loading from the local file system (Linux): the file is copied into the table directory.
load data local inpath '/home/wangfutai/a/1.txt'  INTO TABLE hive.union_test;

Loading from HDFS: the file is moved, so it no longer exists in its original directory.
Append:
load data inpath '/user/wangfutai/hive/warehouse/hive.db/st/st.txt'  INTO TABLE hive.union_test;
Overwrite:
load data inpath '/user/wangfutai/hive/warehouse/hive.db/st/1.txt' OVERWRITE INTO TABLE hive.union_test;
(2) Loading into partitioned tables
Static partitions:
load data local inpath '/home/wangfutai/a/2.txt' OVERWRITE  INTO TABLE hive.tomcat_log PARTITION(year='2017',month='11',day='5');

load data  inpath '/user/wangfutai/mr/ETLOutPut16/part-r-00001' OVERWRITE  INTO TABLE hive.tomcatelog PARTITION(days='20170531');

Dynamic partitions:
Overwrite mode: only the matching partitions are overwritten; other partitions are unaffected.
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
Append mode:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Example:
INSERT OVERWRITE TABLE  dynamic_human2 PARTITION (sexs) select * from human1;
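A fully dynamic insert like the one above (no static partition value) usually needs the dynamic-partition settings enabled first; hive.exec.dynamic.partition.mode defaults to strict, which requires at least one static partition column:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;   -- nonstrict allows every partition column to be dynamic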

For the insert below, human3 must have one column fewer than dynamic_human3, because the partition value is given statically and the SELECT does not supply the partition column:
INSERT OVERWRITE TABLE  dynamic_human3 PARTITION (sexs='nan') select * from human3;


9. Data export
(1) Via CREATE TABLE ... AS SELECT
CREATE TABLE new_key_value_store
   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
   STORED AS RCFile
   AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

CREATE TABLE s_person1
ROW FORMAT DELIMITED FIELDS TERMINATED BY '#'
COLLECTION ITEMS TERMINATED BY '^'
MAP KEYS TERMINATED BY '_'
NULL DEFINED AS '@'
AS SELECT * FROM person1;

(2) Via INSERT ... DIRECTORY
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] 
SELECT ... FROM ...
  Write to the local file system:
  INSERT OVERWRITE LOCAL DIRECTORY
'/home/wangfutai/a/partition_data'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '#'
  NULL DEFINED AS '@'
  SELECT * FROM st;

 Write to HDFS (the target directory is overwritten):
  INSERT OVERWRITE  DIRECTORY
 '/user/candle/hive_data/person1_data'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '#'
  COLLECTION ITEMS TERMINATED BY '^'
  MAP KEYS TERMINATED BY '_'
  NULL DEFINED AS '@'
  SELECT * FROM person1;

(3) Inserting query results into a table, using a column as the partition
Overwrite mode: only the matching partitions are overwritten; other partitions are unaffected.
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
Append mode:
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Overwrite:
INSERT OVERWRITE TABLE dynamic_human1 PARTITION(sex) SELECT * FROM human1;
Append:
INSERT INTO TABLE dynamic_human1 PARTITION(sex) SELECT * FROM human2;

You can also give a static value for the partition column at insert time; in that case the table in the SELECT must not contain the sex column, because the value of the sex partition has already been fixed.

INSERT OVERWRITE TABLE dynamic_human1 PARTITION(sex='aaa') SELECT * FROM human1;