Hive basics (commands, tables, data import/export, etc.) -- continuously updated
阿新 • Published: 2019-01-14
1. show databases;
2. show tables;
3. show tables in <database_name>;  (lists another database's tables without switching to it)
4. show tables in hive 'tom*';  (lists the tables in the hive database whose names start with tom)
5. desc extended tablename;  shows detailed table information; in the output, tableType=MANAGED_TABLE or EXTERNAL_TABLE tells you whether the table is managed (internal) or external.
6. Creating, dropping, and altering databases
(1) Create:
    create database if not exists hive;
    The if not exists clause avoids an error when the database already exists.
(2) Drop:
    drop database if exists hive;
    if exists is optional; with it, Hive does not throw an exception when the database is missing.
    By default, Hive refuses to drop a database that still contains tables. Either empty the database first, or add the cascade keyword:
    drop database if exists hive cascade;
(3) Alter:
    You can set key-value pairs in a database's dbproperties, but no other metadata can be changed. For example:
    alter database hive set dbproperties('edited-by'='wang');
    alter database mytest set dbproperties('creator'='wangdd');
7. Creating tables
(1) Basic syntax:
    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
      -- TEMPORARY: temporary table; EXTERNAL: external table (omit it for a managed/internal table)
      [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]  -- column comments
      [COMMENT table_comment]                                                      -- table comment
      [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]             -- partitioning
      [CLUSTERED BY (col_name, col_name, ...)
        [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]           -- bucketing
      [SKEWED BY (col_name, col_name, ...)      -- (Note: Available in Hive 0.10.0 and later)
        ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
        [STORED AS DIRECTORIES]]
      [
        [ROW FORMAT row_format]                 -- delimiter specification
        [STORED AS file_format]                 -- storage format
        | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
      ]
      [LOCATION hdfs_path]                      -- where the actual data files are stored
      [TBLPROPERTIES (property_name=property_value, ...)]  -- (Note: Available in Hive 0.6.0 and later)
      [AS select_statement];  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

    Example:
    create table hive.person(
      id int,
      name string,
      likes array<string>,
      info map<string,string>,
      address struct<city:string,area:string,streetID:int>
    )
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      COLLECTION ITEMS TERMINATED BY '-'
      MAP KEYS TERMINATED BY ':'
      LINES TERMINATED BY '\n'
      NULL DEFINED AS '@'   -- stored as '@' in the file, displayed as NULL
    STORED AS TEXTFILE;
(2) Copy a table's structure (without its data):
    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
      LIKE existing_table_or_view_name
      [LOCATION hdfs_path];
    CREATE TABLE like_student LIKE student;
(3) Create a table from a query (CTAS):
    CREATE TABLE new_key_value_store
      ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
      STORED AS RCFile
      AS
      SELECT (key % 1024) new_key, concat(key, value) key_value_pair
      FROM key_value_store
      SORT BY new_key, key_value_pair;

    CREATE TABLE s_person1
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '#'
        COLLECTION ITEMS TERMINATED BY '^'
        MAP KEYS TERMINATED BY '_'
        NULL DEFINED AS '@'
      AS
      SELECT * FROM person1;
(4) Partitioned table:
    create table hive.tomcat_log(
      id string,
      page string,
      status int,
      traffic int
    )
    partitioned by (year string, month string, day string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
(5) Bucketed table:
    create table clus2(cc int)
    CLUSTERED BY (cc) SORTED BY (cc) INTO 3 BUCKETS;
8. Importing data
(1) LOAD DATA
    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
      [PARTITION (partcol1=val1, partcol2=val2 ...)]
    OVERWRITE replaces the table's data; without it, the load appends.
    Loading from the local file system (Linux) copies the file:
    load data local inpath '/home/wangfutai/a/1.txt' INTO TABLE hive.union_test;
    Loading from HDFS moves the file, so it disappears from the source directory.
    Append:
    load data inpath '/user/wangfutai/hive/warehouse/hive.db/st/st.txt' INTO TABLE hive.union_test;
    Overwrite:
    load data inpath '/user/wangfutai/hive/warehouse/hive.db/st/1.txt' OVERWRITE INTO TABLE hive.union_test;
(2) Loading into partitioned tables
    Static partitions:
    load data local inpath '/home/wangfutai/a/2.txt' OVERWRITE INTO TABLE hive.tomcat_log
      PARTITION(year='2017', month='11', day='5');
    load data inpath '/user/wangfutai/mr/ETLOutPut16/part-r-00001' OVERWRITE INTO TABLE hive.tomcatelog
      PARTITION(days='20170531');
    Dynamic partitions (fully dynamic inserts require set hive.exec.dynamic.partition.mode=nonstrict;):
    Overwrite mode (only partitions that appear in the query result are overwritten; other partitions are untouched):
    INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
      select_statement1 FROM from_statement;
    Append mode:
    INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
      select_statement1 FROM from_statement;
    Examples:
    INSERT OVERWRITE TABLE dynamic_human2 PARTITION (sexs) select * from human1;
    (In a dynamic-partition insert, the partition column sexs must be the last column of the SELECT.)
    For the next insert, human3 must have one column fewer than dynamic_human3, because the partition
    value sexs='nan' is supplied by the PARTITION clause rather than by the SELECT:
    INSERT OVERWRITE TABLE dynamic_human3 PARTITION (sexs='nan') select * from human3;
9. Exporting data
(1) Via CREATE TABLE ... AS SELECT
    The CTAS statements from 7(3) double as an export: the query result is materialized as a new table. For example:
    CREATE TABLE s_person1
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '#'
        COLLECTION ITEMS TERMINATED BY '^'
        MAP KEYS TERMINATED BY '_'
        NULL DEFINED AS '@'
      AS
      SELECT * FROM person1;
(2) Via INSERT ... DIRECTORY
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1
      [ROW FORMAT row_format] [STORED AS file_format]
      SELECT ... FROM ...
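With a DELIMITED row_format, the exported files are plain text: fields, collection items, and map entries are joined with the declared delimiters, and NULLs are written as the NULL DEFINED AS string. A minimal Python sketch of that encoding (the sample row and table shape are hypothetical; this is an illustration, not Hive's actual serializer):

```python
# Sketch of Hive's ROW FORMAT DELIMITED text encoding, using the same
# delimiters as this article's examples: fields '#', collection items '^',
# map key/value separator '_', NULL written as '@'.

FIELD, ITEM, KV, NULLSTR = '#', '^', '_', '@'

def encode_value(v):
    """Encode one column value the way a delimited text file would store it."""
    if v is None:
        return NULLSTR
    if isinstance(v, list):   # array<...>: items joined by the collection delimiter
        return ITEM.join(encode_value(x) for x in v)
    if isinstance(v, dict):   # map<...>: entries joined by ITEM, key/value by KV
        return ITEM.join(f"{k}{KV}{encode_value(x)}" for k, x in v.items())
    return str(v)

def encode_row(row):
    """One output line: columns joined by the field delimiter."""
    return FIELD.join(encode_value(v) for v in row)

# Hypothetical row of a person-like table: id, name, likes array, info map, a NULL
row = [1, "tom", ["read", "run"], {"city": "bj", "area": "haidian"}, None]
print(encode_row(row))
# -> 1#tom#read^run#city_bj^area_haidian#@
```

Reading such a line back is the reverse split, which is why the field, collection, and map-key delimiters must all be distinct.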
    Export to the local file system:
    INSERT OVERWRITE LOCAL DIRECTORY '/home/wangfutai/a/partition_data'
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '#'
      NULL DEFINED AS '@'
      SELECT * FROM st;
    Export to HDFS (any existing contents of the target directory are overwritten):
    INSERT OVERWRITE DIRECTORY '/user/candle/hive_data/person1_data'
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '#'
        COLLECTION ITEMS TERMINATED BY '^'
        MAP KEYS TERMINATED BY '_'
        NULL DEFINED AS '@'
      SELECT * FROM person1;
(3) Inserting query results into a table, with a column as the partition
    Overwrite mode (only partitions that appear in the query result are overwritten; other partitions are untouched):
    INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
      select_statement1 FROM from_statement;
    Append mode:
    INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
      select_statement1 FROM from_statement;
    Overwrite:
    INSERT OVERWRITE TABLE dynamic_human1 PARTITION(sex) SELECT * FROM human1;
    Append:
    INSERT INTO TABLE dynamic_human1 PARTITION(sex) SELECT * FROM human2;
    You can also fix the partition value in the PARTITION clause. In that case the SELECT must not
    supply the sex column, since its value is already given by the partition specification:
    INSERT OVERWRITE TABLE dynamic_human1 PARTITION(sex='aaa') SELECT * FROM human1;
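A dynamic-partition insert like the ones above routes each row into a partition directory named after the partition column's value (e.g. .../sex=man/). A minimal Python sketch of that routing idea (table and column names are hypothetical; this is not Hive's implementation):

```python
from collections import defaultdict

def route_rows(rows, partition_col):
    """Group rows into partition 'directories' keyed by the partition column.

    The partition column itself is removed from each row: in Hive its value
    lives in the directory name, not in the data files.
    """
    partitions = defaultdict(list)
    for row in rows:
        value = row.pop(partition_col)
        partitions[f"{partition_col}={value}"].append(row)
    return dict(partitions)

# Hypothetical source rows for an insert like:
#   INSERT OVERWRITE TABLE dynamic_human1 PARTITION(sex) SELECT * FROM human1;
rows = [
    {"id": 1, "name": "tom",  "sex": "man"},
    {"id": 2, "name": "lucy", "sex": "woman"},
    {"id": 3, "name": "jack", "sex": "man"},
]
print(route_rows(rows, "sex"))
# 'sex=man' holds ids 1 and 3; 'sex=woman' holds id 2
```

This is also why OVERWRITE mode only replaces the partitions present in the query result: each group maps to its own directory, and directories that receive no rows are left alone.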