Hive: Building a Table over HDFS Data and Attaching Partitions, a Problem Encountered and Its Fix

1. Create the temporary table:

CREATE EXTERNAL TABLE IF NOT EXISTS tmp.tmp_tb_jinritoutiao_log
(
content string COMMENT 'JSON-formatted content'
)
COMMENT 'Jinri Toutiao video content'
PARTITIONED BY (`day` string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/datastream/portal/jinritoutiao/video/';

2. Load the HDFS data

alter table tmp.tmp_tb_jinritoutiao_log add partition(day='20180810') location '/data/jinritoutiao/video/2018-08-10';

Problem: the first load attempt failed with this error:

ValidationFailureSemanticException table is not partitioned but partition spec exists

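The message says the table is not partitioned, even though the DDL clearly declared day as the partition column. I still don't know why. After many attempts, wrapping day in backticks in the PARTITIONED BY clause finally made the problem go away: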

PARTITIONED BY (`day` string)
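
A hedged aside: the steps above do not establish why the backticks helped. One possibility worth ruling out (an assumption, not something confirmed here) is that an earlier, unpartitioned version of the table already existed, in which case CREATE ... IF NOT EXISTS would have skipped creation silently. A minimal check, using only standard Hive statements:

```sql
-- Show the DDL that Hive actually stored; look for a PARTITIONED BY clause.
SHOW CREATE TABLE tmp.tmp_tb_jinritoutiao_log;

-- If the table is partitioned, the "# Partition Information" section lists `day`.
DESCRIBE FORMATTED tmp.tmp_tb_jinritoutiao_log;

-- Assumption: recreating the table is acceptable. Because the table is EXTERNAL,
-- dropping it removes only the metastore entry by default, not the files under LOCATION.
DROP TABLE IF EXISTS tmp.tmp_tb_jinritoutiao_log;
```

If SHOW CREATE TABLE prints no PARTITIONED BY clause, dropping the stale definition and rerunning the CREATE statement from step 1 should yield a partitioned table.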

3. Add the existing data to the corresponding partition

alter table tmp.tmp_tb_jinritoutiao_log add partition(day='20180810') location '/datastream/portal/jinritoutiao/video/2018-08-10';
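
Steps 2 and 3 both register day='20180810', only with different locations. Hive rejects a second ADD PARTITION for a spec that already exists, so if the statement in step 2 went through, the partition would have to be dropped first. A sketch, assuming the path used in step 3 is the intended one:

```sql
-- EXTERNAL table: dropping a partition removes metadata only (by default),
-- the files in HDFS stay where they are.
ALTER TABLE tmp.tmp_tb_jinritoutiao_log DROP IF EXISTS PARTITION (day='20180810');

-- IF NOT EXISTS makes the statement safe to rerun.
ALTER TABLE tmp.tmp_tb_jinritoutiao_log ADD IF NOT EXISTS PARTITION (day='20180810')
LOCATION '/datastream/portal/jinritoutiao/video/2018-08-10';

-- Confirm that the partition is registered and that rows are readable.
SHOW PARTITIONS tmp.tmp_tb_jinritoutiao_log;
SELECT content FROM tmp.tmp_tb_jinritoutiao_log WHERE day='20180810' LIMIT 5;
```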

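4. Create a new table as required, parse and split the JSON column of the log table, and insert the extracted fields into the new table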

CREATE EXTERNAL TABLE IF NOT EXISTS tmp.tmp_jinritoutiao_video
(
id string COMMENT '',
class string COMMENT '',
userId string COMMENT ''
)
PARTITIONED BY (`day` string COMMENT 'partition column')
STORED AS ORC
LOCATION '/user/portal/tmp_jinritoutiao_video';

-- The log table has only the content column, so userId is read from the JSON in content
-- (assuming the JSON key is named userId, matching the target column).
INSERT OVERWRITE TABLE tmp.tmp_jinritoutiao_video PARTITION (day='20180810')
SELECT get_json_object(content, '$.id') AS id,
       get_json_object(content, '$.class') AS class,
       get_json_object(content, '$.userId') AS userId
FROM tmp.tmp_tb_jinritoutiao_log
WHERE day='20180810'
LIMIT 10;
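
To see what get_json_object does before running the full insert, it can be tried on a literal. The JSON string below is made up for illustration, and the userId key is an assumption about the log format; the second query simply reads back what was written to the new table:

```sql
-- get_json_object(json_string, path) returns the value at the JSON path as a string,
-- or NULL when the path does not exist.
SELECT get_json_object('{"id":"1","class":"news","userId":"u42"}', '$.userId');

-- Read back the rows inserted in step 4.
SELECT id, class, userId
FROM tmp.tmp_jinritoutiao_video
WHERE day='20180810'
LIMIT 10;
```

A column that comes back entirely NULL usually means the JSON path does not match the key names in the log records.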

5. Done