1. 程式人生 > >Hive分割槽表新增欄位+重刷歷史方法(避免舊分割槽新增欄位為NULL)

Hive分割槽表新增欄位+重刷歷史方法(避免舊分割槽新增欄位為NULL)

1-1.建立測試表

hive> CREATE  EXTERNAL TABLE table_for_test_add_column(
    >    original_column string COMMENT '原始資料'
    >  )
    >  COMMENT 'add_column的測試表'
    >  PARTITIONED BY (
    >  `dt` string COMMENT '日'
    >  )
    >  row format delimited fields terminated by '\t'
    >  STORED AS
textfile;
OK Time taken: 0.734 seconds

1-2.插入資料

hive> insert into table hive_test.table_for_test_add_column partition(dt='20180114')  select '測試資料_0114';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = xxx_20180114135008_cd63d301-89d5-461d-b0c0-be292b308e24
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set
to 0 since there's no reduce operator .... Loading data to table hive_test.table_for_test_add_column partition (dt=20180114) MapReduce Jobs Launched: Stage-Stage-1: Map: 1 HDFS Read: 4304 HDFS Write: 121 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK Time taken: 31.641 seconds

1-3.查詢現有資料

hive> select
* from hive_test.table_for_test_add_column;
OK 測試資料_0114 20180114 Time taken: 0.125 seconds, Fetched: 1 row(s)

1-4.新增列(alter table add column)

ALTER TABLE table_name 
  [PARTITION partition_spec]                 -- (Note: Hive 0.14.0 and later)
  ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
  [CASCADE|RESTRICT]                         -- (Note: Hive 1.1.0 and later)

ADD COLUMNS lets you add new columns to the end of the existing columns but before the partition columns. This is supported for Avro backed tables as well, for Hive 0.14 and later.

新增一列added_column 【備註

DDL語句最後新增CASCADE

否則新增的列在舊分割槽中不可見,查詢資料時為NULL,重新刷資料時仍為NULL
相似問題見 hive分割槽表增加欄位新增欄位值為空的bug

hive> alter table hive_test.table_for_test_add_column add columns (added_column string COMMENT '新新增的列') CASCADE;
OK
Time taken: 0.236 seconds

檢視建表語句,已新增一列

hive> show create table  hive_test.table_for_test_add_column ;
OK
CREATE EXTERNAL TABLE `hive_test.table_for_test_add_column`(
  `original_column` string COMMENT '原始資料',
  `added_column` string COMMENT '新新增的列')
COMMENT 'add_column的測試表'
PARTITIONED BY (
  `dt` string COMMENT '日')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/hive_test.db/table_for_test_add_column'
TBLPROPERTIES (
  'last_modified_by'='xxx',
  'last_modified_time'='1515909925',
  'transient_lastDdlTime'='1515909925')
Time taken: 0.125 seconds, Fetched: 21 row(s)

檢視資料

hive> select * from hive_test.table_for_test_add_column;
OK
測試資料_0114   NULL    20180114
Time taken: 0.121 seconds, Fetched: 1 row(s)

1-5.新增資料

hive>  insert into table hive_test.table_for_test_add_column partition(dt='20180115')  select '測試資料_0115', '新增列資料_0115';

校驗資料

hive> select * from hive_test.table_for_test_add_column where dt >= 20180114;
OK
測試資料_0114   NULL    20180114
測試資料_0115   新增列資料_0115  20180115
Time taken: 0.453 seconds, Fetched: 2 row(s)

1-6.重刷舊分割槽資料

hive>  insert overwrite table hive_test.table_for_test_add_column partition(dt='20180114')  select original_column, '新增列資料_0114' from hive_test.table_for_ted_column where dt = '20180114';
OK
測試資料_0114   新增列資料_0114  20180114
測試資料_0115   新增列資料_0115  20180115
Time taken: 0.073 seconds, Fetched: 2 row(s)

1-7.Hive 修改欄位

備註

修改欄位型別有諸多限制,要格外注意,可能導致寫入成功但查詢失敗

Change Column Name/Type/Position/Comment

ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type

[COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
This command will allow users to change a column’s name, data type, comment, or position, or an arbitrary combination of them. The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage. A patch for Hive 0.13 is also available (see HIVE-7971).

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table’s metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata.

ALTER TABLE CHANGE COLUMN CASCADE clause will override the table partition’s column metadata regardless of the table or partition’s protection mode. Use with discretion.

The column change command will only modify Hive’s metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition.

Example:

CREATE TABLE test_change (a int, b int, c int);

// First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;

// Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
// The new table's structure is:  b int, a2 string, c int.

// Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
// The new table's structure is:  c1 int, b int, a2 string.

// Add a comment to column a1
ALTER TABLE test_change CHANGE a1 a1 INT COMMENT 'this is column a1';

參考