A brief introduction to partitioned tables (partition table) in Hive, including dynamic partitions (dynamic partition) and static partitions (static partition)

hive> insert overwrite table partition_test partition(stat_date='20110527',province='liaoning') select member_id,name from partition_test_input;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = a6_20171104201912_291312cf-da20-4b67-a746-1889b297f0bd
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1509763925736_0009, Tracking URL = http://localhost:8088/proxy/application_1509763925736_0009/
Kill Command = /Users/a6/Applications/hadoop-2.6.5/bin/hadoop job  -kill job_1509763925736_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-11-04 20:19:20,577 Stage-1 map = 0%,  reduce = 0%
2017-11-04 20:19:26,869 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1509763925736_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_test/stat_date=20110527/province=liaoning/.hive-staging_hive_2017-11-04_20-19-12_640_370650644218657006-1/-ext-10000
Loading data to table yyz_workdb.partition_test partition (stat_date=20110527, province=liaoning)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 4507 HDFS Write: 185 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 15.737 seconds
hive> select * from partition_test;
OK
1	liujiannan	20110526	liaoning
1	liujiannan	20110527	liaoning
2	wangchaoqun	20110527	liaoning
3	xuhongxing	20110527	liaoning
4	zhudaoyong	20110527	liaoning
5	zhouchengyu	20110527	liaoning
5	zhouchengyu	20110728	heilongjiang
4	zhudaoyong	20110728	henan
3	xuhongxing	20110728	sichuan
Time taken: 0.104 seconds, Fetched: 9 row(s)
hive>
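The sketch below models what the static-partition insert above does. It is a minimal Python model, not Hive's implementation. It represents a partitioned table as a dictionary that maps partition directory names (such as `stat_date=20110527/province=liaoning`) to the rows stored in that directory's data files.

A few things are assumptions of the sketch:
- The helper names `partition_dir`, `parse_partition_dir`, `static_insert` and `select_all` are hypothetical.
- The sample input rows are made up.
- Real Hive also escapes special characters in partition values, which is ignored here.

The model shows two behaviors. A static partition insert writes only the selected data columns (`member_id`, `name`) and puts every row under one fixed directory. When the table is read back, the partition columns are filled in from that directory path.

```python
from collections import defaultdict


def partition_dir(spec):
    """Build a Hive-style partition path such as 'stat_date=20110527/province=liaoning'."""
    return "/".join(f"{key}={value}" for key, value in spec.items())


def parse_partition_dir(path):
    """Recover partition column values from a partition path (no escaping handled)."""
    values = {}
    for part in path.split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            values[key] = value
    return values


def static_insert(rows, spec, warehouse):
    """Static partition insert: only the data columns are stored, and every row
    goes into the single directory named by the fixed partition spec. Any
    stat_date/province values carried by the input rows are simply dropped."""
    warehouse[partition_dir(spec)].extend((r["member_id"], r["name"]) for r in rows)


def select_all(warehouse):
    """Read back: data columns come from the stored rows, partition columns from the path."""
    result = []
    for path, stored_rows in warehouse.items():
        pvals = parse_partition_dir(path)
        for member_id, name in stored_rows:
            result.append((member_id, name, pvals["stat_date"], pvals["province"]))
    return result


if __name__ == "__main__":
    warehouse = defaultdict(list)
    # Hypothetical input rows whose own stat_date/province values differ.
    input_rows = [
        {"member_id": "1", "name": "a", "stat_date": "20110526", "province": "liaoning"},
        {"member_id": "2", "name": "b", "stat_date": "20110728", "province": "henan"},
    ]
    static_insert(input_rows, {"stat_date": "20110527", "province": "liaoning"}, warehouse)
    for row in select_all(warehouse):
        print("\t".join(row))
```

Running it prints both rows with `20110527` and `liaoning`, even though the input rows carried other values. This mirrors the `select * from partition_test` output above.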
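As you can see, the 5 rows in partition_test_input have different stat_date and province values. After they are inserted into the partition partition(stat_date='20110527',province='liaoning'), however, all 5 rows show the same stat_date and province. This is because the values of these two columns are read from the names of the partition directories, not from the data files themselves: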