Three Ways to Associate Data with a Partitioned Table
阿新 • Published: 2018-12-04
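Three ways to upload data into partition directories and associate it with a partitioned table.
Data preparation: create a table stu_par that is partitioned by month. The partition clause is required, because the statements below filter on stu_par.month and add partitions to the table:
create table stu_par(id int, name string)
partitioned by (month string)
row format delimited
fields terminated by '\t';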
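Each partition of stu_par lives in its own subdirectory named `column=value` under the table directory, and the sessions in this post create those directories by hand with `hadoop fs -mkdir`. The Python sketch below builds such a path and parses a directory name back into a partition spec. It assumes the default warehouse layout used in this post for a non-default database (`<warehouse>/<db>.db/<table>/<col>=<val>`). The function names are hypothetical, and the sketch skips the escaping Hive applies to special characters in partition values.

```python
from typing import Dict


def partition_dir(warehouse: str, db: str, table: str, spec: Dict[str, str]) -> str:
    """Return the HDFS directory for one partition (hypothetical helper).

    Assumes the layout <warehouse>/<db>.db/<table>/<col>=<val>[/...],
    with partition columns nested in the order given by `spec`.
    No escaping of special characters is performed.
    """
    table_dir = f"{warehouse.rstrip('/')}/{db}.db/{table}"
    parts = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_dir] + parts)


def parse_partition_dir(name: str) -> Dict[str, str]:
    """Parse a directory name such as 'month=09' into {'month': '09'}."""
    col, sep, val = name.partition("=")
    if not sep or not col:
        raise ValueError(f"not a partition directory: {name!r}")
    return {col: val}


if __name__ == "__main__":
    print(partition_dir("/user/hive/warehouse", "db_hive", "stu_par", {"month": "09"}))
    # -> /user/hive/warehouse/db_hive.db/stu_par/month=09
    print(parse_partition_dir("month=09"))
    # -> {'month': '09'}
```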
Method 1: upload the data, then add the partition with alter table ... add partition:
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=09
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=09
(repeated operations omitted)
> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 4 items
drwxr-xr-x - isea supergroup 0 2018-12-01 05:50 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x - isea supergroup 0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x - isea supergroup 0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x - isea supergroup 0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id | stu_par.name | stu_par.month |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+
At this point the query returns no data: the files are on HDFS, but the metastore has no record of the month=09 partition.
0: jdbc:hive2://hadoop108:10000> alter table stu_par add partition(month = '09');
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id | stu_par.name | stu_par.month |
+-------------+---------------+----------------+--+
| 1001 | zhangfei | 09 |
| 1002 | liubei | 09 |
| 1003 | guanyu | 09 |
| 1004 | zhaoyun | 09 |
| 1005 | caocao | 09 |
| 1006 | zhouyu | 09 |
+-------------+---------------+----------------+--+
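Why is the first query empty even though `hadoop fs -ls` shows the directory? Hive takes the list of partitions to read from the metastore, so a directory created directly on HDFS stays invisible until `alter table ... add partition` registers it. The toy Python model below illustrates only this split between storage and metadata. The class and method names are hypothetical, and this is not how Hive is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]  # (id, name)


@dataclass
class ToyPartitionedTable:
    """Toy model: files on "HDFS" vs. partitions known to the "metastore"."""
    hdfs: Dict[str, List[Row]] = field(default_factory=dict)  # 'month=09' -> rows
    metastore: Set[str] = field(default_factory=set)          # registered month values

    def put(self, month: str, rows: List[Row]) -> None:
        # Like `hadoop fs -mkdir` + `hadoop fs -put`: changes storage only.
        self.hdfs.setdefault(f"month={month}", []).extend(rows)

    def add_partition(self, month: str) -> None:
        # Like `alter table ... add partition(month=...)`: changes metadata only.
        self.metastore.add(month)

    def select(self, month: str) -> List[Row]:
        # Only registered partitions are read; an unregistered directory
        # is ignored even if it contains files.
        if month not in self.metastore:
            return []
        return list(self.hdfs.get(f"month={month}", []))


if __name__ == "__main__":
    t = ToyPartitionedTable()
    t.put("09", [("1001", "zhangfei"), ("1002", "liubei")])
    print(t.select("09"))  # -> []
    t.add_partition("09")
    print(t.select("09"))  # -> [('1001', 'zhangfei'), ('1002', 'liubei')]
```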
Method 2: upload the data, then repair the table with msck:
1. Create the month=08 partition directory on HDFS and upload the data:
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=08
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 5 items
drwxr-xr-x - isea supergroup 0 2018-12-01 06:06 /user/hive/warehouse/db_hive.db/stu_par/month=08
drwxr-xr-x - isea supergroup 0 2018-12-01 05:54 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x - isea supergroup 0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x - isea supergroup 0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x - isea supergroup 0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=08
2. The data is now on HDFS, but the table has no metadata for this partition, so the query still returns nothing:
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id | stu_par.name | stu_par.month |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+
No rows selected (0.091 seconds)
3. Repair the partitioned table with msck. The repair command scans the table directory and registers the partition directories that are missing from the metastore, which has the same effect as running the alter table ... add partition command from Method 1:
0: jdbc:hive2://hadoop108:10000> msck repair table stu_par;
OK
No rows affected (0.215 seconds)
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id | stu_par.name | stu_par.month |
+-------------+---------------+----------------+--+
| 1001 | zhangfei | 08 |
| 1002 | liubei | 08 |
| 1003 | guanyu | 08 |
| 1004 | zhaoyun | 08 |
| 1005 | caocao | 08 |
| 1006 | zhouyu | 08 |
+-------------+---------------+----------------+--+
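The core idea of msck repair, comparing the `column=value` directories on HDFS with the partitions already in the metastore and adding the difference, can be sketched in a few lines of Python. This is a simplified, hypothetical model for a table with a single partition column, not Hive's implementation. The real command also handles multi-level partitions and other edge cases. Using `if not exists` in the generated statements is a choice made for this sketch so that re-running it is harmless.

```python
import re
from typing import Iterable, List, Set

# Matches a single-level partition directory such as 'month=08'.
PARTITION_DIR = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def find_unregistered(dir_names: Iterable[str], registered: Set[str], column: str) -> List[str]:
    """Return partition values that exist as directories but not in the metastore."""
    missing = []
    for name in sorted(dir_names):
        m = PARTITION_DIR.match(name)
        # Skip names that are not 'column=value' directories for this column.
        if m and m.group(1) == column and m.group(2) not in registered:
            missing.append(m.group(2))
    return missing


def repair_statements(table: str, column: str, values: List[str]) -> List[str]:
    # One ADD PARTITION per missing value: the metadata change msck makes.
    return [f"alter table {table} add if not exists partition({column} = '{v}');" for v in values]


if __name__ == "__main__":
    dirs = ["month=08", "month=09", "month=10", "month=11", "month=12"]
    missing = find_unregistered(dirs, {"09", "10", "11", "12"}, "month")
    print(missing)  # -> ['08']
    for stmt in repair_statements("stu_par", "month", missing):
        print(stmt)  # -> alter table stu_par add if not exists partition(month = '08');
```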
Method 3: create the directory, then load the data into the partition:
1. First create the month=07 partition directory on HDFS:
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=07
2. At this point the table has neither metadata nor data for this partition, so the query returns nothing:
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
+-------------+---------------+----------------+--+
| stu_par.id | stu_par.name | stu_par.month |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+
3. Now load the data. This uploads the file and, in the same step, creates the metadata for the month=07 partition:
> load data local inpath '/opt/module/hive/stu.txt' into table stu_par partition(month = '07');
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
OK
+-------------+---------------+----------------+--+
| stu_par.id | stu_par.name | stu_par.month |
+-------------+---------------+----------------+--+
| 1001 | zhangfei | 07 |
| 1002 | liubei | 07 |
| 1003 | guanyu | 07 |
| 1004 | zhaoyun | 07 |
| 1005 | caocao | 07 |
| 1006 | zhouyu | 07 |
+-------------+---------------+----------------+--+
load data both places the file in the directory of the specified partition and writes the partition metadata.
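What sets Method 3 apart is that one statement changes both storage and metadata. The Python sketch below models that with plain dictionaries and also builds the load data statement used above. Both functions are hypothetical illustrations, not a Hive API. The comment on appending reflects Hive's behavior when `overwrite` is left out of `load data`.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]  # (id, name)


def load_data_local(hdfs: Dict[str, List[Row]], metastore: Set[str],
                    local_rows: List[Row], month: str) -> None:
    """Toy model of `load data local inpath ... into table ... partition(month=...)`.

    The copy into the partition directory and the metastore registration
    happen together, so no separate alter table or msck is needed.
    """
    hdfs.setdefault(f"month={month}", []).extend(local_rows)  # copy the local file into the partition dir
    metastore.add(month)                                       # register the partition


def load_statement(local_path: str, table: str, column: str, value: str) -> str:
    # Without `overwrite`, Hive appends the file to the partition's existing data.
    return (f"load data local inpath '{local_path}' "
            f"into table {table} partition({column} = '{value}');")


if __name__ == "__main__":
    hdfs: Dict[str, List[Row]] = {}
    meta: Set[str] = set()
    load_data_local(hdfs, meta, [("1001", "zhangfei")], "07")
    print(sorted(meta))  # -> ['07']
    print(load_statement("/opt/module/hive/stu.txt", "stu_par", "month", "07"))
    # -> load data local inpath '/opt/module/hive/stu.txt' into table stu_par partition(month = '07');
```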
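Summary:
There are three ways to associate data with a partitioned table:
① Create the partition directory on HDFS, upload the data into it, then run alter table table_name add partition(column=value).
② Create the partition directory on HDFS, upload the data into it, then run msck repair table table_name.
③ Create the partition directory on HDFS, then run load data local inpath 'path' into table table_name partition(column=value).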
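As a quick reference, the hypothetical Python helper below returns the command sequence for each of the three methods summarized above, for a table with a single partition column. It only builds strings. The shell lines are the `!sh hadoop fs` calls issued from beeline in this post, and the HiveQL matches the statements shown there.

```python
from typing import List


def association_commands(method: int, table_dir: str, table: str,
                         local_file: str, column: str, value: str) -> List[str]:
    """Return the commands for method 1, 2 or 3 (hypothetical helper)."""
    part_dir = f"{table_dir.rstrip('/')}/{column}={value}"
    mkdir = f"!sh hadoop fs -mkdir -p {part_dir}"
    put = f"!sh hadoop fs -put {local_file} {part_dir}"
    if method == 1:  # upload, then register the partition explicitly
        return [mkdir, put, f"alter table {table} add partition({column} = '{value}');"]
    if method == 2:  # upload, then let msck discover the partition
        return [mkdir, put, f"msck repair table {table};"]
    if method == 3:  # load data copies the file and registers the partition
        return [mkdir, f"load data local inpath '{local_file}' into table {table} partition({column} = '{value}');"]
    raise ValueError("method must be 1, 2 or 3")


if __name__ == "__main__":
    for cmd in association_commands(1, "/user/hive/warehouse/db_hive.db/stu_par",
                                    "stu_par", "/opt/module/hive/stu.txt", "month", "09"):
        print(cmd)
```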