
Three Ways to Associate a Partitioned Table with Its Data

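Contents

Summary:

Method 1: Upload the data, then add the partition with alter add:

Method 2: Upload the data, then repair with msck:

Method 3: Create the folder, then load the data into the partition:

Summary: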


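Summary:

There are three ways to associate a partitioned table with its data:

① Create the partition directory on HDFS and upload the data into it, then run alter table add partition.

② Create the partition directory on HDFS and upload the data into it, then run msck repair table table_name.

③ Create the partition directory on HDFS, then run load data local inpath 'path' into table table_name partition(column=value) to upload the data into the partition directory.

Data preparation: create a table stu_par partitioned by month. The table has to be partitioned, because every statement below addresses a month partition.
create table stu_par(id int,name string)
partitioned by (month string)
row format delimited
fields terminated by '\t';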

Method 1: Upload the data, then add the partition with alter add:

> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=09
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=09
(The same steps are repeated for the other months and omitted here.)


> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 4 items
drwxr-xr-x   - isea supergroup      0 2018-12-01 05:50 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12

0: jdbc:hive2://hadoop108:10000> select * from stu_par where month  = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+

At this point the query returns no data, because the metastore has no record of partition 09 yet.

0: jdbc:hive2://hadoop108:10000> alter table stu_par add partition(month = '09');

0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '09';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 09             |
| 1002        | liubei        | 09             |
| 1003        | guanyu        | 09             |
| 1004        | zhaoyun       | 09             |
| 1005        | caocao        | 09             |
| 1006        | zhouyu        | 09             |
+-------------+---------------+----------------+--+
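
The other month directories in the listing above (10, 11, 12) can presumably be registered the same way. The following is a minimal sketch, assuming those directories already hold data in the same tab-delimited format; it relies on Hive accepting several partition clauses in a single alter table statement:

```sql
-- Register several already-uploaded partition directories in one statement.
-- "if not exists" avoids an error when a partition is already registered.
alter table stu_par add if not exists
  partition(month = '10')
  partition(month = '11')
  partition(month = '12');

-- List the partitions the metastore now knows about.
show partitions stu_par;
```

Running show partitions before and after is a quick way to confirm that the metadata, not the files, is what changed.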

Method 2: Upload the data, then repair with msck:

1. Create the directory for partition 08 on HDFS and upload the data
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=08
0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -ls /user/hive/warehouse/db_hive.db/stu_par/
Found 5 items
drwxr-xr-x   - isea supergroup      0 2018-12-01 06:06 /user/hive/warehouse/db_hive.db/stu_par/month=08
drwxr-xr-x   - isea supergroup      0 2018-12-01 05:54 /user/hive/warehouse/db_hive.db/stu_par/month=09
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=10
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:34 /user/hive/warehouse/db_hive.db/stu_par/month=11
drwxr-xr-x   - isea supergroup      0 2018-12-01 04:30 /user/hive/warehouse/db_hive.db/stu_par/month=12
> !sh hadoop fs -put /opt/module/hive/stu.txt /user/hive/warehouse/db_hive.db/stu_par/month=08

At this point the data exists on HDFS, but the table has no metadata for it, so the query still returns nothing.
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+
No rows selected (0.091 seconds)

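Now repair the partitioned table with msck. The repair command scans the table directory and adds metadata for any partition directories the metastore does not know about, which has the same effect as running the alter table add partition command above for each of them.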
0: jdbc:hive2://hadoop108:10000> msck repair table stu_par;
OK
No rows affected (0.215 seconds)
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '08';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 08             |
| 1002        | liubei        | 08             |
| 1003        | guanyu        | 08             |
| 1004        | zhaoyun       | 08             |
| 1005        | caocao        | 08             |
| 1006        | zhouyu        | 08             |
+-------------+---------------+----------------+--+
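
To see exactly which partitions the repair registered, the partition list can be compared before and after. This is a small sketch using the same stu_par table; the output is omitted because it depends on what was already registered:

```sql
-- Partitions registered in the metastore before the repair.
show partitions stu_par;

-- Scan the table directory and register any month=... directories
-- that are missing from the metastore.
msck repair table stu_par;

-- Newly discovered partitions should now appear in the list.
show partitions stu_par;
```

Because msck works on the whole table, it suits the case where many partition directories were uploaded at once, whereas alter table add partition targets partitions named explicitly.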

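Method 3: Create the folder, then load the data into the partition:

1. First create the directory for partition 07 on HDFS (no data is uploaded yet)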
> !sh hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/stu_par/month=07

2. At this point the table has neither metadata nor data for partition 07, so the query returns nothing
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
+-------------+---------------+----------------+--+

Now load the data. This uploads the file and creates the metadata for partition 07 in one step.
> load data local inpath '/opt/module/hive/stu.txt' into table stu_par partition(month = '07');
0: jdbc:hive2://hadoop108:10000> select * from stu_par where month = '07';
OK
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | zhangfei      | 07             |
| 1002        | liubei        | 07             |
| 1003        | guanyu        | 07             |
| 1004        | zhaoyun       | 07             |
| 1005        | caocao        | 07             |
| 1006        | zhouyu        | 07             |
+-------------+---------------+----------------+--+
The load command both uploads the data into the folder of the specified partition and writes the partition metadata.
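
As a variation, the sketch below loads into a hypothetical partition 06 that is not part of the original walkthrough, and without a prior mkdir, on the assumption that load data creates a missing partition directory itself. It also shows the overwrite form, which replaces the files already in the partition instead of adding to them:

```sql
-- Hypothetical partition 06: no mkdir beforehand in this sketch.
load data local inpath '/opt/module/hive/stu.txt'
into table stu_par partition(month = '06');

-- "overwrite" replaces the existing files in the partition
-- rather than appending another copy of the file.
load data local inpath '/opt/module/hive/stu.txt'
overwrite into table stu_par partition(month = '06');

-- Confirm that the partition is registered.
show partitions stu_par;
```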

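Summary:

There are three ways to associate a partitioned table with its data:

① Create the partition directory on HDFS and upload the data into it, then run alter table add partition.

② Create the partition directory on HDFS and upload the data into it, then run msck repair table table_name.

③ Create the partition directory on HDFS, then run load data local inpath 'path' into table table_name partition(column=value).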