
Importing Data into Hive Tables (Five Methods)

Table of Contents

Summary:

load:

insert:

Create a table and load data (As Select):

location:

import:

Summary:


Summary:

Hive offers five ways to import data into a table:

①: load data: with LOCAL the source file is copied from the local filesystem; without LOCAL it is moved from its HDFS location. OVERWRITE replaces the data already in the table; otherwise the load appends.

②: insert: insert into ... values(...), or insert ... select

③: as / like: as copies the data, like copies only the table structure (see the syntax sketch after this list)

④: location: first upload the data to HDFS, then point the table at the directory containing the file via the LOCATION clause

⑤: import: load data that was previously dumped with export
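For reference, a minimal syntax sketch of all five methods; every table and path name here is a placeholder, not one of the tables used later in this article:

-- ① load (appends by default; add overwrite to replace)
load data local inpath '/path/stu.txt' into table t;
-- ② insert, either literal values or the result of a select
insert into table t values(1001,'zhangfei');
insert into table t select id,name from src;
-- ③ as copies the data, like copies only the structure
create table t_as as select id,name from src;
create table t_like like src;
-- ④ location points a table at an existing HDFS directory
create external table t_ex(id int,name string) location '/dir';
-- ⑤ import reads what export wrote
import table t_new from '/export/path';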

load:

> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];
(1) load data: load data into the table
(2) local: load from the local filesystem into the Hive table; without it, load from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; otherwise append
(5) into table: which table to load into
(6) student: the target table
(7) partition: load into the specified partition (a sketch follows this list)
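The partition clause is not exercised in the examples below, so here is a hedged sketch; it assumes stu_par1 (the partitioned table queried later in this article) is partitioned by month, and reuses the sample file path from below:

> load data local inpath '/opt/module/hive/stu.txt' into table stu_par1 partition(month = '12');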
1. Create a table and load data into it from the local filesystem:
create table if not exists stu4(id int,name string)
row format delimited
fields terminated by '\t';


> select * from stu4;
+----------+------------+--+
| stu4.id  | stu4.name  |
+----------+------------+--+
+----------+------------+--+
> load data local inpath '/opt/module/hive/stu.txt' into table stu4;

> select * from stu4;
+----------+------------+--+
| stu4.id  | stu4.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+

2. Create a table and load data into it from HDFS:
create table if not exists stu5(id int,name string)
row format delimited
fields terminated by '\t';
> !sh hadoop fs -put /opt/module/hive/stu.txt /stu.txt
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
+----------+------------+--+
> load data inpath '/stu.txt' into table stu5;
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+
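Note that loading from HDFS moves the source file into the table's directory instead of copying it. A hedged way to confirm: listing the original path should now report that the file no longer exists:

> !sh hadoop fs -ls /stu.txt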

3. Load data, overwriting the data already in the table:
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+

> load data local inpath '/opt/module/hive/stu2.txt' overwrite into table stu5;
> select * from stu5;
+----------+------------+--+
| stu5.id  | stu5.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
+----------+------------+--+

insert:

1. Create a partitioned table and insert a few rows into it:
> create table stu6(id int,name string)
partitioned by (month string)
row format delimited
fields terminated by '\t';

> insert into table stu6 partition(month = '12') values(1001,'zhangfei'),(1002,'liubei');

0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id  | stu6.name  | stu6.month  |
+----------+------------+-------------+--+
| 1001     | zhangfei   | 12          |
| 1002     | liubei     | 12          |
+----------+------------+-------------+--+

2. Insert data from the result of a select:
0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id  | stu6.name  | stu6.month  |
+----------+------------+-------------+--+
| 1001     | zhangfei   | 12          |
| 1002     | liubei     | 12          |
+----------+------------+-------------+--+

> insert overwrite table stu6 partition(month = '12') select id,name from stu_par1 where month = '12';
0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id  | stu6.name  | stu6.month  |
+----------+------------+-------------+--+
| 1001     | zhangfei   | 12          |
| 1002     | liubei     | 12          |
| 1003     | guanyu     | 12          |
| 1004     | zhaoyun    | 12          |
| 1005     | caocao     | 12          |
| 1006     | zhouyu     | 12          |
+----------+------------+-------------+--+
overwrite replaced the data that was previously in the month = '12' partition.
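For contrast, a hedged sketch: insert into (without overwrite) appends to the partition instead of replacing it, so running the statement twice would leave duplicate rows:

> insert into table stu6 partition(month = '12') select id,name from stu_par1 where month = '12';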

3. Multi-insert mode (scan the source table once, write to several partitions):
from stu_par1
insert overwrite table stu6 partition(month = '11')
select id,name where month = '11'
insert overwrite table stu6 partition(month = '10')
select id,name where month = '10';

0: jdbc:hive2://hadoop108:10000> select * from stu6;
+----------+------------+-------------+--+
| stu6.id  | stu6.name  | stu6.month  |
+----------+------------+-------------+--+
| 1001     | zhangfei   | 10          |
| 1002     | liubei     | 10          |
| 1003     | guanyu     | 10          |
| 1004     | zhaoyun    | 10          |
| 1005     | caocao     | 10          |
| 1006     | zhouyu     | 10          |
| 1001     | zhangfei   | 11          |
| 1002     | liubei     | 11          |
| 1003     | guanyu     | 11          |
| 1004     | zhaoyun    | 11          |
| 1005     | caocao     | 11          |
| 1006     | zhouyu     | 11          |
| 1001     | zhangfei   | 12          |
| 1002     | liubei     | 12          |
| 1003     | guanyu     | 12          |
| 1004     | zhaoyun    | 12          |
| 1005     | caocao     | 12          |
| 1006     | zhouyu     | 12          |
+----------+------------+-------------+--+

Create a table and load data (As Select):

create table if not exists stu7
as select id,name from stu1;

0: jdbc:hive2://hadoop108:10000> select * from stu7;
+----------+------------+--+
| stu7.id  | stu7.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+
6 rows selected (0.149 seconds)
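The summary also mentions like, which this article never demonstrates; a minimal sketch, reusing stu7 from above (stu_like is a hypothetical name):

> create table stu_like like stu7;
-- stu_like now has the same columns as stu7 but contains no rows
> select * from stu_like;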

location:

1. On HDFS there is a directory /ex containing the file stu.txt:
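A hedged sketch of how that directory could have been populated, assuming the same local sample file as the earlier examples:

> !sh hadoop fs -mkdir /ex
> !sh hadoop fs -put /opt/module/hive/stu.txt /ex

An external table is then pointed at the directory: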
create external table stu_ex2(id int,name string)
row format delimited
fields terminated by '\t'
location '/ex';

0: jdbc:hive2://hadoop108:10000> select * from stu_ex2;
+-------------+---------------+--+
| stu_ex2.id  | stu_ex2.name  |
+-------------+---------------+--+
| 1001        | zhangfei      |
| 1002        | liubei        |
| 1003        | guanyu        |
| 1004        | zhaoyun       |
| 1005        | caocao        |
| 1006        | zhouyu        |
+-------------+---------------+--+
6 rows selected (0.093 seconds)

import:

Data loaded with import must have been produced by export:

1. Export the table's data to HDFS:
export table stu1 to '/export/data/stu1';

0: jdbc:hive2://hadoop108:10000> !sh hadoop fs -ls /export/data/stu1
Found 2 items
-rwxr-xr-x   3 isea supergroup       1329 2018-12-01 19:38 /export/data/stu1/_metadata
drwxr-xr-x   - isea supergroup          0 2018-12-01 19:38 /export/data/stu1/data

Two new entries appeared under the stu1 directory: _metadata is a file holding the table definition, and the rows themselves are stored under data.

2. Import the data on HDFS into stu8:
0: jdbc:hive2://hadoop108:10000> show tables;
+------------------------+--+
|        tab_name        |
+------------------------+--+
| stu1                   |
| stu2                   |
| stu3                   |
| stu4                   |
| stu5                   |
| stu6                   |
| stu7                   |
| stu_ex1                |
| stu_ex2                |
| stu_par1               |
| stu_par2               |
| values__tmp__table__1  |
+------------------------+--+
0: jdbc:hive2://hadoop108:10000> import table stu8 from '/export/data/stu1';

0: jdbc:hive2://hadoop108:10000> select * from stu8;
+----------+------------+--+
| stu8.id  | stu8.name  |
+----------+------------+--+
| 1001     | zhangfei   |
| 1002     | liubei     |
| 1003     | guanyu     |
| 1004     | zhaoyun    |
| 1005     | caocao     |
| 1006     | zhouyu     |
+----------+------------+--+

Summary:

Hive offers five ways to import data into a table:

①: load data: with LOCAL the source file is copied from the local filesystem; without LOCAL it is moved from its HDFS location. OVERWRITE replaces the data already in the table; otherwise the load appends.

②: insert: insert into ... values(...), or insert ... select

③: as / like: as copies the data, like copies only the table structure

④: location: first upload the data to HDFS, then point the table at the directory containing the file via the LOCATION clause

⑤: import: load data that was previously dumped with export: import table table_name from 'hdfs_path';