Several ways to load data into Hive

阿新 • Published: 2019-01-27

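Importing data into Hive

1 Direct insert (inefficient)
insert into table XXX values();
If the table is partitioned, add a partition clause after the table name, e.g. partition(month='201809').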
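
As a concrete illustration of the direct insert above, here is a minimal sketch. The table score_demo and its rows are hypothetical and exist only for this example; they are not part of the original notes.

```sql
-- Hypothetical demo table (not from the original article)
create table if not exists score_demo (s_id string, c_id string, s_score int)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- The partition clause goes between the table name and the values list
insert into table score_demo partition (month='201809')
values ('01', '01', 80), ('01', '02', 90);
```

Each insert ... values statement is executed as its own job, which is why this approach is slow for anything beyond a handful of rows.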

2 Load data with the load statement
load data local inpath '/export/servers/hive-study-data/score.csv' overwrite into table score partition(month='201810'); -- single partition column
load data local inpath '/export/servers/hive-study-data/score.csv' overwrite into table score2 partition(year='2018',month='10',day='16'); -- multiple partition columns
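
The two keywords local and overwrite can each be left out. The sketch below shows that variant; the HDFS path and the partition value are assumptions made for illustration.

```sql
-- Assumption: score.csv was already uploaded to this hypothetical HDFS directory.
-- Without "local", the path is resolved on HDFS and the file is moved into the table's directory.
-- Without "overwrite", the file is added to whatever the partition already contains.
load data inpath '/hivedatas/score.csv' into table score partition(month='201811');

-- List the partitions to confirm the new one exists
show partitions score;
```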

3 Load data from a query
insert overwrite table score3 partition(year='2018',month='10',day='16') select s_id,c_id,s_score from score2;

4 Multi-insert mode (used to split one table into several parts)
Split score4 into two parts. First create the two target tables:
create table score_first(s_id string,c_id string) partitioned by(month string) row format delimited fields terminated by '\t';

create table score_second(c_id string,s_score int) partitioned by(month string) row format delimited fields terminated by '\t';

Insert data into both tables in a single pass over score4. overwrite replaces any data already in the target partition.
from score4
insert overwrite table score_first partition(month='201806') select s_id,c_id
insert overwrite table score_second partition(month='201806') select c_id,s_score;

5 Create a table and load it from a query in one statement (copy the data of table A into table B)
create table score5 as select * from score;

6 Specify the data location with location when creating the table
Create the table and point it at a directory on HDFS:
create table score7(s_id string,c_id string,s_score int) row format delimited fields terminated by '\t' location '/scoredatas';

Put a data file under that path:
hdfs dfs -put score.csv /scoredatas/
Query the table again and the data is returned:
select * from score7;

7 Hive export and import operations
export table techer to '/export/techer'; -- exports the table from Hive to HDFS
import table techer2 from '/export/techer';

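Exporting data from Hive:
1 insert overwrite local directory '/export/servers/daochu' select * from score10;
The exported file looks garbled, because without a row format Hive separates fields with its default delimiter \001 (Ctrl-A).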

1 insert overwrite local directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;
With an explicit row format, the export is readable.
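
Dropping the local keyword writes the result to HDFS instead of the local file system. A minimal sketch, assuming a hypothetical target directory:

```sql
-- Assumption: '/export/hive/student' is a hypothetical HDFS directory.
-- Caution: "overwrite" replaces anything that already exists in that directory.
insert overwrite directory '/export/hive/student'
row format delimited fields terminated by '\t'
select * from student;
```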

2 Export with a shell command
bin/hive -e "select * from myhive.score;" > /export/servers/exporthive/score.txt
The output is readable.

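Sorting:
order by: global sort over the whole result
sort by: local sort; rows are sorted within each reducer
distribute by: decides which reducer each row is sent to (partitioning); it does not sort by itself
Set the number of reducers:
set mapreduce.job.reduces=7;
Partition with distribute by and sort with sort by. The output directory will then contain 7 files, with rows assigned to files by s_id: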
insert overwrite local directory '/export/servers/hivedatas/sort' select * from score distribute by s_id sort by s_score;
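
To see the difference between a global and a local sort, the two queries below can be compared. This is a sketch only; the reducer count of 3 is an arbitrary choice for the example.

```sql
-- Arbitrary reducer count chosen for this illustration
set mapreduce.job.reduces=3;

-- order by: the whole result is in one total order
select * from score order by s_score desc;

-- sort by: each reducer sorts only its own rows, so the combined
-- output is not guaranteed to be in one overall order
select * from score sort by s_score desc;
```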

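When distribute by and sort by use the same column, cluster by can be used instead. cluster by can only sort in ascending order; asc or desc cannot be specified.
The following two statements are equivalent: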
select * from score cluster by s_id;
select * from score distribute by s_id sort by s_id;
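
Since cluster by does not accept a sort direction, a descending order within each reducer needs the long form. A minimal sketch:

```sql
-- Same partitioning as "cluster by s_id", but each reducer's rows are sorted descending
select * from score distribute by s_id sort by s_id desc;
```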
