Hive in One Article: insert into, insert overwrite, and Data Partitioning

阿新 • Published: 2019-01-31

Data Partitioning

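The main purpose of partitioning is to reduce the amount of data that a given SQL operation has to read or write, and so shorten response time. There are two forms: horizontal partitioning and vertical partitioning. Horizontal partitioning splits a table by rows. Vertical partitioning splits it by columns, usually to reduce the width of the target table. Horizontal partitioning is the more common of the two.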

Syntax for creating a partitioned table in Hive:

create external table if not exists tablename(
  a string,
  b string)
partitioned by (year string, month string)
row format delimited fields terminated by ',';
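As a rough illustration of why partitioning reduces the data read, the sketch below registers one partition on the table above and then queries it. The location path is a hypothetical placeholder, not something from the original article.

```sql
-- Register a partition on the external table.
-- The location below is a made-up example path; adjust it to your own layout.
alter table tablename add if not exists
  partition (year='2017', month='03')
  location '/data/tablename/2017/03';

-- List the partitions Hive currently knows about for this table.
show partitions tablename;

-- Filtering on the partition columns lets Hive scan only the matching
-- partition directory instead of the whole table.
select a, b
from tablename
where year = '2017' and month = '03';
```

Note that year and month are not stored inside the data files. They are derived from the partition, yet they can be used in where clauses like ordinary columns.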

Hive generally offers three ways to insert data into a table that has partition columns:

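1) Static partition insert: every partition column declared when the table was created must be given a value in the insert statement, for example:

insert overwrite table tablename partition (year='2017', month='03') select a, b from tablename2;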

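2) Mixed static and dynamic partition insert: only some of the partition columns are given values. The remaining (dynamic) ones are taken from the trailing columns of the select, and static partition columns must come before dynamic ones. For example, assuming tablename2 has a month column:

insert overwrite table tablename partition (year='2017', month) select a, b, month from tablename2;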

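3) Dynamic partition insert: only the partition columns are named, with no values. The values come from the last columns of the select, in the order given in the partition clause. For example, assuming tablename2 has year and month columns:

insert overwrite table tablename partition (year, month) select a, b, year, month from tablename2;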

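Parameters related to Hive dynamic partitioning:

hive.exec.dynamic.partition: whether dynamic partitioning is enabled, false (off) or true (on). The default is false in older releases (before Hive 0.9.0) and true from Hive 0.9.0 onward.

hive.exec.dynamic.partition.mode: the mode used once dynamic partitioning is enabled, either strict or nonstrict. strict (the default) requires at least one static partition column, which guards against accidentally overwriting every partition of a table. nonstrict has no such requirement and allows all partition columns to be dynamic.

hive.exec.max.dynamic.partitions: the maximum total number of dynamic partitions that may be created. The default is 1000, and it can be raised manually if more partitions are needed.

hive.exec.max.dynamic.partitions.pernode: the maximum number of dynamic partitions that may be created on each mapper or reducer node (not per job). The default is 100.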
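Putting the parameters together, a session that performs a fully dynamic insert might look like the sketch below. The limits 2000 and 500 are arbitrary illustrative values, and tablename2 is assumed to contain year and month columns.

```sql
-- Enable dynamic partitioning for this session.
set hive.exec.dynamic.partition=true;
-- nonstrict is needed because no partition column is given a static value below.
set hive.exec.dynamic.partition.mode=nonstrict;
-- Illustrative limits only; choose values that fit your data.
set hive.exec.max.dynamic.partitions=2000;
set hive.exec.max.dynamic.partitions.pernode=500;

-- The partition values are taken from the last two select columns,
-- matched by position to (year, month).
insert overwrite table tablename partition (year, month)
select a, b, year, month
from tablename2;
```

If the query would create more partitions than the configured limits allow, Hive fails the job rather than silently creating them, so the limits act as a safety net against a badly chosen partition column.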

Inserting Data: insert into and insert overwrite

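Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-style querying, translating SQL statements into MapReduce jobs for execution. Hive typically supports the following four ways of importing data: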

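(1) Load data from the local file system into a Hive table;

(2) Load data from HDFS into a Hive table;

(3) Create a table and populate it, in the same statement, with records queried from another table;

(4) Query data from another table and insert it into an existing Hive table.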
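The four import methods above can be sketched as follows. The file paths and the table name tablename3 are hypothetical, and tablename1 is assumed to be an unpartitioned table with columns a, b, c.

```sql
-- (1) Load from the local file system: the file is copied into the table's directory.
load data local inpath '/tmp/example.csv' into table tablename1;

-- (2) Load from HDFS: the file is moved (not copied) into the table's directory.
load data inpath '/user/hive/staging/example.csv' into table tablename1;

-- (3) Create a table and fill it from a query in one step (create table as select).
create table tablename3 as
select a, b, c from tablename2;

-- (4) Insert query results into an existing table.
insert into table tablename1
select a, b, c from tablename2;
```

Adding the overwrite keyword to either load statement (load data ... overwrite into table ...) replaces the table's existing contents instead of adding to them.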

INSERT INTO

Usage example

insert into table tablename1 select a, b, c from tablename2;

INSERT OVERWRITE

Usage example

insert overwrite table tablename1 select a, b, c from tablename2;

Similarities and differences

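Both insert into and insert overwrite insert data into a Hive table. insert into appends the new rows to the data already in the table, whereas insert overwrite rewrites the data: the existing data is deleted first, and then the new data is written. For a partitioned table, insert overwrite rewrites only the partitions being written to and leaves the others untouched.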
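To make the difference concrete, here is a minimal sketch against the partitioned table from the first section, assuming tablename2 has columns a and b:

```sql
-- Append: rows are added to partition year=2017/month=03.
-- Running this statement twice leaves two copies of the rows in that partition.
insert into table tablename partition (year='2017', month='03')
select a, b from tablename2;

-- Overwrite: the existing contents of partition year=2017/month=03 are
-- replaced by the query result. All other partitions are left as they were.
insert overwrite table tablename partition (year='2017', month='03')
select a, b from tablename2;
```

Because insert overwrite is idempotent for a given partition, it tends to be the safer choice for jobs that may be re-run, while insert into suits incremental loads where each run brings new rows.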
