DDL Data Definition (Part 2): Table Partitioning

阿新 • Published: 2018-12-02

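Partitioned tables

How it works:

Each partition of a partitioned table corresponds to its own directory on HDFS, and that directory holds all of the partition's data files. In Hive, partitioning simply means splitting data into subdirectories: a large dataset is divided into smaller ones according to business needs. When the WHERE clause of a query filters on the partition column, Hive reads only the matching partition directories instead of scanning the whole table, which makes the query much more efficient.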

Basic operations on partitioned tables

Introducing a partitioned table (logs managed by date)

/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log

Syntax for creating a partitioned table

 hive (default)> create table dept_partition(
                deptno int, dname string, loc string
                )
                partitioned by (month string)
                row format delimited fields terminated by '\t';

Load data into the partitioned table

  hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201709');
  hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201708');
  hive (hive)> load data local inpath '/opt/datas/dept.txt' into table dept_partition partition(month='201707');

Query data in the partitioned table
Single-partition query

 select * from dept_partition where month='201709';

Multi-partition query (combine several select statements with union)

 select * from dept_partition where month='201709'
 union
 select * from dept_partition where month='201708';
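The multi-partition query above can also be written in other ways. The sketch below shows two alternatives, under the assumption that dept.txt contains no duplicate rows (union removes duplicates, while the forms below do not). Note that Hive versions before 1.2.0 support only union all, not plain union.

```sql
-- One select with a filter on the partition column;
-- only the two matching partition directories need to be read.
select * from dept_partition where month in ('201709', '201708');

-- union all keeps duplicate rows and skips the de-duplication step of union.
select * from dept_partition where month='201709'
union all
select * from dept_partition where month='201708';
```

The in form is usually the simpler choice when every branch reads the same table and differs only in the partition value.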

Add partitions
Add a single partition

 hive (hive)> alter table dept_partition add partition(month='201705');

Add several partitions at once (the partition specs are separated by spaces)

 hive (hive)> alter table dept_partition add partition(month='201704') partition(month='201703');

Drop partitions
Drop a single partition

 hive (hive)> alter table dept_partition drop partition(month='201705');

Drop several partitions at once (note the comma between the partition specs)

 hive (hive)> alter table dept_partition drop partition(month='201704'), partition(month='201703');
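Adding a partition that already exists, or dropping one that does not, raises an error. A minimal sketch of guarded versions of the statements above, assuming the goal is a script that can be rerun safely:

```sql
-- add: partition specs separated by spaces; existing partitions are skipped
alter table dept_partition add if not exists
  partition(month='201704') partition(month='201703');

-- drop: partition specs separated by commas; missing partitions are skipped
alter table dept_partition drop if exists
  partition(month='201704'), partition(month='201703');
```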

List the partitions of a partitioned table (sample output from the author's session)

 hive (hive)> show partitions dept_partition;
 OK
 partition
 month=201706
 month=201707
 month=201708
 month=201709

View the structure of the partitioned table

hive (hive)> desc formatted dept_partition;

Dynamic partitioning

1. Enable dynamic partitioning

set hive.exec.dynamic.partition=true;

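2. Set the dynamic partition mode

set hive.exec.dynamic.partition.mode=nonstrict;

The default is strict, which requires at least one partition column to be static.
nonstrict mode allows every partition column to be dynamic.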
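Besides the two switches above, Hive also caps how many partitions a single statement may create dynamically. The sketch below sets those caps explicitly; the values shown are illustrative (they match the documented defaults), and whether they need raising depends on how many distinct partition values the data contains.

```sql
-- Maximum number of dynamic partitions one statement may create in total.
set hive.exec.max.dynamic.partitions=1000;

-- Maximum number of dynamic partitions each mapper or reducer may create.
set hive.exec.max.dynamic.partitions.pernode=100;
```

With the six-row sample used in this walkthrough (three distinct dates), neither limit comes into play.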

3. Source data

1,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08
2,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai,2018-08-09
3,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing,2018-08-10
4,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai,2018-08-08
5,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai,2018-08-09
6,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing,2018-08-10

4. Create the table

create table person1(
id int,
name string,
age int,
likes array<string>,
address map<string,string>,
dt string
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

5. Load the data into the table

load data local inpath '/test/person.txt' into table person1;
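To check that the three delimiters were applied as intended, the array and map columns can be queried by index and by key. This is a sketch against the sample data above; the column aliases first_like and stu_addr are made up for illustration.

```sql
select name,
       likes[0]            as first_like,  -- first element of the '-' separated array
       address['stu_addr'] as stu_addr,    -- map lookup; key and value are split on ':'
       dt
from person1;
-- For the first sample row this should give: zshang, game, beijing, 2018-08-08
```

The dt value also contains '-', but it is a plain string column, so the collection delimiter does not split it.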

6. Create the partitioned table

create table datap(
id int,
name string,
age int,
likes array<string>,
address map<string,string>
)
partitioned by (dt string)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

7. Insert the data (the dynamic partition column dt must come last in the select list)

insert into datap partition(dt) select id,name,age,likes,address,dt from person1 distribute by dt;

8. View the partition information
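The original screenshot for this step is not available. A minimal sketch of how the result could be checked, assuming the six sample rows above were inserted:

```sql
-- Should list one partition per distinct dt value in the source data:
-- dt=2018-08-08, dt=2018-08-09, dt=2018-08-10
show partitions datap;

-- Reads only the dt=2018-08-08 directory; with the sample data
-- this should return the rows with id 1 and 4.
select id, name from datap where dt='2018-08-08';
```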

Two-level partitioned tables

Create a two-level partitioned table (the only difference is an extra column in partitioned by)

create table dept_partition2(
deptno int,
dname string,
loc string
)
partitioned by (month string,day string)
row format delimited fields terminated by '\t';

Load data the usual way

load data local inpath '/opt/datas/dept.txt' 
into table dept_partition2 partition(month='201709',day='13');

Query the partition data

hive (hive)> select * from dept_partition2;
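With two partition levels, it is often more useful to look at one branch of the directory tree than at the whole table. A sketch using the partition loaded above:

```sql
-- List only the day-level partitions under one month (partial partition spec).
show partitions dept_partition2 partition(month='201709');

-- Filter on both partition columns to read a single leaf directory.
select * from dept_partition2 where month='201709' and day='13';
```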

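Uploading data directly into a partition directory: three ways to associate it with the partitioned table

The partition directory must sit under the table's own warehouse path: /user/hive/warehouse/hive.db/dept_partition2 when the table is in the hive database, /user/hive/warehouse/dept_partition2 when it is in default. The examples below use both, so adjust the path to the database that holds your table.

Method 1: upload the data, then repair
Upload the data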

hive (hive)> dfs -mkdir -p /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=12;
hive (hive)> dfs -put /opt/datas/dept.txt /user/hive/warehouse/hive.db/dept_partition2/month=201709/day=12;

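Query the data (on older Hive versions the newly uploaded data is not returned, because the new directory is not yet registered as a partition in the metastore)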

hive (hive)>  select * from dept_partition2 where month='201709' and day='12';
OK
dept_partition2.deptno	dept_partition2.dname	dept_partition2.loc	dept_partition2.month	dept_partition2.day
Time taken: 2.766 seconds

Run the repair command

hive (hive)> msck repair table dept_partition2;

Query again

hive (hive)>  select * from dept_partition2 where month='201709' and day='12';
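Whether the repair actually registered the directory can also be confirmed from the metastore side rather than by querying rows. A small sketch:

```sql
-- After msck repair, the list should include month=201709/day=12.
show partitions dept_partition2;
```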

Method 2: upload the data, then add the partition
Upload the data

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=11;

Add the partition

hive (default)> alter table dept_partition2 add partition(month='201709', day='11');

Query the data

hive (default)> select * from dept_partition2 where month='201709' and day='11';

Method 3: create the directory, then load the data into the partition

Create the directory

hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;

Load the data

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');

Query the data

hive (default)> select * from dept_partition2 where month='201709' and day='10';
