Basic Database Operations in Hive
阿新 • Published: 2018-12-16
Database
Create Database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path] -- defaults to the warehouse root directory
[WITH DBPROPERTIES (property_name=property_value, ...)];
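Putting the optional clauses together, a minimal sketch (the database name, comment, path, and properties below are made up for illustration):

```sql
-- create a database with a comment, a custom HDFS location, and properties
CREATE DATABASE IF NOT EXISTS db_hive
COMMENT 'sample database for web logs'
LOCATION '/user/hive/warehouse/db_hive.db'
WITH DBPROPERTIES ('creator'='admin', 'date'='2018-12-16');
```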
Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE]; -- use CASCADE when the database still contains tables
Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
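For instance, updating a property and transferring ownership on the db_hive database from above (the property name and role are illustrative):

```sql
-- update a database property (Hive 0.14.0+ also accepts SCHEMA as a synonym)
ALTER DATABASE db_hive SET DBPROPERTIES ('edited-by'='admin');
-- transfer ownership to a role (Hive 0.13.0 and later)
ALTER DATABASE db_hive SET OWNER ROLE admin_role;
```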
Show Databases
show databases;
show databases like 'db_hive*';
Use Database
use db_hive;
Describe Database
desc database db_hive_03;
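The plain form shows only the name, comment, and location; to also see the DBPROPERTIES set at creation time, add EXTENDED:

```sql
desc database extended db_hive_03;
```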
DDL
create table
1. Full syntax:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [
    [ROW FORMAT row_format]  -- per-row delimiter and format
    [STORED AS file_format]
    | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)];  -- (Note: Available in Hive 0.6.0 and later)

eg:
create table IF NOT EXISTS default.log_20150913(
  ip string COMMENT 'remote ip address',
  users string COMMENT 'users',
  req_url string COMMENT 'user request url')
COMMENT 'beifeng web access logs'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE;
-- one table per day speeds up analysis

2. Create Table As Select (CTAS):

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
  [AS select_statement];  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

eg:
create table IF NOT EXISTS default.log_20150913_sa
AS select ip,req_url from default.log_20150913;

3. Create Table Like:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

eg:
create table IF NOT EXISTS default.log_20150914 like default.log_20150913;
EXTERNAL:
Table types in Hive:
- External table — EXTERNAL keyword
- Dropping the table deletes only the metadata; the data in HDFS is kept
- You usually specify the directory location yourself
- Managed table — the default
- Stored under /user/hive/warehouse by default; a custom location can also be specified
- Dropping the table deletes both the table data and the metadata
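A minimal sketch of an external table, assuming a hypothetical HDFS directory /user/hive/ext/logs already holds the data files:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS default.ext_logs(
  ip string,
  req_url string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ext/logs';
-- DROP TABLE ext_logs; would remove only the metadata, not the files
```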
- data_type
- primitive_type
| array_type
| map_type
| struct_type
| union_type – (Note: Available in Hive 0.7.0 and later)
- primitive_type
- TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION – (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY – (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP – (Note: Available in Hive 0.8.0 and later)
| DECIMAL – (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) – (Note: Available in Hive 0.13.0 and later)
| DATE – (Note: Available in Hive 0.12.0 and later)
| VARCHAR – (Note: Available in Hive 0.12.0 and later)
| CHAR – (Note: Available in Hive 0.13.0 and later)
- array_type
- ARRAY < data_type >
- map_type
- MAP < primitive_type, data_type >
- struct_type
- STRUCT < col_name : data_type [COMMENT col_comment], …>
- union_type
- UNIONTYPE < data_type, data_type, … > – (Note: Available in Hive 0.7.0 and later)
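A sketch combining the complex types in one table definition (the table and column names are invented, and the delimiters shown are one common choice, not the only one):

```sql
CREATE TABLE IF NOT EXISTS default.employee(
  name string,
  skills ARRAY<string>,                     -- e.g. hive,hdfs
  scores MAP<string, int>,                  -- e.g. math:90
  address STRUCT<city:string, zip:string>)  -- e.g. beijing,100000
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';
```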
- row_format
- DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] – (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, …)]
- file_format:
- SEQUENCEFILE
| TEXTFILE – (Default, depending on hive.default.fileformat configuration)
| RCFILE – (Note: Available in Hive 0.6.0 and later)
| ORC – (Note: Available in Hive 0.11.0 and later)
| PARQUET – (Note: Available in Hive 0.13.0 and later)
| AVRO – (Note: Available in Hive 0.14.0 and later)
| JSONFILE – (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
- constraint_specification:
- [, PRIMARY KEY (col_name, …) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, …) REFERENCES table_name(col_name, …) DISABLE NOVALIDATE ]
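For example, informational constraints on a hypothetical pair of tables. These constraints are DISABLE NOVALIDATE, i.e. recorded in the metastore but not enforced on the data:

```sql
CREATE TABLE dept_pk(
  deptno int,
  dname string,
  PRIMARY KEY (deptno) DISABLE NOVALIDATE);

CREATE TABLE emp_fk(
  empno int,
  deptno int,
  CONSTRAINT fk_dept FOREIGN KEY (deptno)
    REFERENCES dept_pk(deptno) DISABLE NOVALIDATE);
```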
Load local data into a table
load data local inpath '/opt/datas/-log.txt' into table default.log_20150913;
Drop Table
DROP TABLE [IF EXISTS] table_name [PURGE]; -- (Note: PURGE available in Hive 0.14.0 and later)
Alter Table
Rename Table
ALTER TABLE table_name RENAME TO new_table_name;
Alter Table Properties
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
table_properties:
: (property_name = property_value, property_name = property_value, ... )
Alter Table Comment
ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
Add SerDe Properties
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
serde_properties:
: (property_name = property_value, property_name = property_value, ... )
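For instance, changing the field delimiter of a table that uses Hive's stock LazySimpleSerDe (field.delim is a property of that SerDe; the table name is illustrative):

```sql
ALTER TABLE default.log_20150913
SET SERDEPROPERTIES ('field.delim' = '\t');
```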
Partitioned Tables
A partitioned table corresponds to an independent directory on the HDFS file system; that directory holds all the data files of the partition. A partition in Hive is simply a subdirectory: a large dataset is split into smaller datasets according to business needs.
At query time, an expression in the WHERE clause selects only the partitions needed, which greatly improves query efficiency.
eg:
create table IF NOT EXISTS default.dept_partition(
deptno int,
dname string,
loc string)
partitioned by (month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-- load data
load data local inpath '/opt/datas/dept.txt' overwrite into table default.dept_partition
partition (month='201509');
-- query the contents of a partition
select * from dept_partition where month='201509';
Two ways to add a partition (when the partition directory is not generated automatically):
1.
dfs -mkdir -p /user/hive/warehouse/dept_part/day=20150913;
dfs -put /opt/datas/dept.txt /user/hive/warehouse/dept_part/day=20150913;
msck repair table dept_part; -- repair the partition metadata
2.
dfs -mkdir -p /user/hive/warehouse/dept_part/day=20150914;
dfs -put /opt/datas/dept.txt /user/hive/warehouse/dept_part/day=20150914;
alter table dept_part add partition(day='20150914');
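After either method, the new partitions should be registered in the metastore and queryable (a sketch; the actual output depends on what was loaded):

```sql
show partitions dept_part;
-- and the data can be queried by partition:
select * from dept_part where day='20150914';
```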
Hive Data Types