1. 程式人生 > >Hive中資料庫Database基本操作

Hive中資料庫Database基本操作

Database

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]//預設在倉庫根目錄
  [WITH DBPROPERTIES (property_name=property_value, ...)];

Drop Database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];//資料庫中有表存在需要使用CASCADE

Alter Database

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
 
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
  
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

檢視

show databses ;
show databases like 'db_hive*';

使用

use db_hive;

檢視資料庫結構

desc database db_hive_03;

DDL

wiki

create table

1. CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [
   [ROW FORMAT row_format]  //每一行的分隔符、格式
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
eg:create table IF NOT EXISTS default.log_20150913( 
ip string COMMENT 'remote ip address', 
users string COMMENT 'users',
req_url string COMMENT 'user request url')
COMMENT 'beifeng web access logs'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' 
STORED AS TEXTFILE ;
//分表,提高分析速率
2. CREATE  TABLE [IF NOT EXISTS] [db_name.]table_name      [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 eg: create table IF NOT EXISTS default.log_20150913_sa AS 
 select ip,req_url from default.log_20150913;
3. CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
  eg: create table IF NOT EXISTS default.log_20150914 like default.log_20150913;
 

EXTERNAL:
在hive中表的型別:

  • 託管表(外部表)——EXTERNAL。
    • 刪除表時不會刪除在hdfs中的資料,只會刪除元資料
    • 一般自己指定目錄位置
  • 管理表——預設
    • 預設儲存在/user/hive/warehouse下,也可以自己指定
    • 刪除表時,會刪除表資料以及元資料
data_type
primitive_type
| array_type
| map_type
| struct_type
| union_type – (Note: Available in Hive 0.7.0 and later)
primitive_type
TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION – (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY – (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP – (Note: Available in Hive 0.8.0 and later)
| DECIMAL – (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) – (Note: Available in Hive 0.13.0 and later)
| DATE – (Note: Available in Hive 0.12.0 and later)
| VARCHAR – (Note: Available in Hive 0.12.0 and later)
| CHAR – (Note: Available in Hive 0.13.0 and later)
array_type
ARRAY < data_type >
map_type
MAP < primitive_type, data_type >
struct_type
STRUCT < col_name : data_type [COMMENT col_comment], …>
union_type
UNIONTYPE < data_type, data_type, … > – (Note: Available in Hive 0.7.0 and later)
row_format
DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] – (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, …)]
file_format:
SEQUENCEFILE
| TEXTFILE – (Default, depending on hive.default.fileformat configuration)
| RCFILE – (Note: Available in Hive 0.6.0 and later)
| ORC – (Note: Available in Hive 0.11.0 and later)
| PARQUET – (Note: Available in Hive 0.13.0 and later)
| AVRO – (Note: Available in Hive 0.14.0 and later)
| JSONFILE – (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
[, PRIMARY KEY (col_name, …) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, …) REFERENCES table_name(col_name, …) DISABLE NOVALIDATE

本地資料載入到表中

load data local inpath '/opt/datas/-log.txt’into table default.log_20150913;

Drop Table

DROP TABLE [IF EXISTS] table_name [PURGE];     -- (Note: PURGE available in Hive 0.14.0 and later)

Alter Table

Rename Table

ALTER TABLE table_name RENAME TO new_table_name;

Alter Table Properties

ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
  : (property_name = property_value, property_name = property_value, ... )

Alter Table Comment

ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);

Add SerDe Properties

ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
 
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES serde_properties;
 
serde_properties:
  : (property_name = property_value, property_name = property_value, ... )

分割槽表

分割槽表其實就是對應要給HDFS檔案系統上的獨立資料夾,該資料夾下時該分割槽所有的資料檔案。Hive中的分割槽就是分目錄,把一個大的資料集根據業務需求分割成更小的資料集
在查詢時,通過where子句中的表示式來選擇查詢所需要的指定的分割槽,這樣查詢的效率會提高很多。
eg:

create table IF NOT EXISTS default.dept_partition( 
deptno int, 
dname string,
loc string)
partitioned by (month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
//載入資料
load data local inpath '/opt/datas/dept.txt' overwrite into table default.dept_partition
partition (month='201509');

//查詢分割槽內容
select * from dept_partition where month='201509';

新增分割槽的兩種方式(如果沒有自動生成分割槽目錄):
1.

dfs -mkdir -p /user/hive/warehouse/dept_part/day=20150913;
dfs -put /opt/datas/dept.txt /user/hive/warehouse/dept_part/day=20150913;

msck repair table dept-part;//修復
dfs -mkdir -p /user/hive/warehouse/dept_part/day=20150914;
dfs -put /opt/datas/dept.txt /user/hive/warehouse/dept_part/day=20150914;

alter table dept-part add partition(day= '20150914');

Hive的資料型別

wiki

在這裡插入圖片描述
在這裡插入圖片描述