
Hive and HBase Integration (Examples)

 

Example 1

1. First, create the table in HBase (three column families):

# Three column families. SPLITS is a table-level option, so it is given once (15 split points = 16 regions).
create 'ceshi7',
{NAME=>'TIME',VERSIONS=>1,BLOCKCACHE=>true,BLOOMFILTER=>'ROW',COMPRESSION=>'SNAPPY',
DATA_BLOCK_ENCODING => 'PREFIX_TREE', BLOCKSIZE => '65536'},
{NAME=>'ORDERITEM',VERSIONS=>1,BLOCKCACHE=>true,BLOOMFILTER=>'ROW',COMPRESSION=>'SNAPPY',
DATA_BLOCK_ENCODING => 'PREFIX_TREE', BLOCKSIZE => '65536'},
{NAME=>'ORDERTYPE',VERSIONS=>1,BLOCKCACHE=>true,BLOOMFILTER=>'ROW',COMPRESSION=>'SNAPPY',
DATA_BLOCK_ENCODING => 'PREFIX_TREE', BLOCKSIZE => '65536'},
{SPLITS => ['1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']}
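
To see what the split points in the create statement mean, here is a minimal Python sketch (not HBase code) of how a row key would be assigned to one of the pre-split regions. It assumes row keys are compared byte by byte, the way HBase orders them; the function names `region_index` and `region_count` are made up for illustration.

```python
from bisect import bisect_right

# Split points copied from the create statement above.
SPLITS = [b"1", b"2", b"3", b"4", b"5", b"6", b"7", b"8", b"9",
          b"a", b"b", b"c", b"d", b"e", b"f"]


def region_count() -> int:
    """N split points cut the key space into N + 1 regions."""
    return len(SPLITS) + 1


def region_index(row_key: bytes) -> int:
    """Return the 0-based index of the region that would hold row_key.

    Region 0 covers everything below the first split point, region i covers
    [SPLITS[i-1], SPLITS[i]), and the last region has no upper bound.
    Python compares bytes lexicographically, like HBase row keys.
    """
    return bisect_right(SPLITS, row_key)


if __name__ == "__main__":
    for key in (b"0abc", b"1", b"2019-01-04 15:31:52", b"ff"):
        print(key, "-> region", region_index(key))
```

Note that row keys starting with a timestamp such as `2019-...` all fall between the split points '2' and '3', so with these splits they would land in the same region.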

2. Then create an external table in Hive:

-- The mapping must be one string with no whitespace or line breaks between entries.
create external table lqioc_ioc_mid.ceshi7(
rowid string,
ordertypeno string,
ordertypename string,
ordertypecost string,
yearid string,
monthid string,
orderitemname string,
orderitemnum string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" =
":key,ORDERTYPE:ORDERTYPENO,ORDERTYPE:ORDERTYPENAME,ORDERTYPE:ORDERTYPECOST,TIME:YEARID,TIME:MONTHID,ORDERITEM:ORDERITEMNAME,ORDERITEM:ORDERITEMNUM")
TBLPROPERTIES("hbase.table.name" = "ceshi7","hbase.mapped.output.outputtable" = "ceshi7");
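
The entries in hbase.columns.mapping are matched to the Hive columns by position, with :key standing for the HBase row key. The following Python sketch only illustrates that pairing; it is not how Hive parses the property internally, and `parse_mapping` and `ROW_KEY` are hypothetical names.

```python
# Hive columns and mapping copied from the DDL above.
HIVE_COLUMNS = ["rowid", "ordertypeno", "ordertypename", "ordertypecost",
                "yearid", "monthid", "orderitemname", "orderitemnum"]
COLUMNS_MAPPING = (":key,ORDERTYPE:ORDERTYPENO,ORDERTYPE:ORDERTYPENAME,"
                   "ORDERTYPE:ORDERTYPECOST,TIME:YEARID,TIME:MONTHID,"
                   "ORDERITEM:ORDERITEMNAME,ORDERITEM:ORDERITEMNUM")

ROW_KEY = "<row key>"  # marker used by this sketch only


def parse_mapping(hive_columns, mapping):
    """Pair each Hive column with its HBase (family, qualifier) by position."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError("mapping needs exactly one entry per Hive column")
    result = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            result[column] = (ROW_KEY, None)
        else:
            family, _, qualifier = entry.partition(":")
            result[column] = (family, qualifier)
    return result


if __name__ == "__main__":
    for column, target in parse_mapping(HIVE_COLUMNS, COLUMNS_MAPPING).items():
        print(column, "->", target)
```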

3. Error message:

Error: Error while processing statement: 
FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:MetaException(message:
Cannot CREATE EXTERNAL TABLE when 
hive.server2.enable.doAs is set to false.)

With hive.server2.enable.doAs=false, the HiveServer2 user seen by YARN jobs is always the hive user.

With hive.server2.enable.doAs=true, it is the actual user name of whoever submitted the query.

Example 2

1. Create a managed (internal) table directly in Hive:

-- The mapping must be one string with no whitespace or line breaks between entries.
create table lqioc_ioc_mid.ceshi6(
rowid string,
ordertypeno string,
ordertypename string,
ordertypecost string,
yearid string,
monthid string,
orderitemname string,
orderitemnum string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" =
":key,ORDERTYPE:ORDERTYPENO,ORDERTYPE:ORDERTYPENAME,ORDERTYPE:ORDERTYPECOST,TIME:YEARID,TIME:MONTHID,ORDERITEM:ORDERITEMNAME,ORDERITEM:ORDERITEMNUM")
TBLPROPERTIES("hbase.table.name" = "ceshi6","hbase.mapred.output.outputtable" = "ceshi6");

2. Run an insert statement in Hive:

insert into table lqioc_ioc_mid.z_area_monitor_pasq
select 
concat(FROM_UNIXTIME(UNIX_TIMESTAMP()),'_',AREA_NAME,'_',ORDER_TYPE_NO,
'_',ORDER_TYPE,'_',YEAR_ID,'_',MONTH_ID,'_',ORDER_ITEM),
ORDER_TYPE_NO,
ORDER_TYPE,
ORDER_TYPE_COST,
YEAR_ID,
MONTH_ID,
ORDER_ITEM,
ORDER_ITEM_NUM from lqioc_ioc_dw.pasq_modify;
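
The first selected column becomes the HBase row key. As a rough illustration of what the concat expression produces, the Python sketch below builds the same kind of key; it assumes FROM_UNIXTIME uses its default 'yyyy-MM-dd HH:mm:ss' format, and the sample field values are invented.

```python
from datetime import datetime


def build_row_key(now, area_name, order_type_no, order_type,
                  year_id, month_id, order_item):
    """Mimic concat(FROM_UNIXTIME(UNIX_TIMESTAMP()), '_', AREA_NAME, ...)."""
    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return "_".join([timestamp, area_name, order_type_no, order_type,
                     year_id, month_id, order_item])


if __name__ == "__main__":
    # Invented sample values, only to show the shape of the key.
    key = build_row_key(datetime(2019, 1, 4, 15, 31, 52),
                        "areaA", "Z64", "typeA", "2018", "12", "itemA")
    print(key)
```

Because the key starts with a fixed-width timestamp, rows sort chronologically, which is what makes the time-window scans in step 6 possible.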

3. Check the results

-- In Hive:
select * from lqioc_ioc_dw.pasq_modify;
select * from lqioc_ioc_mid.ceshi6;
# In the HBase shell:
scan 'ceshi6'
get 'ceshi6','Z641201812龍洲新城'

4. If you drop the HBase table first and then query the mapped Hive table, the query fails (*^▽^*)

disable 'ceshi7'
drop 'ceshi7'
select * from lqioc_ioc_mid.ceshi7;
Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: ceshi (state=,code=0)

5. Afterwards Hive still shows the table, but dropping it in Hive raises an error, and the table no longer appears in show tables:

drop table lqioc_ioc_mid.ceshi6;
use lqioc_ioc_mid;
show tables;
Error: Error while processing statement: FAILED: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:org.apache.hadoop.hbase.TableNotFoundException: ceshi7

6. Query historical data by rowkey

scan 'z_area_monitor_pasq', {STARTROW=>'2019-01-04 15:31:52', STOPROW=>'2019-01-04 15:36:52'}
scan 'z_area_monitor_pasq', {STARTROW=>'2019-01-04 15:31:52', ENDROW=>'2019-01-04 15:36:52'}
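
To make the range semantics concrete, here is a small Python sketch of a start/stop-row scan over sorted row keys. It assumes the usual HBase behaviour that the start row is inclusive and the stop row is exclusive; `range_scan` and the sample keys are made up for illustration.

```python
from bisect import bisect_left


def range_scan(sorted_keys, start_row, stop_row):
    """Return the keys in [start_row, stop_row), compared byte by byte."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    # Invented row keys in the timestamp-prefixed layout used above.
    keys = sorted([
        b"2019-01-04 15:30:00_a",
        b"2019-01-04 15:31:52_b",
        b"2019-01-04 15:33:10_c",
        b"2019-01-04 15:36:52_d",
        b"2019-01-04 15:40:00_e",
    ])
    for key in range_scan(keys, b"2019-01-04 15:31:52", b"2019-01-04 15:36:52"):
        print(key)
```

In this sketch the key stamped 15:31:52 is returned because it sorts after the bare start row, while the key stamped 15:36:52 is not, since it sorts after the stop row.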

7. Change the number of versions kept and fetch multi-version data

alter 'z_area_monitor_pasq',{NAME=>'ROWKEY1',VERSIONS=>3}
get 'z_area_monitor_pasq','rowkey1',{COLUMN=>'ROWKEY1:name',VERSIONS=>3}
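
The idea behind VERSIONS can be sketched in a few lines of Python. This is a simplified model, not HBase's storage code: real HBase discards surplus versions during compaction rather than on every write, and the class name `VersionedCell` is hypothetical.

```python
class VersionedCell:
    """A single cell that keeps at most max_versions timestamped values."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._versions = {}  # timestamp -> value

    def put(self, timestamp, value):
        self._versions[timestamp] = value
        # Drop everything older than the newest max_versions entries.
        for old in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[old]

    def get(self, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:versions]


if __name__ == "__main__":
    cell = VersionedCell(max_versions=3)   # like VERSIONS=>3 in alter
    for ts, value in enumerate(["a", "b", "c", "d"], start=1):
        cell.put(ts, value)
    print(cell.get())            # newest value only
    print(cell.get(versions=3))  # like VERSIONS=>3 in get
```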

 

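Further reading: RowKey queries (Scan vs. Get)

I. HBase query methods
    HBase provides only two ways to query:

Fetch a single record by a given rowkey: the get method.
Fetch a batch of records by given conditions: the scan method.
    Conditional queries are implemented with scan. A few points are worth noting when using scan:

scan can use setCaching and setBatch to run faster (trading space for time).
scan can use setStartRow and setStopRow to limit the range. The narrower the range, the better the performance.
scan can add a filter with setFilter, which is the basis for pagination (poor performance) and multi-condition queries.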
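II. Introduction to RowFilter

Operator            Description
LESS                less than
LESS_OR_EQUAL       less than or equal to
EQUAL               equal to
NOT_EQUAL           not equal to
GREATER_OR_EQUAL    greater than or equal to
GREATER             greater than
NO_OP               excludes everything

Comparator                Description
BinaryComparator          compares using Bytes.compareTo()
BinaryPrefixComparator    similar to BinaryComparator, but compares only from the start (the prefix)
NullComparator
BitComparator
RegexStringComparator     regular expression
SubstringComparator       treats the value as a string and tests it with contains()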
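
How an operator and a comparator combine in a RowFilter can be sketched roughly as follows. This is a plain Python model of the idea, not the HBase API: each comparator returns a negative, zero, or positive number, and the operator is applied to that result against 0. The assumptions are that the substring and regex comparators return 0 on a match and 1 otherwise (so only EQUAL and NOT_EQUAL make sense with them) and that the substring match ignores case; all function names are hypothetical.

```python
import operator
import re

# The operator is applied to the comparator result against 0.
COMPARE_OPS = {
    "LESS": operator.lt,
    "LESS_OR_EQUAL": operator.le,
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "GREATER_OR_EQUAL": operator.ge,
    "GREATER": operator.gt,
    "NO_OP": lambda result, zero: False,  # excludes everything
}


def _sign(a, b):
    return (a > b) - (a < b)


def binary_comparator(value: bytes):
    """Full byte-wise comparison of the row key with value."""
    return lambda row: _sign(row, value)


def binary_prefix_comparator(prefix: bytes):
    """Compare only the leading len(prefix) bytes of the row key."""
    return lambda row: _sign(row[:len(prefix)], prefix)


def substring_comparator(text: str):
    needle = text.lower()
    return lambda row: 0 if needle in row.decode("utf-8").lower() else 1


def regex_string_comparator(pattern: str):
    compiled = re.compile(pattern)
    return lambda row: 0 if compiled.search(row.decode("utf-8")) else 1


def row_filter(op, comparator, row_keys):
    """Keep the row keys for which `comparator(row) <op> 0` holds."""
    keep = COMPARE_OPS[op]
    return [row for row in row_keys if keep(comparator(row), 0)]


if __name__ == "__main__":
    rows = [b"2019-01-04 15:31:52_a", b"2019-01-04 15:40:00_b",
            b"2019-01-05 09:00:00_c"]
    print(row_filter("EQUAL", binary_prefix_comparator(b"2019-01-04"), rows))
    print(row_filter("EQUAL", regex_string_comparator(r"15:\d\d:52"), rows))
```
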
References:
Get: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setFilter-org.apache.hadoop.hbase.filter.Filter-
Scan: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setFilter-org.apache.hadoop.hbase.filter.Filter-
Why numeric RowKeys are better than string ones: https://blog.csdn.net/alphags/article/details/53786777
Scan parameter settings: http://grokbase.com/t/hbase/user/126vtkfr7h/scan-vs-put-vs-get
RowKey queries (Scan vs. Get): https://blog.csdn.net/high2011/article/details/80205000
RowKey design: http://blog.chedushi.com/archives/9720
Comparison: http://student-lp.iteye.com/blog/2309841
How to fetch multi-version data in HBase: https://blog.csdn.net/shujuelin/article/details/83657485
Summary of common HBase operations: https://blog.csdn.net/sinat_36121406/article/details/82768846
Hadoop HBase concepts series, part 16, good row key design: https://www.cnblogs.com/zlslch/p/6140487.html
HBase RowKey design: https://www.cnblogs.com/zlzhoulei/p/5594773.html