Testing the Query Performance of an HBase Table Mapped as a Hive Table

Axin • Published: 2019-01-30

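I. Preparation

1. Write a program that loads 10 million rows into an HBase table.
2. Map that HBase table as a Hive table by running a command similar to the following in the Hive shell. The DDL shown is abbreviated: some later queries also reference a mobile column that it does not list.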

hive> CREATE EXTERNAL TABLE IF NOT EXISTS t_hbase_person_his10 (
        rowkey string, id string, name string, salary string, start_date string, end_date string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES (
        'hbase.columns.mapping' = ':key,info:id,info:name,info:salary,info:start_date,info:end_date')
      TBLPROPERTIES ('hbase.table.name' = 't_hbase_person_his10');

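3. Copy the same data into a native Hive table, so that this copy is physically stored in Hive. A statement similar to the following does the copy: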

create table t_person_history10 as select * from t_hbase_person_his10;
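The loader program from step 1 is not shown in the original. The following is only a minimal sketch of how the test rows might be generated, assuming one cell per mapped info:* column. The class name, the "hehe" + id name pattern, the salary range, and the September 2017 start dates are all assumptions inferred from the values that appear in the queries, not the author's actual code. Writing each row to HBase (for example as one Put per rowkey) is left out, so the sketch runs without a cluster.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical generator for the test rows; every value pattern here is an assumption. */
final class PersonRowGenerator {
    /** Sentinel end date that the queries use to select "current" records. */
    static final String OPEN_END_DATE = "9999-12-31";

    private final Random random;

    PersonRowGenerator(long seed) {
        this.random = new Random(seed);
    }

    /** Builds the info:* cells for one row, keyed by column qualifier. */
    Map<String, String> cellsFor(long id) {
        // Assumed: start dates spread over September 2017, in ISO yyyy-MM-dd form.
        LocalDate start = LocalDate.of(2017, 9, 1).plusDays(random.nextInt(30));
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("id", Long.toString(id));
        cells.put("name", "hehe" + id);                                    // assumed name pattern
        cells.put("salary", Integer.toString(random.nextInt(1_000_001))); // assumed range 0..1,000,000
        cells.put("start_date", start.toString());
        cells.put("end_date", OPEN_END_DATE);
        return cells;
    }

    public static void main(String[] args) {
        PersonRowGenerator generator = new PersonRowGenerator(42L);
        for (long id = 1; id <= 3; id++) {
            // In a real loader, each map would become one HBase write with rowkey = id.
            System.out.println(id + " -> " + generator.cellsFor(id));
        }
    }
}
```

All values are produced as strings because the Hive mapping declares every column as string.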

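II. Comparing query times over Hive JDBC

In the results below, t_hbase_person_his10 is the Hive table mapped onto HBase, and t_person_history10 is the native Hive table, created with create table t_person_history10 as select * from t_hbase_person_his10;. The statTime comment after each query is the measured time for that query.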
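The original does not show how statTime was measured. A minimal sketch of one way to time a query over JDBC follows, assuming a HiveServer2 endpoint and the Hive JDBC driver on the classpath. The class and method names are hypothetical, and the example URL (host and port) is an assumption. The result set is drained so that fetch time is included in the measurement.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical timing helper; not the author's actual test harness. */
final class HiveQueryTimer {
    /** Runs the work once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Executes the query and reads every row, returning the row count. */
    static int runAndCount(Connection conn, String sql) {
        int rows = 0;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                rows++;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("query failed: " + sql, e);
        }
        return rows;
    }

    /** Formats a result line in the same style as the comments in this article. */
    static String report(String sql, long elapsedMs) {
        return sql + " // use statTime:" + elapsedMs + "ms";
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            // Example URL (assumed host and port): jdbc:hive2://localhost:10000/default
            System.out.println("usage: HiveQueryTimer <jdbc-url> <sql>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            String sql = args[1];
            long elapsedMs = timeMillis(() -> runAndCount(conn, sql));
            System.out.println(report(sql, elapsedMs));
        }
    }
}
```

A single run like this includes connection warm-up and caching effects, so repeating each query a few times and comparing the runs gives a fairer picture.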

1. Query current records, returning 30 rows by default

sql = "select * from t_hbase_person_his10 where end_date='9999-12-31' limit 30";// use statTime:353ms 

sql = "select * from t_person_history10 where end_date='9999-12-31' limit 30";//use statTime:119ms

2. Query records valid on a given date, returning 30 rows by default

sql = "select * from t_hbase_person_his10 where start_date<='2017-09-18' and end_date>='2017-09-18' and salary>990000 limit 30";//use statTime:411ms

sql = "select * from t_hbase_person_his10 where start_date<='2017-09-20' and end_date>='2017-09-20' and salary>990000 limit 30";//use statTime:908ms

sql = "select * from t_person_history10 where start_date<='2017-09-18' and end_date>='2017-09-18' and salary>990000 limit 30";//use statTime:147ms
sql = "select * from t_person_history10 where start_date<='2017-09-20' and end_date>='2017-09-20' and salary>990000 limit 30";//use statTime:266ms
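The date filter in these queries selects the record version whose validity interval covers the given day. Because the dates are stored as yyyy-MM-dd strings, plain string comparison orders them correctly, and the 9999-12-31 sentinel always sorts last. The sketch below restates that predicate in Java with hypothetical names. It also shows why the salary filter has to be evaluated numerically: salary is declared as string in the mapping, and it is assumed here that Hive's implicit conversion compares it as a number.

```java
/** Hypothetical restatement of the WHERE clause used in the date-range queries. */
final class AsOfFilter {
    /** True when the version [startDate, endDate] covers the given day (all yyyy-MM-dd strings). */
    static boolean isValidOn(String startDate, String endDate, String day) {
        // ISO dates have fixed width and most-significant-first order, so string order equals date order.
        return startDate.compareTo(day) <= 0 && endDate.compareTo(day) >= 0;
    }

    /** Numeric comparison of a salary stored as text. */
    static boolean salaryAbove(String salary, long threshold) {
        // A purely lexicographic comparison would be wrong: "9999" sorts after "990000" as text.
        return Double.parseDouble(salary) > threshold;
    }

    public static void main(String[] args) {
        String day = "2017-09-18";
        System.out.println(isValidOn("2017-09-04", "9999-12-31", day)); // current record, started earlier
        System.out.println(isValidOn("2017-09-04", "2017-09-10", day)); // version closed before the day
        System.out.println(salaryAbove("990001", 990_000L));
        System.out.println(salaryAbove("9999", 990_000L));
    }
}
```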

3. order by is very slow

sql = "select * from t_hbase_person_his10 where start_date<='2017-09-20' and end_date>='2017-09-20' and salary>990000  order by salary limit 30";// use statTime:95000ms 

sql = "select * from t_person_history10 where start_date<='2017-09-20' and end_date>='2017-09-20' and salary>990000 order by salary limit 30";//use statTime:35836ms

4. between ... and

sql = "select * from t_hbase_person_his10 where end_date='9999-12-31' and salary between 500000 and 600000 limit 30";//use statTime:338ms

sql = "select * from t_person_history10 where end_date='9999-12-31' and salary between 500000 and 600000 limit 30";//use statTime:166ms

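5. Trace the history of a specific user. Using the user name as the unique identifier is extremely slow here (consider using the rowkey as the unique identifier instead).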

sql = "SELECT mobile,start_date FROM t_hbase_person_his10 where name='hehe98'";//use statTime:86701ms; rows returned: 13901173602,2017-09-04 13201382515,2017-09-07 15107963040,2017-09-11
sql = "SELECT mobile,start_date FROM t_hbase_person_his10 where rowkey='1298'";//use statTime:316ms 

sql = "SELECT mobile,start_date FROM t_person_history10 where name='hehe98'";//use statTime:6326ms; rows returned: 13901173602,2017-09-04 13201382515,2017-09-07 15107963040,2017-09-11
sql = "SELECT mobile,start_date FROM t_person_history10 where rowkey='1298'";//use statTime:6288ms
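On the mapped table, the gap between filtering on name (86701 ms) and filtering on rowkey (316 ms) is consistent with HBase keeping rows sorted by rowkey: a key lookup can go straight to the row, while a filter on an ordinary column has to examine every row. The sketch below is an in-memory analogy only, not HBase code. It uses a sorted map in place of the table and counts how many rows each access pattern touches. All names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory analogy for keyed lookup versus full scan; this is not HBase code. */
final class RowKeyLookupDemo {
    /** Rows sorted by rowkey, loosely mimicking a key-ordered table. The value is the name column. */
    private final TreeMap<String, String> nameByRowKey = new TreeMap<>();
    private long rowsVisited = 0;

    void put(String rowKey, String name) {
        nameByRowKey.put(rowKey, name);
    }

    /** Keyed access: only the addressed row is read. */
    String getByRowKey(String rowKey) {
        rowsVisited++;
        return nameByRowKey.get(rowKey);
    }

    /** Filter on a non-key column: every row has to be inspected. */
    int countByName(String name) {
        int matches = 0;
        for (Map.Entry<String, String> row : nameByRowKey.entrySet()) {
            rowsVisited++;
            if (row.getValue().equals(name)) {
                matches++;
            }
        }
        return matches;
    }

    long rowsVisited() {
        return rowsVisited;
    }

    void resetCounter() {
        rowsVisited = 0;
    }

    public static void main(String[] args) {
        RowKeyLookupDemo table = new RowKeyLookupDemo();
        for (int i = 0; i < 100_000; i++) {
            table.put(Integer.toString(i), "hehe" + (i % 1000)); // assumed name pattern
        }
        table.getByRowKey("1298");
        System.out.println("rowkey lookup visited " + table.rowsVisited() + " row(s)");
        table.resetCounter();
        int matches = table.countByName("hehe98");
        System.out.println("name filter found " + matches + " row(s), visited " + table.rowsVisited());
    }
}
```

The native Hive copy shows the opposite pattern for rowkey (6288 ms), since there rowkey is just another column and the query has to read the whole table.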

6. group by

sql = "select start_date,count(1) from t_hbase_person_his10 group by start_date";//use statTime:100330ms

sql = "select start_date,count(1) from t_person_history10 group by start_date";//use statTime:25857ms

7. Fuzzy (like) queries

sql = "select * from t_hbase_person_his10 where name like '%hehe111%' limit 30";// use statTime:2738ms
sql = "select * from t_hbase_person_his10 where name like '%hehe111%'  and start_date>'2017-09-18' limit 10";// use statTime:2745ms
sql = "select * from t_hbase_person_his10 where rowkey like '%10059%'  and start_date>'2017-09-18' limit 10";// use statTime:665ms

sql = "select * from t_person_history10 where name like '%hehe111%'  and start_date>'2017-09-18' limit 10";// use statTime:257ms
sql = "select * from t_person_history10 where rowkey like '%10059%'  and start_date>'2017-09-18' limit 10";// use statTime:135ms
sql = "select * from t_person_history10 where name like '%hehe111%' limit 30";// use statTime:225ms

8. Query by a specific rowkey

sql = "select * from t_hbase_person_his10 where rowkey='11123'";//use statTime:342ms

sql = "select * from t_person_history10 where rowkey='11123'";//use statTime:8386ms

9. Join queries on the Hive tables

sql = "select th.mobile,th.start_date,tb.mobile from t_person_history10 th, t_hbase_person_his10 tb where th.name=tb.name limit 10";//use statTime:88614ms

sql = "select th.mobile,th.start_date,tb.mobile from t_person_history10 th left outer join t_hbase_person_his10 tb on  th.name=tb.name limit 10";//use statTime:88614ms

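Summary: mapping an HBase table as a Hive table makes most queries noticeably slower. At around 10 million rows the impact on simple filtered queries with a limit is small (a few hundred milliseconds either way), but joins, aggregations, and sorts become very slow. The one case where the mapped table was clearly faster is an exact rowkey lookup (342 ms versus 8386 ms in test 8). My personal recommendation is not to query large tables through an HBase-to-Hive mapping, because the mapped table cannot use native Hive features such as partitioning.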
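To put the measurements side by side, the sketch below computes the ratio of mapped-table time to native-table time for a selection of the tests above. The timings are copied from the results in this article. The class name and labels are hypothetical. A ratio above 1 means the HBase-mapped table was slower.

```java
/** Computes slowdown ratios from the timings reported in this article. */
final class SlowdownRatios {
    /** Mapped-table time divided by native-table time; above 1 means the mapped table is slower. */
    static double ratio(long mappedMs, long nativeMs) {
        return (double) mappedMs / nativeMs;
    }

    public static void main(String[] args) {
        String[] labels = {
            "1 current records",
            "3 order by",
            "4 between ... and",
            "5 filter on name",
            "6 group by",
            "8 exact rowkey"
        };
        // {mapped table ms, native table ms}, taken from the statTime comments above.
        long[][] timings = {
            {353, 119},
            {95000, 35836},
            {338, 166},
            {86701, 6326},
            {100330, 25857},
            {342, 8386}
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-20s %.2fx%n", labels[i], ratio(timings[i][0], timings[i][1]));
        }
    }
}
```

These are single measurements, so the ratios indicate rough magnitude rather than precise factors.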
