Hadoop Hive and HBase: Relationship and Integration

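HBase serves as the database here, but because HBase has no SQL-like query interface, operating on and computing over the data is very inconvenient. Integrating Hive lets Hive provide HQL queries on top of the HBase storage layer, so Hive acts as the data warehouse.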

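1. Querying massive data with a Hadoop + Hive architecture: http://blog.csdn.net/kunshan_shenbin/article/details/7105319
2. Integrating HBase 0.90.5 with Hadoop 1.0.0: http://blog.csdn.net/kunshan_shenbin/article/details/7209990
This article describes how to make HBase and Hive accessible to each other, so that Hadoop, HBase, and Hive work together as a single stack.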

The test steps in this article are mainly based on: http://running.iteye.com/blog/898399

That post, in turn, follows the steps on the official wiki: http://wiki.apache.org/hadoop/Hive/HBaseIntegration
1. Copy hbase-0.90.5.jar and zookeeper-3.3.2.jar into hive/lib.
    Note: if hive/lib already contains other versions of these two files (for example zookeeper-3.3.1.jar), delete them and use the versions that ship with HBase.

2. Edit hive-site.xml under hive/conf and add the following at the bottom (inside the <configuration> element):

<!--
<property>
  <name>hive.exec.scratchdir</name>
  <value>/usr/local/hive/tmp</value>
</property>
-->

<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hive/logs</value>
</property>

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,file:///usr/local/hive/lib/hbase-0.90.5.jar,file:///usr/local/hive/lib/zookeeper-3.3.2.jar</value>
</property>

Note: if hive-site.xml does not exist, create it, or rename hive-default.xml.template and use that.
For details, see: http://blog.csdn.net/kunshan_shenbin/article/details/7210020

3. Copy hbase-0.90.5.jar into hadoop/lib on every Hadoop node (including the master).
4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).
Note: for the hbase-site.xml settings, see: http://blog.csdn.net/kunshan_shenbin/article/details/7209990
Note: if steps 3 and 4 are skipped, Hive is likely to fail at run time with an error like the following:

org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately.
This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and
then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
Reference: http://blog.sina.com.cn/s/blog_410d18710100vlbq.html
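Since the error above is tied to steps 3 and 4, it can help to confirm on each node that both files are in place before starting Hive. The following is a minimal sketch under assumptions: check_hadoop_node is a hypothetical helper (not part of Hadoop or HBase), and the fallback path /usr/local/hadoop is a guess modeled on the /usr/local/hive and /usr/local/hbase paths used in this article. Run it on each node, including the master.

```shell
#!/bin/sh
# Hypothetical helper: report which files from steps 3 and 4 are missing
# under a Hadoop home directory. Prints one line per missing file and
# returns non-zero if anything is missing.
check_hadoop_node() {
    hadoop_home=$1
    missing=0
    for f in lib/hbase-0.90.5.jar conf/hbase-site.xml; do
        if [ ! -f "$hadoop_home/$f" ]; then
            echo "missing: $hadoop_home/$f"
            missing=1
        fi
    done
    return $missing
}

# HADOOP_HOME and the /usr/local/hadoop fallback are assumptions; adjust to your install.
check_hadoop_node "${HADOOP_HOME:-/usr/local/hadoop}" \
    || echo "copy the files listed above before starting Hive"
```

The check only looks for the file names used in this article; it does not verify that the jar version matches the running HBase cluster.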

Now you can try starting Hive.
Single-node startup:

> bin/hive -hiveconf hbase.master=master:60000

Cluster startup:

> bin/hive -hiveconf hbase.zookeeper.quorum=slave

If hive.aux.jars.path is not configured in hive-site.xml, start Hive as follows (the jar list is comma-separated with no spaces, so that the shell passes it as a single argument):

> bin/hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,/usr/local/hive/lib/hbase-0.90.5.jar,/usr/local/hive/lib/zookeeper-3.3.2.jar -hiveconf hbase.zookeeper.quorum=slave
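
One way to avoid typos in the long --auxpath value is to assemble it from the individual jar paths. The sketch below is a minimal example under assumptions: build_auxpath is a hypothetical helper, not part of Hive. It joins its arguments with commas and no spaces, so that the shell passes the whole list as one argument, and it only prints the resulting command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: join jar paths with commas and no surrounding spaces.
build_auxpath() {
    joined=""
    for jar in "$@"; do
        if [ -z "$joined" ]; then
            joined="$jar"
        else
            joined="$joined,$jar"
        fi
    done
    printf '%s\n' "$joined"
}

HIVE_LIB=/usr/local/hive/lib   # path used throughout this article
AUXPATH=$(build_auxpath \
    "$HIVE_LIB/hive-hbase-handler-0.8.0.jar" \
    "$HIVE_LIB/hbase-0.90.5.jar" \
    "$HIVE_LIB/zookeeper-3.3.2.jar")

# Print the command rather than executing it, so the sketch runs without Hive installed.
echo "bin/hive --auxpath $AUXPATH -hiveconf hbase.zookeeper.quorum=slave"
```

A space after any comma would make the shell split the list into separate arguments, and only the first jar would reach --auxpath.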

Next, run a few tests.
1. Create a table that HBase recognizes:
CREATE TABLE hbase_table_1(key int, value string)  
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")  
TBLPROPERTIES ("hbase.table.name" = "xyz");  
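hbase.table.name sets the name of the table in HBase.
hbase.columns.mapping maps each Hive column to the HBase row key (:key) or to a column in an HBase column family (here cf1:val).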
2. Load data using SQL
a) Create a Hive table
   hive> CREATE TABLE pokes (foo INT, bar STRING);
b) Bulk-load data
   hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
c) Insert into hbase_table_1 using SQL
   hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=86;
3. View the data
hive> select * from hbase_table_1;
You can now log in to HBase and inspect the data.
> /usr/local/hbase/bin/hbase shell
hbase(main):001:0> describe 'xyz'
hbase(main):002:0> scan 'xyz'
hbase(main):003:0> put 'xyz','100','cf1:val','www.360buy.com'
The row just inserted through HBase is now visible in Hive.
hive> select * from hbase_table_1;
4. Access an existing HBase table from Hive
Use CREATE EXTERNAL TABLE:
CREATE EXTERNAL TABLE hbase_table_2(key int, value string)  
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")  
TBLPROPERTIES("hbase.table.name" = "some_existing_table");  


Multiple Columns and Families
1. Create the table
CREATE TABLE hbase_table_2(key int, value1 string, value2 int, value3 int)   
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES (  
"hbase.columns.mapping" = ":key,a:b,a:c,d:e"  
);  

2. Insert data
INSERT OVERWRITE TABLE hbase_table_2 SELECT foo, bar, foo+1, foo+2   
FROM pokes WHERE foo=98 OR foo=100;  


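In addition to key, which maps to the HBase row key, this table has three Hive columns (value1, value2, value3) and two HBase column families (a, d).
Two of the Hive columns (value1 and value2) map to a single HBase column family (a, with HBase column names b and c); the remaining Hive column (value3) maps to column e in column family d.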

3. Log in to HBase and inspect the structure

hbase(main):003:0> describe "hbase_table_2"  
DESCRIPTION                                                             ENABLED                                 
 {NAME => 'hbase_table_2', FAMILIES => [{NAME => 'a', COMPRESSION => 'N true                                    
 ONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_M                                         
 EMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'd', COMPRESSION =>                                          
 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN                                         
 _MEMORY => 'false', BLOCKCACHE => 'true'}]}                                                                    
1 row(s) in 1.0630 seconds 
4. View the data in HBase

hbase(main):004:0> scan 'hbase_table_2'  
ROW                          COLUMN+CELL                                                                        
 100                         column=a:b, timestamp=1297695262015, value=val_100                                 
 100                         column=a:c, timestamp=1297695262015, value=101                                     
 100                         column=d:e, timestamp=1297695262015, value=102                                     
 98                          column=a:b, timestamp=1297695242675, value=val_98                                  
 98                          column=a:c, timestamp=1297695242675, value=99                                      
 98                          column=d:e, timestamp=1297695242675, value=100                                     
2 row(s) in 0.0380 seconds 

5. View the data in Hive

hive> select * from hbase_table_2;  
OK  
100     val_100 101     102  
98      val_98  99      100  
Time taken: 3.238 seconds  
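
The column mapping used in the multi-column example above pairs Hive columns with HBase entries by position. The sketch below unpacks that pairing. It is only an illustration under assumptions: describe_mapping is a hypothetical helper, not something Hive or HBase provides, and it expects both lists to contain the same number of comma-separated entries with an explicit :key entry.

```shell
#!/bin/sh
# Hypothetical helper: pair each Hive column with its entry in hbase.columns.mapping.
# $1: comma-separated Hive column names
# $2: the hbase.columns.mapping string (same number of entries, no spaces)
describe_mapping() {
    hive_cols=$1
    mapping=$2
    while [ -n "$hive_cols" ]; do
        col=${hive_cols%%,*}      # first remaining Hive column
        entry=${mapping%%,*}      # first remaining mapping entry
        if [ "$entry" = ":key" ]; then
            echo "$col -> row key"
        else
            echo "$col -> family ${entry%%:*}, qualifier ${entry#*:}"
        fi
        # Drop the consumed entries; stop once no comma is left.
        case $hive_cols in
            *,*) hive_cols=${hive_cols#*,}; mapping=${mapping#*,} ;;
            *)   hive_cols="" ;;
        esac
    done
}

describe_mapping "key,value1,value2,value3" ":key,a:b,a:c,d:e"
```

For the table above this prints four lines: key as the row key, value1 and value2 in family a (qualifiers b and c), and value3 in family d (qualifier e), which matches the columns shown by the HBase scan.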
References:
http://running.iteye.com/blog/898399
http://heipark.iteye.com/blog/1150648
http://www.javabloger.com/article/apache-hadoop-hive-hbase-integration.html

Reposted from: http://blog.csdn.net/liuzhenwen/article/details/28078625