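Integrating Hadoop Hive with HBase
We use HBase as the database, but HBase has no SQL-like query interface, which makes manipulating and computing over the data inconvenient. Integrating Hive solves this: Hive provides HQL queries on top of the HBase storage layer and serves as the data warehouse.
1. Querying massive data sets with a Hadoop + Hive architecture: http://blog.csdn.net/kunshan_shenbin/article/details/7105319
2. HBase 0.90.5 + Hadoop 1.0.0 integration: http://blog.csdn.net/kunshan_shenbin/article/details/7209990
This article explains how to let HBase and Hive access each other, so that Hadoop, HBase, and Hive work together as one system.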
The test steps below are mainly based on: http://running.iteye.com/blog/898399
That post, in turn, follows the steps on the official wiki: http://wiki.apache.org/hadoop/Hive/HBaseIntegration
1. Copy hbase-0.90.5.jar and zookeeper-3.3.2.jar into hive/lib.
Note: if hive/lib already contains other versions of these two files (for example zookeeper-3.3.1.jar), delete them and use the versions shipped with HBase.
2. Edit hive-site.xml under hive/conf and append the following at the bottom, inside the <configuration> element:
<!--
<property>
  <name>hive.exec.scratchdir</name>
  <value>/usr/local/hive/tmp</value>
</property>
-->
<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hive/logs</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,file:///usr/local/hive/lib/hbase-0.90.5.jar,file:///usr/local/hive/lib/zookeeper-3.3.2.jar</value>
</property>
Note: if hive-site.xml does not exist, create it yourself, or rename hive-default.xml.template and use that.
For details, see: http://blog.csdn.net/kunshan_shenbin/article/details/7210020
3. Copy hbase-0.90.5.jar into hadoop/lib on every Hadoop node (including the master).
4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).
Note: for the settings in hbase-site.xml, see: http://blog.csdn.net/kunshan_shenbin/article/details/7209990
Note: if steps 3 and 4 are skipped, running Hive will very likely fail with the following error:
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately.
This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and
then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
Reference: http://blog.sina.com.cn/s/blog_410d18710100vlbq.html
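Because skipping steps 3 and 4 produces an error that does not name the missing files, a small pre-flight check can help. The following is only a sketch: the function name missing_hbase_files is hypothetical, it inspects a single local Hadoop directory, and it would have to be run on each node separately.

```python
from pathlib import Path


def missing_hbase_files(hadoop_home, hbase_jar="hbase-0.90.5.jar"):
    """Return the files from steps 3 and 4 that are absent under hadoop_home.

    hadoop_home is the local Hadoop install directory (an assumption of this
    sketch; pass whatever path the node actually uses).
    """
    home = Path(hadoop_home)
    expected = [
        home / "lib" / hbase_jar,          # step 3: the HBase jar
        home / "conf" / "hbase-site.xml",  # step 4: the HBase configuration
    ]
    # Report paths relative to hadoop_home, with forward slashes.
    return [p.relative_to(home).as_posix() for p in expected if not p.is_file()]
```

An empty list means both files are in place on that node; any entries returned are the files still to be copied.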
Now you can try starting Hive.
Single-node startup:
> bin/hive -hiveconf hbase.master=master:60000
Cluster startup:
> bin/hive -hiveconf hbase.zookeeper.quorum=slave
If hive.aux.jars.path is not configured in hive-site.xml, start Hive as follows instead:
> bin/hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.8.0.jar,/usr/local/hive/lib/hbase-0.90.5.jar,/usr/local/hive/lib/zookeeper-3.3.2.jar -hiveconf hbase.zookeeper.quorum=slave
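The jar list appears in two forms: plain paths for --auxpath and file:// URIs for hive.aux.jars.path. In both cases it is a single comma-separated value, so a stray space would split the shell argument. As a sketch, the two strings can be generated from one list; the helper names here are hypothetical, and the directory and jar names are the ones used in this article.

```python
def aux_jar_paths(lib_dir, jar_names):
    """Absolute jar paths under lib_dir, in the given order."""
    return [f"{lib_dir.rstrip('/')}/{name}" for name in jar_names]


def auxpath_argument(paths):
    """Value for `bin/hive --auxpath`: comma-separated, no spaces."""
    return ",".join(paths)


def aux_jars_property(paths):
    """Value for hive.aux.jars.path: the same list as file:// URIs."""
    return ",".join("file://" + p for p in paths)


JARS = ["hive-hbase-handler-0.8.0.jar", "hbase-0.90.5.jar", "zookeeper-3.3.2.jar"]
paths = aux_jar_paths("/usr/local/hive/lib", JARS)
print(auxpath_argument(paths))
print(aux_jars_property(paths))
```

Keeping one list as the source for both strings avoids the two settings drifting apart when a jar version changes.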
Next, run a few tests.
1. Create a table that HBase can recognize:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
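hbase.table.name sets the name of the table in HBase (here xyz); it is optional and defaults to the Hive table name.
hbase.columns.mapping maps each Hive column, in order, to the HBase row key (:key) or to an HBase column written as family:qualifier (here cf1:val).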
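To make the positional pairing concrete, here is a minimal sketch that models how a mapping string lines up with the Hive column list. It is an illustration, not Hive's own parser: parse_columns_mapping is a hypothetical name, and the sketch only covers mappings that contain an explicit :key entry, as in the example above.

```python
def parse_columns_mapping(mapping, hive_columns):
    """Pair each Hive column with its HBase target, by position.

    Returns a dict: Hive column -> "row key" or a (family, qualifier) tuple.
    Simplified model: requires exactly one explicit ":key" entry.
    """
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError("mapping needs one entry per Hive column")
    if entries.count(":key") != 1:
        raise ValueError("this sketch expects exactly one :key entry")

    result = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            result[column] = "row key"
        else:
            family, _, qualifier = entry.partition(":")
            result[column] = (family, qualifier)
    return result


print(parse_columns_mapping(":key,cf1:val", ["key", "value"]))
```

For hbase_table_1 this pairs the Hive column key with the HBase row key and the Hive column value with the column val in family cf1.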
2. Load data with SQL
a) Create a Hive table
hive> CREATE TABLE pokes (foo INT, bar STRING);
b) Bulk-load the data
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
c) Insert into hbase_table_1 with SQL
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=86;
3. View the data
hive> select * from hbase_table_1;
You can now log in to HBase and inspect the data.
> /usr/local/hbase/bin/hbase shell
hbase(main):001:0> describe 'xyz'
hbase(main):002:0> scan 'xyz'
hbase(main):003:0> put 'xyz','100','cf1:val','www.360buy.com'
The row just inserted through HBase is now visible in Hive.
hive> select * from hbase_table_1;
4. Accessing an existing HBase table from Hive
Use CREATE EXTERNAL TABLE:
CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
TBLPROPERTIES("hbase.table.name" = "some_existing_table");
Multiple Columns and Families
1. Create the table
CREATE TABLE hbase_table_2(key int, value1 string, value2 int, value3 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,a:b,a:c,d:e"
);
2. Insert data
INSERT OVERWRITE TABLE hbase_table_2 SELECT foo, bar, foo+1, foo+2
FROM pokes WHERE foo=98 OR foo=100;
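This table has three Hive value columns (value1, value2, and value3) and two HBase column families (a and d).
Two of the Hive columns (value1 and value2) map to one HBase column family (a, with HBase column names b and c), while the third Hive column (value3) maps to the single column e in column family d.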
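The same mapping can be traced for a single row. The sketch below models which HBase cells one Hive row turns into under ":key,a:b,a:c,d:e". The name hive_row_to_cells is hypothetical, and the sketch assumes values are stored in their string form, which is what the scan output in step 4 shows.

```python
def hive_row_to_cells(mapping, row):
    """Turn one Hive row into (row key, "family:qualifier", value) cells.

    Simplified model: the mapping must contain ":key", and every value is
    stored as its string form.
    """
    entries = mapping.split(",")
    key_index = entries.index(":key")
    row_key = str(row[key_index])
    return [
        (row_key, entry, str(value))
        for i, (entry, value) in enumerate(zip(entries, row))
        if i != key_index
    ]


# The row produced by SELECT foo, bar, foo+1, foo+2 for foo=100.
cells = hive_row_to_cells(":key,a:b,a:c,d:e", (100, "val_100", 101, 102))
for cell in cells:
    print(cell)

# Column families touched by this row.
families = sorted({column.split(":")[0] for _, column, _ in cells})
print(families)
```

One Hive row therefore becomes three cells under the same row key, spread across the two families a and d.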
3. Log in to HBase and check the table structure
hbase(main):003:0> describe "hbase_table_2"
DESCRIPTION                                                              ENABLED
 {NAME => 'hbase_table_2', FAMILIES => [{NAME => 'a', COMPRESSION =>     true
 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'd', COMPRESSION
 => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 1.0630 seconds
4. View the data in HBase
hbase(main):004:0> scan 'hbase_table_2'
ROW COLUMN+CELL
100 column=a:b, timestamp=1297695262015, value=val_100
100 column=a:c, timestamp=1297695262015, value=101
100 column=d:e, timestamp=1297695262015, value=102
98 column=a:b, timestamp=1297695242675, value=val_98
98 column=a:c, timestamp=1297695242675, value=99
98 column=d:e, timestamp=1297695242675, value=100
2 row(s) in 0.0380 seconds
5. View the data in Hive
hive> select * from hbase_table_2;
OK
100 val_100 101 102
98 val_98 99 100
Time taken: 3.238 seconds
References:
http://running.iteye.com/blog/898399
http://heipark.iteye.com/blog/1150648
http://www.javabloger.com/article/apache-hadoop-hive-hbase-integration.html
Reposted from: http://blog.csdn.net/liuzhenwen/article/details/28078625