Integrating hive-1.2.1 with hbase-1.2.1 on a hadoop 2.7.2 cluster

阿新 • Published: 2019-02-08

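This walkthrough is based on the official documentation and other related material. If you spot any mistakes, corrections are welcome.

The official Hive guide to HBase integration is at https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

It states that Hive 0.9.0 requires at least HBase 0.92, while earlier Hive versions work with HBase 0.89 or 0.90.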

Hive 1.x remains compatible with HBase 0.98.x and lower versions, while Hive 2.x is compatible with HBase 1.x and higher.

Then comes the key part:

The storage handler is built as an independent module, hive-hbase-handler-x.y.z.jar, which must be available on the Hive client auxpath, along with HBase, Guava and ZooKeeper jars.

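In other words, hive-hbase-handler-x.y.z.jar is an independent module that must be on the Hive client auxpath, together with the HBase, Guava and ZooKeeper jars (please excuse my rough translation). The page then gives a few simple usage examples: one for the CLI against a single-node HBase, and one for an HBase cluster managed by ZooKeeper.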

The important part is this note: (Note that the jar locations and names have changed in Hive 0.9.0, so for earlier releases, some changes are needed.)

The handler requires Hadoop 0.20 or higher, and has only been tested with dependency versions hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often.

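That is, the handler needs Hadoop 0.20 or later and has only been tested with hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you must rebuild the handler against the HBase jar that matches your HBase version and change --auxpath accordingly. Mismatched versions lead to misleading connection failures such as MasterNotRunningException, because the HBase RPC protocol changes often.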
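To make the --auxpath idea concrete, here is a minimal sketch that assembles the comma-separated jar list and prints a hive command line for review. The helper name join_jars is made up for illustration, and the jar paths and quorum value are assumptions taken from the layout used later in this article; adjust them to your own installation.

```shell
# join_jars JAR... : join the given jar paths with commas,
# the separator that --auxpath expects. (Hypothetical helper name.)
join_jars() {
  local IFS=,
  printf '%s\n' "$*"
}

# Assumed locations, following this article's layout.
HIVE_LIB=/opt/modules/hive-1.2.1/lib
AUX_JARS=$(join_jars \
  "$HIVE_LIB/hive-hbase-handler-1.2.1.jar" \
  "$HIVE_LIB/hbase-common-1.2.1.jar" \
  "$HIVE_LIB/zookeeper-3.4.8.jar" \
  "$HIVE_LIB/guava-14.0.1.jar")

# Print the command instead of running it, so it can be checked first.
echo hive --auxpath "$AUX_JARS" \
  --hiveconf hbase.zookeeper.quorum=hadoop:2181,hadoop1:2182,hadoop2:2183
```

Printing rather than executing keeps the sketch safe to run on a machine without hive; once the output looks right, the same line can be pasted into a shell on the hive client.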

I am using hadoop 2.7.2, hive 1.2.1 and hbase 1.2.1, so the handler has to be recompiled before the two can be integrated.

Now to the main topic.

1. hive-hbase-handler.jar is part of hive-1.2.1, so first download the hive-1.2.1 source (src) from the official site:

http://www.apache.org/dyn/closer.cgi/hive/

Choose apache-hive-1.2.1-src.tar.gz and download it.

2. Create a build project in Eclipse. Any name will do, and a plain Java project is enough.

I named mine hive-hbase.

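3. Import the hbase-handler part of the hive source into the build project

Right-click src and choose Import --> General --> File System, then click Next.

Browse to the directory where you unpacked the hive source and locate the hbase-handler directory; mine is under /opt/src/hive-1.2.1-src.

The directory to import is hbase-handler/src/java; anyone with basic Java experience will get this right. After confirming, make sure the package names start with org.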

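4. Create a lib directory in the Eclipse project and add the jars needed for the code to compile. Which jars you need depends on the hive version; you are done once the project shows no red error markers. I add them in the steps below.

For convenience, I copied the main jars (or simply all jars) from the lib directories of hive, hbase and hadoop to the desktop, so they can be added to the project without disturbing the cluster's own lib directories.

How you add the jars is up to you. There are two approaches. One is to add jars one by one according to the compile errors, which requires knowing the hive, hadoop and hbase APIs well. The other is to add every jar in one go, which suits beginners and does not affect the final result.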

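5. First, the add-everything approach: copy all jars under hive, the hadoop common and mapreduce jars, and all jars under hbase/lib into the project's lib directory. Remove duplicates, keeping only one copy of any jar that appears in several versions. Then right-click the project and choose Build Path --> Configure Build Path, select Libraries in the dialog, click Add JARs, pick the project's lib directory, select all jars, and confirm with Apply and OK.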

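The minimal approach: compiling the handler actually needs only a handful of jars (the ones shown in the lib listing in step 6). Find them in the lib directories of hive, hbase and hadoop and add them to the project's lib directory.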
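The minimal approach can be scripted. The sketch below is one possible way to do it, not part of the original procedure: the function name collect_jars is hypothetical, and the jar names are copied from the lib listing shown in step 6 of this article.

```shell
# Jar names taken from the project lib listing in step 6 of this article.
REQUIRED_JARS="
commons-io-2.4.jar commons-logging-1.1.3.jar
hadoop-common-2.7.2.jar hadoop-mapreduce-client-core-2.7.2.jar
hbase-client-1.2.1.jar hbase-common-1.2.1.jar
hbase-protocol-1.2.1.jar hbase-server-1.2.1.jar
hive-common-1.2.1.jar hive-exec-1.2.1.jar hive-metastore-1.2.1.jar
jsr305-3.0.0.jar metrics-core-2.2.0.jar zookeeper-3.4.8.jar
"

# collect_jars DEST SRC_DIR...
# Copies each required jar found under any SRC_DIR into DEST and
# prints "missing: NAME" for every jar that could not be found.
collect_jars() {
  local dest="$1"
  shift
  local jar found
  mkdir -p "$dest"
  for jar in $REQUIRED_JARS; do
    found=$(find "$@" -type f -name "$jar" 2>/dev/null | head -n 1)
    if [ -n "$found" ]; then
      cp "$found" "$dest/"
    else
      echo "missing: $jar"
    fi
  done
}

# Example call with the paths assumed in this article (not run here):
#   collect_jars /home/hadoop/workspace/hive-hbase/lib \
#     /opt/modules/hive-1.2.1/lib /opt/modules/hbase-1.2.1/lib \
#     /opt/modules/hadoop-2.7.2/share/hadoop
```

Because the jars are copied rather than moved, the cluster's own lib directories stay untouched, which matches the precaution taken in step 4.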

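6. Compile and package

Select the project's src directory, right-click and choose Export --> Java --> JAR file --> Next, select src under the project, and set the export path. The file can simply be named hive-hbase-handler-1.2.1.jar; leave everything else at the defaults and click Finish.

Then put the exported hive-hbase-handler-1.2.1.jar into lib under the hive installation path, overwriting the original handler.

Also copy the required jars from the lib directory of the Eclipse handler project into hive/lib, as shown below, and delete any jar that now exists in more than one version (here only zookeeper is duplicated).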

hadoop@hadoop:src$ cd /home/hadoop/workspace/hive-hbase/lib/
hadoop@hadoop:lib$ ls
commons-io-2.4.jar                      hbase-server-1.2.1.jar
commons-logging-1.1.3.jar               hive-common-1.2.1.jar
hadoop-common-2.7.2.jar                 hive-exec-1.2.1.jar
hadoop-mapreduce-client-core-2.7.2.jar  hive-metastore-1.2.1.jar
hbase-client-1.2.1.jar                  jsr305-3.0.0.jar
hbase-common-1.2.1.jar                  metrics-core-2.2.0.jar
hbase-protocol-1.2.1.jar                zookeeper-3.4.8.jar
hadoop@hadoop:lib$ cp ./* /opt/modules/hive-1.2.1/lib/
hadoop@hadoop:lib$ cd /opt/modules/hive-1.2.1/conf/
hadoop@hadoop:conf$ ls /opt/modules/hive-1.2.1/lib/zookeeper-3.4.*
/opt/modules/hive-1.2.1/lib/zookeeper-3.4.6.jar
/opt/modules/hive-1.2.1/lib/zookeeper-3.4.8.jar
hadoop@hadoop:conf$ rm -f /opt/modules/hive-1.2.1/lib/zookeeper-3.4.6.jar
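Rather than checking for duplicate versions by eye, a small helper can list every artifact that appears more than once in a lib directory. This is only a sketch: the function name find_duplicate_jars is hypothetical, and it assumes jar names of the form name-version.jar with a purely numeric, dotted version.

```shell
# find_duplicate_jars DIR
# Prints the base name of every jar that exists in DIR in more than one
# version, assuming names like zookeeper-3.4.6.jar (numeric dotted version).
find_duplicate_jars() {
  local dir="$1"
  local f
  for f in "$dir"/*.jar; do
    [ -e "$f" ] || continue          # directory contains no jars at all
    basename "$f"
  done |
    sed 's/-[0-9][0-9.]*\.jar$//' |  # strip the "-<version>.jar" suffix
    sort |
    uniq -d                          # keep only names seen more than once
}

# Example call with the path assumed in this article (not run here):
#   find_duplicate_jars /opt/modules/hive-1.2.1/lib
```

Run against the hive lib directory right after the copy step, a helper like this would be expected to report zookeeper, which is the duplicate removed by hand above.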

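This completes the integration itself.

The official documentation passes these settings as arguments on the hive command line. To simplify usage, we put them into hive's environment and configuration files instead.

7. Change the hive environment variables and add configuration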

hive-env.sh

hadoop@hadoop:conf$ pwd
/opt/modules/hive-1.2.1/conf
hadoop@hadoop:conf$ ls
beeline-log4j.properties.template    hive-log4j.properties.template
hive-env.sh.template                 hive-site.xml
hive-exec-log4j.properties.template  ivysettings.xml
hadoop@hadoop:conf$ cp hive-env.sh.template hive-env.sh
hadoop@hadoop:conf$ vim hive-env.sh
## Add the following content
export HADOOP_HOME=/opt/modules/hadoop-2.7.2
export HIVE_CONF_DIR=/opt/modules/hive-1.2.1/conf
export JAVA_HOME=/usr/local/java/jdk1.7.0_80

In hive-site.xml, add the following to the existing configuration:

hadoop@hadoop:conf$ vim hive-site.xml
    <property>    
        <name>hive.aux.jars.path</name>     
        <value>file:///opt/modules/hive-1.2.1/lib/hive-hbase-handler-1.2.1.jar,file:///opt/modules/hive-1.2.1/lib/guava-14.0.1.jar,file:///opt/modules/hive-1.2.1/lib/hbase-common-1.2.1.jar,file:///opt/modules/hive-1.2.1/lib/zookeeper-3.4.8.jar</value>    
    </property>    
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop:2181,hadoop1:2182,hadoop2:2183</value>
    </property>
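A typo in hive.aux.jars.path tends to show up only later as a class-loading error, so it can help to check the value before starting hive. The following is a minimal sketch under the assumption that every entry is a local file:// URL as above; the function name check_aux_jars is made up for illustration.

```shell
# check_aux_jars VALUE
# VALUE is the comma-separated content of hive.aux.jars.path.
# Prints "not found: PATH" for each entry whose file does not exist
# and returns 1 if any entry is missing, 0 otherwise.
check_aux_jars() {
  local value="$1" entry path status=0
  local IFS=,
  for entry in $value; do
    path=${entry#file://}            # file:///opt/... becomes /opt/...
    if [ ! -f "$path" ]; then
      echo "not found: $path"
      status=1
    fi
  done
  return "$status"
}

# Example call with two of the entries configured above (not run here):
#   check_aux_jars "file:///opt/modules/hive-1.2.1/lib/hive-hbase-handler-1.2.1.jar,file:///opt/modules/hive-1.2.1/lib/zookeeper-3.4.8.jar"
```

The check only confirms that the files exist; it does not verify that the jar versions match the running HBase.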

Now on to testing.

Start the cluster and check that everything came up:

hadoop@hadoop:conf$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
hadoop@hadoop:conf$ ssh hadoop1
Last login: Thu May 12 14:05:18 2016 from hadoop
[hadoop@hadoop1 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hadoop1 ~]$ ssh hadoop2
Last login: Thu May 12 14:05:26 2016 from hadoop1
[hadoop@hadoop2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@hadoop2 ~]$ jps
1728 Jps
1699 QuorumPeerMain
[hadoop@hadoop2 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
[hadoop@hadoop2 ~]$ exit
logout
Connection to hadoop2 closed.
[hadoop@hadoop1 ~]$ exit
logout
Connection to hadoop1 closed.
hadoop@hadoop:conf$ start-dfs.sh
Starting namenodes on [hadoop]
hadoop: starting namenode, logging to /opt/modules/hadoop-2.7.2/logs/hadoop-hadoop-namenode-hadoop.out
hadoop1: starting datanode, logging to /opt/modules/hadoop-2.7.2/logs/hadoop-hadoop-datanode-hadoop1.out
hadoop2: starting datanode, logging to /opt/modules/hadoop-2.7.2/logs/hadoop-hadoop-datanode-hadoop2.out
Starting secondary namenodes [hadoop]
hadoop: starting secondarynamenode, logging to /opt/modules/hadoop-2.7.2/logs/hadoop-hadoop-secondarynamenode-hadoop.out
hadoop@hadoop:conf$ hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 32977600512 (30.71 GB)
Present Capacity: 25174839296 (23.45 GB)
DFS Remaining: 25174265856 (23.45 GB)
DFS Used: 573440 (560 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 1

-------------------------------------------------
Live datanodes (2):

Name: 192.168.2.11:50010 (hadoop2)
Hostname: hadoop2
Decommission Status : Normal
Configured Capacity: 16488800256 (15.36 GB)
DFS Used: 290816 (284 KB)
Non DFS Used: 3901227008 (3.63 GB)
DFS Remaining: 12587282432 (11.72 GB)
DFS Used%: 0.00%
DFS Remaining%: 76.34%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 12 18:10:50 CST 2016


Name: 192.168.2.10:50010 (hadoop1)
Hostname: hadoop1
Decommission Status : Normal
Configured Capacity: 16488800256 (15.36 GB)
DFS Used: 282624 (276 KB)
Non DFS Used: 3901534208 (3.63 GB)
DFS Remaining: 12586983424 (11.72 GB)
DFS Used%: 0.00%
DFS Remaining%: 76.34%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 12 18:10:50 CST 2016
hadoop@hadoop:conf$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/modules/hadoop-2.7.2/logs/yarn-hadoop-resourcemanager-hadoop.out
hadoop2: starting nodemanager, logging to /opt/modules/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop1: starting nodemanager, logging to /opt/modules/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-hadoop1.out
hadoop@hadoop:conf$ jps
7769 SecondaryNameNode
7328 QuorumPeerMain
7531 NameNode
8002 ResourceManager
8269 Jps
hadoop@hadoop:conf$ start-hbase.sh
starting master, logging to /opt/modules/hbase-1.2.1/logs/hbase-hadoop-master-hadoop.out
hadoop1: starting regionserver, logging to /opt/modules/hbase-1.2.1/bin/../logs/hbase-hadoop-regionserver-hadoop1.out
hadoop2: starting regionserver, logging to /opt/modules/hbase-1.2.1/bin/../logs/hbase-hadoop-regionserver-hadoop2.out
hadoop@hadoop:conf$ jps
7769 SecondaryNameNode
8551 Jps
8428 HMaster
7328 QuorumPeerMain
7531 NameNode
8002 ResourceManager

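Note that HDFS safe mode is on here, probably because the machine was not shut down cleanly last time. It switches off automatically after a while, once enough block replicas have been reported, or you can switch it off manually; it is not a big problem.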
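For reference, safe mode can be inspected and controlled with hdfs dfsadmin -safemode. The sketch below wraps the status check in a small function so a script can branch on it; the function name safemode_state is hypothetical, and it assumes the status line contains "Safe mode is ON" or "Safe mode is OFF", as in the report above.

```shell
# safemode_state
# Reads the output of `hdfs dfsadmin -safemode get` on stdin and
# prints ON, OFF, or UNKNOWN.
safemode_state() {
  local text
  text=$(cat)
  case $text in
    *"Safe mode is ON"*)  echo ON ;;
    *"Safe mode is OFF"*) echo OFF ;;
    *)                    echo UNKNOWN ;;
  esac
}

# Usage on a cluster node (not run here):
#   hdfs dfsadmin -safemode get | safemode_state
#   hdfs dfsadmin -safemode wait    # block until safe mode ends by itself
#   hdfs dfsadmin -safemode leave   # force the NameNode out of safe mode
```

Waiting is the gentler option; leaving safe mode by force is mainly useful on a test cluster like this one, where a missing replica is not a concern.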

Next, start hive. Because my hive metastore is stored in MySQL, start the MySQL service first, then create a table that HBase can recognize:

hadoop@hadoop:conf$ sudo service mysqld start
Starting MySQL
.. *
hadoop@hadoop:conf$ hive

Logging initialized using configuration in jar:file:/opt/modules/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> create table hbase_table_1(key int,value string)
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > with serdeproperties("hbase.columns.mapping"=":key,cf1:val")
    > tblproperties("hbase.table.name"="xyz");
OK
Time taken: 2.001 seconds
hive>

Open another terminal and create a data file for hive to use:

hadoop@hadoop:~$ cat test.data
1	zhangsan
2	lisi
3	wangwu
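The columns in test.data must be separated by real tab characters, because the hive table created next uses '\t' as its field delimiter. One way to create the file without relying on an editor's tab handling is printf; the function name make_test_data is only for illustration.

```shell
# make_test_data FILE
# Writes the three sample rows to FILE, with a tab between id and name.
make_test_data() {
  printf '1\tzhangsan\n2\tlisi\n3\twangwu\n' > "$1"
}

# Usage (not run here):
#   make_test_data /home/hadoop/test.data
```

The resulting file is 27 bytes long, which agrees with the totalSize=27 that hive reports when the file is loaded.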

In hive, create a table that matches this structure, load the file into it, and check the result. That completes the preparation.

hive> create table test1(id int,name string)
    > row format delimited
    > fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 0.214 seconds
hive> load data local inpath '/home/hadoop/test.data' into table test1;
Loading data to table default.test1
Table default.test1 stats: [numFiles=1, totalSize=27]
OK
Time taken: 0.714 seconds
hive> select * from hbase_table_1;
OK

Testing data storage:

Load the data from the hive table into hbase_table_1 and look at the table contents:

hive> insert overwrite table hbase_table_1 select * from test1;
Query ID = hadoop_20160512194326_ec2c3ec0-0fdc-4265-8478-668ab5df4b5c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1463047967051_0001, Tracking URL = http://hadoop:8088/proxy/application_1463047967051_0001/
Kill Command = /opt/modules/hadoop-2.7.2/bin/hadoop job  -kill job_1463047967051_0001
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2016-05-12 19:43:46,127 Stage-0 map = 0%,  reduce = 0%
2016-05-12 19:43:55,629 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 2.26 sec
MapReduce Total cumulative CPU time: 2 seconds 260 msec
Ended Job = job_1463047967051_0001
MapReduce Jobs Launched: 
Stage-Stage-0: Map: 1   Cumulative CPU: 2.26 sec   HDFS Read: 3410 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 260 msec
OK
Time taken: 29.829 seconds
hive> select * from hbase_table_1;
OK
1	zhangsan
2	lisi
3	wangwu
Time taken: 0.178 seconds, Fetched: 3 row(s)

After that, open another terminal, start the hbase shell, run list to check that table xyz exists, and scan its contents:

hadoop@hadoop:~$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/modules/hbase-1.2.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/modules/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016

hbase(main):001:0> list
TABLE                                                                                 
scores                                                                                
xyz                                                                                   
2 row(s) in 0.1950 seconds

=> ["scores", "xyz"]
hbase(main):002:0> scan 'xyz'
ROW                    COLUMN+CELL                                                    
0 row(s) in 0.1220 seconds

hbase(main):003:0> scan 'xyz'
ROW                                                  COLUMN+CELL                                                                                                                                               
 1                                                   column=cf1:val, timestamp=1463053413954, value=zhangsan                                                                                                   
 2                                                   column=cf1:val, timestamp=1463053413954, value=lisi                                                                                                       
 3                                                   column=cf1:val, timestamp=1463053413954, value=wangwu                                                                                                     
3 row(s) in 0.0550 seconds

hbase(main):004:0>

With that, the integration of hive and hbase is complete.
