
Integrating apache-hive-1.2.1 with hbase-1.2.2 (pseudo-distributed)

My setup: Hadoop 2.6.0 in pseudo-distributed mode, plus a pseudo-distributed HBase (set up following HBase: The Definitive Guide, p. 240).

1. Start Hadoop and HBase.

2. Download apache-hive-1.2.1.

3. Edit hive-env.sh under Hive's conf directory:

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/hadoop
HBASE_HOME=/home/hadoop/hbase-1.2.2
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CLASSPATH=/home/hadoop/hbase-1.2.2/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/hbase-1.2.2/lib

4. Start Hive.

Note: when creating an HBase-backed table through Hive, if the error below appears, you need to recompile hive-hbase-handler-1.2.1.jar and replace the original jar under hive/lib:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V

Session log:

hadoop@ubuntu:~/apache-hive-1.2.1-bin/bin$ ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties

hive> create table pokes(foo int,bar string);
OK
Time taken: 3.432 seconds
hive> load data local inpath '/home/hadoop/apache-hive-1.2.1-bin/examples/files/kv1.txt' overwrite into table pokes;
Loading data to table default.pokes
Table default.pokes stats: [numFiles=1, numRows=0, totalSize=5812, rawDataSize=0]
OK
Time taken: 1.353 seconds
hive> select * from pokes;
OK
238    val_238
86     val_86
311    val_311
27     val_27
165    val_165
409    val_409
...
Time taken: 1.143 seconds, Fetched: 500 row(s)
hive> create table hbase_table_1(key int,value string)
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > with serdeproperties("hbase.columns.mapping"=":key,cf1:val")
    > tblproperties("hbase.table.name"="hbase_hive_t1");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V

Online, this error is usually blamed on an incompatibility between the handler shipped with Hive and the installed HBase, and two fixes are commonly suggested:

1. Switch to a newer Hive, e.g. 2.x. In my test this did not solve the problem.
2. Recompile hive-hbase-handler-1.2.1.jar and replace the jar of the same name under hive/lib. This works.
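The rebuild step itself isn't shown here, so below is a rough sketch of one way to do it by hand, assuming the Hive 1.2.1 source tarball is unpacked at /home/hadoop/apache-hive-1.2.1-src and the install paths match the ones used in this post; those paths and the exact classpath are assumptions, so add jars until javac stops complaining.

# Recompile the hbase-handler sources against the HBase 1.2.2 jars actually installed
# (paths are assumptions matching this machine's layout)
cd /home/hadoop/apache-hive-1.2.1-src/hbase-handler
mkdir -p classes
# source layout assumed to be src/java, as in the Hive 1.2.x source tree
javac -cp "/home/hadoop/apache-hive-1.2.1-bin/lib/*:/home/hadoop/hbase-1.2.2/lib/*:/home/hadoop/hadoop/share/hadoop/common/*:/home/hadoop/hadoop/share/hadoop/mapreduce/*" \
      -d classes $(find src/java -name '*.java')
jar cf hive-hbase-handler-1.2.1.jar -C classes .
# Replace the jar shipped with Hive, then restart the hive CLI
cp hive-hbase-handler-1.2.1.jar /home/hadoop/apache-hive-1.2.1-bin/lib/

With the rebuilt jar in place, the same create table statement goes through: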
hive> create table hbase_table_1(key int,value string)
    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > with serdeproperties("hbase.columns.mapping"=":key,cf1:val")
    > tblproperties("hbase.table.name"="hbase_hive_t1");
OK
Time taken: 4.788 seconds
hive> insert overwrite table hbase_table_1 select * from pokes;
Query ID = hadoop_20170117004636_520fee8b-9d6c-4b41-88a5-a58402e0b6af
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1484619043631_0001, Tracking URL = http://ubuntu:8088/proxy/application_1484619043631_0001/
Kill Command = /home/hadoop/hadoop/bin/hadoop job  -kill job_1484619043631_0001
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2017-01-17 00:47:53,388 Stage-0 map = 0%,  reduce = 0%
2017-01-17 00:48:21,381 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 6.54 sec
MapReduce Total cumulative CPU time: 6 seconds 540 msec
Ended Job = job_1484619043631_0001
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1   Cumulative CPU: 7.34 sec   HDFS Read: 15889 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 340 msec
OK
Time taken: 108.485 seconds
hive> select count(*) from pokes;
Query ID = hadoop_20170117004939_099ed588-fbb4-4b9a-ac1c-1fb6259e7d11
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1484619043631_0002, Tracking URL = http://ubuntu:8088/proxy/application_1484619043631_0002/
Kill Command = /home/hadoop/hadoop/bin/hadoop job  -kill job_1484619043631_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-17 00:50:10,356 Stage-1 map = 0%,  reduce = 0%
2017-01-17 00:50:30,514 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.94 sec
2017-01-17 00:50:49,055 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.38 sec
MapReduce Total cumulative CPU time: 6 seconds 380 msec
Ended Job = job_1484619043631_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.38 sec   HDFS Read: 12409 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 380 msec
OK
500
Time taken: 72.3 seconds, Fetched: 1 row(s)
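Before counting the HBase-backed table from Hive, it's worth peeking at it from the HBase side to confirm the insert really landed there. A quick check from the HBase shell (commands only, output omitted; the expectations in the comments follow from the column mapping above):

hadoop@ubuntu:~$ /home/hadoop/hbase-1.2.2/bin/hbase shell
hbase(main):001:0> list                                # hbase_hive_t1 should appear in the listing
hbase(main):002:0> describe 'hbase_hive_t1'            # a single column family, cf1
hbase(main):003:0> scan 'hbase_hive_t1', {LIMIT => 3}  # a few rows, each with a cf1:val cell
hbase(main):004:0> count 'hbase_hive_t1'               # should match count(*) on hbase_table_1 below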
hive> select count(*) from hbase_table_1;
Query ID = hadoop_20170117005103_2fa584c7-0c2f-4b40-bc86-093f01e35a00
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1484619043631_0003, Tracking URL = http://ubuntu:8088/proxy/application_1484619043631_0003/
Kill Command = /home/hadoop/hadoop/bin/hadoop job  -kill job_1484619043631_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-17 00:51:53,774 Stage-1 map = 0%,  reduce = 0%
2017-01-17 00:52:16,564 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.42 sec
2017-01-17 00:52:36,997 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.93 sec
MapReduce Total cumulative CPU time: 9 seconds 930 msec
Ended Job = job_1484619043631_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 9.93 sec   HDFS Read: 13551 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 930 msec
OK
309
Time taken: 95.345 seconds, Fetched: 1 row(s)

Note the counts: pokes holds 500 rows but hbase_table_1 holds only 309. That is expected rather than data loss: the mapping uses :key as the HBase row key, kv1.txt contains duplicate keys, and rows sharing a key overwrite one another in HBase, so only the distinct keys survive.

Finally, clean up:

hive> drop table pokes;
OK
Time taken: 3.374 seconds
hive> select * from pokes;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'pokes'
hive> drop table hbase_table_1;
OK
Time taken: 4.64 seconds
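One detail the log doesn't show: because hbase_table_1 was created by Hive itself rather than declared EXTERNAL, dropping it in Hive also deletes the backing hbase_hive_t1 table in HBase. If you want to confirm that, a quick check from the HBase shell:

hadoop@ubuntu:~$ /home/hadoop/hbase-1.2.2/bin/hbase shell
hbase(main):001:0> exists 'hbase_hive_t1'   # should report that the table does not exist
hbase(main):002:0> list                     # hbase_hive_t1 should be gone from the listing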