Connecting Kettle 5.1.0 to Hadoop Hive 2 (Hive 1.2.1)
阿新 · Published 2019-01-06
1. Configure HiveServer2 by adding the following properties to hive-site.xml:
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>192.168.56.101</value>
  <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>500</value>
  <description>Maximum number of Thrift worker threads</description>
</property>
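The two properties Kettle actually depends on are the Thrift bind host and port. As a quick sanity check, the values can be read back out of hive-site.xml with a short script. This is a sketch: it parses an inline stand-in for the file, and on a real cluster you would point it at $HIVE_HOME/conf/hive-site.xml instead.

```python
import tempfile
import xml.etree.ElementTree as ET

def read_hive_conf(path):
    """Return a dict mapping <name> to <value> for every <property> element."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

# Minimal stand-in for hive-site.xml with the two properties Kettle needs;
# on a real cluster, pass $HIVE_HOME/conf/hive-site.xml to read_hive_conf.
sample = """<configuration>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>192.168.56.101</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>
</configuration>"""
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(sample)

conf = read_hive_conf(f.name)
print(conf["hive.server2.thrift.bind.host"], conf["hive.server2.thrift.port"])
```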
2. Start HiveServer2:
$HIVE_HOME/bin/hiveserver2
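Before pointing Kettle at the server, it is worth confirming that the Thrift port is actually listening. A minimal TCP reachability check (the host and port are the values configured above):

```python
import socket

def hiveserver2_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to the HiveServer2 Thrift port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Host and port from hive-site.xml above; prints False if the server is down.
print(hiveserver2_reachable("192.168.56.101", 10001))
```

You can also test the service end to end with Hive's own client, e.g. `beeline -u jdbc:hive2://192.168.56.101:10001`.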
3. Edit Kettle's configuration file
%KETTLE_HOME%/plugins/pentaho-big-data-plugin/plugin.properties
and set the following value:
active.hadoop.configuration=hdp20
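The change is a one-line key=value edit. If you script your environment setup, it can be applied with a small helper; this is a sketch (the helper name and demo file are illustrative, and in practice the target is the plugin.properties path above).

```python
import tempfile
from pathlib import Path

def set_property(path, key, value):
    """Set key=value in a Java-style .properties file, keeping other lines intact."""
    path = Path(path)
    out, found = [], False
    for line in path.read_text().splitlines():
        if not line.lstrip().startswith("#") and line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")  # replace the existing assignment
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")      # append if the key was absent
    path.write_text("\n".join(out) + "\n")

# Demo on a throwaway file; the real target is Kettle's plugin.properties.
with tempfile.NamedTemporaryFile("w", suffix=".properties", delete=False) as f:
    f.write("active.hadoop.configuration=hadoop-20\n")
set_property(f.name, "active.hadoop.configuration", "hdp20")
print(Path(f.name).read_text())
```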
4. Start Kettle and configure the database connection, as shown in Figure 1.
Figure 1
5. Test
(1) Create a test table and load data in Hive:
CREATE DATABASE test;
USE test;
CREATE TABLE a(a int,b int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/grid/a.txt' INTO TABLE a;
SELECT * FROM a;
The query result is shown in Figure 2.
Figure 2
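LOAD DATA expects /home/grid/a.txt to contain one comma-separated integer pair per line, matching the FIELDS TERMINATED BY ',' clause of the table definition. A small generator for such a file (the row values are made up, and a temp file stands in for /home/grid/a.txt):

```python
import tempfile

# Rows for table a(a int, b int); the values are illustrative sample data.
rows = [(1, 2), (3, 4), (5, 6)]

# Written to a temp file for the demo; the post loads /home/grid/a.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    for a, b in rows:
        f.write(f"{a},{b}\n")  # comma-delimited, one row per line

print(open(f.name).read())
```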
(2) Create a Table Input step in Kettle; the result is shown in Figure 3.
Figure 3
Note: the table must be qualified with the database name test here; otherwise the query runs against the default database.
(3) Click Preview; the data displayed is shown in Figure 4.
Figure 4
References:
https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2
http://stackoverflow.com/questions/25625088/pentaho-data-integration-with-hive-connection
http://blog.csdn.net/victor_ww/article/details/40041589