
Installing Impala with Cloudera Manager and Comparing the Performance of Impala, Hive, and Spark (Part 3): Impala Installed Successfully with Cloudera Manager, and a Simple Test of Impala and Hive

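Installing Impala with Cloudera Manager requires the conditions listed in the first article:

1. CentOS 6.2 must be installed.
2. CDH 4.1.0 or later is required.
3. Hive must be installed on every node in the cluster.
4. Hive's metastore database must be MySQL.
5. The hosts file on every machine must map the IP address and hostname of every machine in the cluster.

In addition, IPv6 must be disabled. Otherwise Cloudera Manager cannot finish recognizing the hosts.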
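The last two preparation steps, the hosts entries and disabling IPv6, can be scripted. The sketch below is one possible way to do this, not something Cloudera Manager does itself. It assumes IPv6 is turned off through the sysctl keys net.ipv6.conf.all.disable_ipv6 and net.ipv6.conf.default.disable_ipv6, and the function names add_host_entry and disable_ipv6_conf are made up for illustration.

```shell
# Sketch only: the helper names are hypothetical, not part of any Cloudera tool.

# Append an "IP<TAB>hostname" line to a hosts file unless that exact
# pair (first two fields) is already present.
add_host_entry() {
    local hosts_file="$1"
    local ip="$2"
    local name="$3"
    if ! awk -v ip="$ip" -v name="$name" \
        '$1 == ip && $2 == name { found = 1 } END { exit !found }' \
        "$hosts_file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Append the two sysctl settings that disable IPv6 (assumed approach).
# A key that is already set in the file is left untouched, whatever its value.
disable_ipv6_conf() {
    local sysctl_file="$1"
    local key
    for key in net.ipv6.conf.all.disable_ipv6 net.ipv6.conf.default.disable_ipv6; do
        if ! grep -q "^[[:space:]]*${key}[[:space:]]*=" "$sysctl_file"; then
            echo "${key} = 1" >> "$sysctl_file"
        fi
    done
}
```

On each node this might be used as root with add_host_entry /etc/hosts 200.200.200.11 big1-1 (repeated for every machine), then disable_ipv6_conf /etc/sysctl.conf followed by sysctl -p. The pairing of 200.200.200.11 with big1-1 is an assumption; substitute the real addresses and hostnames of the cluster.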

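After IPv6 was disabled, the Cloudera Manager page showed three managed hosts after login, so Cloudera Manager was working properly. Click the 'Services' option, choose role assignment, and assign roles to each host. Impala is not among the initial services; once all the other services have started normally, add the Impala service separately. With the Impala service running, you can log in to any host in the cluster and start impala-shell to run queries.

Impala requires Hive to use a MySQL metastore database. However, after installing Impala with Cloudera Manager, the configuration files on the cluster nodes were unchanged, even though I had set the MySQL database in Impala's Hive metadata settings. I therefore edited the configuration files by hand, after which running show tables in impala-shell listed the tables in Hive. A query run in impala-shell only prints its results, not the execution time, which makes it hard to compare with Hive.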
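To see whether a node's Hive configuration really points at MySQL, the configuration file can be inspected directly. The following is a minimal sketch, assuming the metastore connection is defined by the standard Hive property javax.jdo.option.ConnectionURL in hive-site.xml and that the value appears within two lines of the property name. The function name uses_mysql_metastore is hypothetical.

```shell
# Sketch only: returns 0 if the given hive-site.xml sets
# javax.jdo.option.ConnectionURL to a jdbc:mysql URL, non-zero otherwise.
uses_mysql_metastore() {
    local conf="$1"
    # Print the <name> line plus the two lines after it, then look for
    # a <value> that starts with jdbc:mysql:
    grep -A 2 '<name>javax.jdo.option.ConnectionURL</name>' "$conf" \
        | grep -q '<value>jdbc:mysql:'
}
```

A call such as uses_mysql_metastore /etc/hive/conf/hive-site.xml && echo "MySQL metastore configured" could then be run on each host. The path is an assumption; use whichever Hive configuration directory the node actually reads.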

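To get timings, wrap each command in time. Here 200.200.200.11:21000 is the impalad host address and tt is the test table:

$ time impala-shell --impalad=200.200.200.11:21000 -q 'select * from tt'

$ time hive -e 'select * from tt'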

The two run times can then be compared. A simple comparison gave the following results:

Impala

time impala-shell --impalad=200.200.200.11:21000 -q'select id from tt'

real   0m4.921s

user   0m0.072s

sys    0m0.042s

hive

time hive -e 'select id from tt'

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties

Hive history file=/tmp/hdfs/hive_job_log_hdfs_201212111430_946199434.txt

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201212111359_0001, Tracking URL = http://big1-1:50030/jobdetails.jsp?jobid=job_201212111359_0001

Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=big1-1:8021 -kill job_201212111359_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2012-12-11 14:30:44,633 Stage-1 map = 0%,  reduce = 0%

2012-12-11 14:30:49,716 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.92 sec

2012-12-11 14:30:50,735 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.92 sec

2012-12-11 14:30:51,746 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.92 sec

2012-12-11 14:30:52,761 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.92 sec

MapReduce Total cumulative CPU time: 920 msec

Ended Job = job_201212111359_0001

MapReduce Jobs Launched:

Job 0: Map: 1   Cumulative CPU: 0.92 sec   HDFS Read: 0 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 920 msec

OK

Time taken: 36.364 seconds

real   0m40.248s

user   0m15.590s

sys    0m2.638s

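As the results show, Impala is much faster than Hive: about 4.9 s versus 40.2 s of wall-clock time for this query, roughly 8 times faster.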
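A single run is only a rough measure, since the first execution may include startup costs. One way to repeat the measurement is a small wrapper like the sketch below. The name time_runs is made up for illustration, and it relies on date +%s, so it only resolves whole seconds; the time command used above remains the better choice for sub-second precision.

```shell
# Sketch only: run a command several times and print the elapsed
# wall-clock time of each run in whole seconds.
# Usage: time_runs <number-of-runs> <command> [args...]
time_runs() {
    local runs="$1"
    shift
    local i=1
    local start end
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # Discard the command's own output; only the timing is reported.
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}
```

For example, time_runs 3 impala-shell --impalad=200.200.200.11:21000 -q 'select id from tt' and time_runs 3 hive -e 'select id from tt' would each print three timings that can be compared side by side.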

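This is only a first impression. Later we will run several gigabytes of data through Hive, Impala, and Spark for a more detailed comparison. I will write that up when I have time.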