1. 程式人生 > >入門大資料---Hive計算引擎Tez簡介和使用

入門大資料---Hive計算引擎Tez簡介和使用

# 一、前言 Hive預設計算引擎時MR,為了提高計算速度,我們可以改為Tez引擎。至於為什麼提高了計算速度,可以參考下圖: ![image-20200719151044959](https://blog-1259495827.cos.ap-beijing.myqcloud.com/Hive/image-20200719151044959.png) 用Hive直接編寫MR程式,假設有四個有依賴關係的MR作業,上圖中,綠色是Reduce Task,雲狀表示寫遮蔽,需要將中間結果持久化寫到HDFS。 Tez可以將多個有依賴的作業轉換為一個作業,這樣只需寫一次HDFS,且中間節點較少,從而大大提升作業的計算效能。 # 二、安裝包準備 1)下載tez的依賴包:http://tez.apache.org 2)拷貝apache-tez-0.9.1-bin.tar.gz到hadoop102的/opt/module目錄 ```shell [root@hadoop102 module]$ ls ``` apache-tez-0.9.1-bin.tar.gz 3)解壓縮apache-tez-0.9.1-bin.tar.gz ```shell [root@hadoop102 module]$ tar -zxvf apache-tez-0.9.1-bin.tar.gz ``` 4)修改名稱 ```shell [root@hadoop102 module]$ mv apache-tez-0.9.1-bin/ tez-0.9.1 ``` # 三、在Hive中配置Tez 1)進入到Hive的配置目錄:/opt/module/hive/conf ```shell [root@hadoop102 conf]$ pwd /opt/module/hive/conf ``` 2)在hive-env.sh檔案中新增tez環境變數配置和依賴包環境變數配置 ```shell [root@hadoop102 conf]$ vim hive-env.sh ``` 新增如下配置 ```xml # Set HADOOP_HOME to point to a specific hadoop install directory export HADOOP_HOME=/opt/module/hadoop-2.7.2 # Hive Configuration Directory can be controlled by: export HIVE_CONF_DIR=/opt/module/hive/conf # Folder containing extra libraries required for hive compilation/execution can be controlled by: export TEZ_HOME=/opt/module/tez-0.9.1 #是你的tez的解壓目錄 export TEZ_JARS="" for jar in `ls $TEZ_HOME |grep jar`; do export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar done for jar in `ls $TEZ_HOME/lib`; do export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar done export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS ``` 3)在hive-site.xml檔案中新增如下配置,更改hive計算引擎 ```xml hive.execution.engine tez ``` # 四、配置Tez 1)在Hive的/opt/module/hive/conf下面建立一個tez-site.xml檔案 ```shell [root@hadoop102 conf]$ pwd /opt/module/hive/conf [root@hadoop102 conf]$ vim tez-site.xml ``` 新增如下內容 ```xml tez.lib.uris ${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib tez.lib.uris.classpath
${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib
tez.use.cluster.hadoop-libs true tez.history.logging.service.class org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService
``` # 五、上傳Tez到叢集 1)將/opt/module/tez-0.9.1上傳到HDFS的/tez路徑 ```shell [root@hadoop102 conf]$ hadoop fs -mkdir /tez [root@hadoop102 conf]$ hadoop fs -put /opt/module/tez-0.9.1/ /tez [root@hadoop102 conf]$ hadoop fs -ls /tez /tez/tez-0.9.1 ``` # 六、測試 1)啟動Hive ```shell [root@hadoop102 hive]$ bin/hive ``` 2)建立LZO表 ```shell hive (default)>
create table student( id int, name string); ``` 3)向表中插入資料 ```shell hive (default)> insert into student values(1,"zhangsan"); ``` 4)如果沒有報錯就表示成功了 ```shell hive (default)> select * from student; 1 zhangsan ``` # 七、小結 1)執行Tez時檢查到用過多記憶體而被NodeManager殺死程序問題: ``` Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103 For more detailed output, check application tracking page:http://hadoop103:8088/cluster/app/application_1546781144082_0005Then, click on links to logs of each attempt. Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container. ``` 這種問題是從機上執行的Container試圖使用過多的記憶體,而被NodeManager kill掉了。 ``` [摘錄] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want. ``` **解決方法:** 方案一:或者是關掉虛擬記憶體檢查。我們選這個,修改yarn-site.xml ```shell yarn.nodemanager.vmem-check-enabled
false
``` 方案二:mapred-site.xml中設定Map和Reduce任務的記憶體配置如下:(value中實際配置的記憶體需要根據自己機器記憶體大小及應用情況進行修改) ```xml   mapreduce.map.memory.mb   1536   mapreduce.map.java.opts   -Xmx1024M   mapreduce.reduce.memory.mb   3072   mapreduce.reduce.java.opts   -Xmx2560M ``` [系列傳送門](https://www.cnblogs.com/shun7man/p/115180