Spark learning: running a Spark on YARN example and viewing the logs
阿新 • Published: 2018-12-22
To view job logs through the web UI, two services must be running:
Hadoop's JobHistoryServer and Spark's history server.
Relevant configuration files:
etc/hadoop/mapred-site.xml
<!-- JobHistoryServer RPC address and web UI address -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>spark-master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>spark-master:19888</value>
</property>
yarn-site.xml
<!-- Enable log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<!-- Log server URL, used by the worker nodes -->
<property>
  <name>yarn.log.server.url</name>
  <value>http://spark-master:19888/jobhistory/logs/</value>
</property>
<!-- Log retention time, in seconds -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>86400</value>
</property>
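With log aggregation enabled, the logs of a finished YARN application can also be pulled from the command line instead of the web UI. A minimal sketch; the application id below is a hypothetical placeholder, substitute the one printed by spark-submit:

```shell
# Fetch the aggregated container logs for a finished application.
# application_1545400000000_0001 is a hypothetical placeholder id;
# use the real id reported by spark-submit or `yarn application -list`.
yarn logs -applicationId application_1545400000000_0001
```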
spark-defaults.conf
spark.eventLog.enabled=true
spark.eventLog.compress=true
# Store event logs on the local filesystem
#spark.eventLog.dir=file:///usr/local/hadoop-2.7.6/logs/userlogs
#spark.history.fs.logDirectory=file:///usr/local/hadoop-2.7.6/logs/userlogs
# Store event logs on HDFS
spark.eventLog.dir=hdfs://spark-master:9000/tmp/logs/root/logs
spark.history.fs.logDirectory=hdfs://spark-master:9000/tmp/logs/root/logs
spark.yarn.historyServer.address=spark-master:18080
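Note that Spark does not create the event log directory itself: if the HDFS path configured above does not exist, applications abort at startup. A quick sketch of preparing it, assuming the paths from spark-defaults.conf:

```shell
# spark.eventLog.dir must already exist before the first submit,
# or the application fails with "Log directory does not exist".
hdfs dfs -mkdir -p hdfs://spark-master:9000/tmp/logs/root/logs
# Verify the directory was created.
hdfs dfs -ls hdfs://spark-master:9000/tmp/logs/root
```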
Startup
1. First, start Hadoop's JobHistoryServer:
[root@spark-master hadoop-2.7.6]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.6/logs/mapred-root-historyserver-spark-master.out
2. Start Spark's history server:
[root@spark-master spark-2.3.0]# sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark-2.3.0/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-spark-master.out
If everything is configured correctly, once both services are up you can access port 18080 (Spark history server) and port 19888 (Hadoop JobHistoryServer) in a browser.
(Screenshots of the two web UIs omitted.)
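A quick way to confirm both daemons are running, without opening a browser, is to check the JVM process list and probe the two ports (a sketch, assuming the spark-master addresses configured above):

```shell
# JobHistoryServer (Hadoop, port 19888) and HistoryServer (Spark, port
# 18080) should both appear in the jps process list.
jps | grep -E 'HistoryServer'

# The web UIs can also be probed directly; 200 means the UI is serving.
curl -s -o /dev/null -w '%{http_code}\n' http://spark-master:19888/jobhistory
curl -s -o /dev/null -w '%{http_code}\n' http://spark-master:18080/
```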
Running the example job
Spark jobs can be submitted in three modes: local, standalone, and YARN.
The submit command differs slightly between the three.
local
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[4] --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples_2.11-2.3.0.jar 1
standalone
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-master:6066 --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples_2.11-2.3.0.jar 1
yarn (note: the older --master yarn-cluster form is deprecated since Spark 2.0)
bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 1g examples/jars/spark-examples_2.11-2.3.0.jar 1
What we focus on here is the workflow for viewing logs in Spark on YARN mode.
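Once a job has finished, the Spark history server exposes the same information over a REST API, which is handier than the web UI for scripting; a sketch assuming the spark-master:18080 address configured above:

```shell
# List the applications known to the Spark history server; each JSON
# entry carries the application id needed for `yarn logs -applicationId`.
curl -s http://spark-master:18080/api/v1/applications
```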