Slow Task Submission with Spark on YARN
阿新 • Published 2018-12-22
When submitting a job to the cluster in Spark on YARN mode, the job starts very slowly, and a WARN message is logged.
Submit the job to the cluster:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--executor-memory 1G \
--num-executors 1 \
/opt/spark-2.3.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.3.0.jar \
10
But the following warning appears:
WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
The log shows that Spark is uploading the jars the application depends on, which is what makes submission slow. The official documentation describes the cause and the fix:
To make Spark runtime jars accessible from YARN side, you can specify spark.yarn.archive or spark.yarn.jars.
For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified,
Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.
In short: for the YARN nodes to access the Spark runtime jars, you need to set spark.yarn.jars (or spark.yarn.archive). If neither is set, Spark zips everything under $SPARK_HOME/jars and uploads it to the distributed cache on every submission.
Fix: upload the runtime jars under $SPARK_HOME/jars/ to HDFS.
hadoop fs -mkdir /tmp/lib_jars
hadoop fs -put $SPARK_HOME/jars/* /tmp/lib_jars
Then add the following to $SPARK_HOME/conf/spark-defaults.conf:
spark.yarn.jars hdfs://master:9000/tmp/lib_jars/*
Submit the job again; this time the log shows messages like the following, and the jars are no longer re-uploaded:
2018-03-21 22:35:13 INFO Client:54 - Preparing resources for our AM container
2018-03-21 22:35:16 INFO Client:54 - Source and destination file systems are the same. Not copying hdfs://master:9000/tmp/lib_jars/JavaEWAH-0.3.2.jar
2018-03-21 22:35:16 INFO Client:54 - Source and destination file systems are the same. Not copying hdfs://master:9000/tmp/lib_jars/RoaringBitmap-0.5.11.jar
2018-03-21 22:35:16 INFO Client:54 - Source and destination file systems are the same. Not copying hdfs://master:9000/tmp/lib_jars/ST4-4.0.4.jar
2018-03-21 22:35:16 INFO Client:54 - Source and destination file systems are the same. Not copying hdfs://master:9000/tmp/lib_jars/activation-1.1.1.jar
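The warning also names spark.yarn.archive as an alternative: instead of listing individual jars, pack them all into one archive and upload that, so YARN localizes a single file per node. A rough sketch, assuming the same HDFS at hdfs://master:9000 (the archive name spark-libs.jar and the /tmp/spark-archive path are arbitrary choices, not from the original post):

```shell
# Pack all runtime jars into a single uncompressed archive.
# The "0" flag stores entries without compression, which is what
# the Spark docs recommend for spark.yarn.archive.
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .

# Upload it to HDFS once.
hadoop fs -mkdir -p /tmp/spark-archive
hadoop fs -put spark-libs.jar /tmp/spark-archive/

# Then in $SPARK_HOME/conf/spark-defaults.conf, instead of spark.yarn.jars:
#   spark.yarn.archive  hdfs://master:9000/tmp/spark-archive/spark-libs.jar
```

Either setting avoids the per-submission upload; the archive variant just trades many small distributed-cache entries for one large one.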