
[Original] Big Data Fundamentals: Spark (9) — Spark deployment on YARN/Mesos


1 Download from https://spark.apache.org/downloads.html

$ wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

2 Extract

$ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
$ cd spark-2.4.0-bin-hadoop2.7

3 Set the SPARK_HOME environment variable

$ export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7
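
To make the setting persist across shell sessions, the export can go into the shell profile. A minimal sketch — the /opt install path below is only an assumption; use the actual extraction directory:

```shell
# Assumed install location -- replace with the real extraction path.
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7
# Putting bin/ on PATH lets spark-sql / spark-submit be invoked directly.
export PATH="$SPARK_HOME/bin:$PATH"
```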

4 Launch

Using spark-sql as an example

4.1 Spark on YARN

Only the HADOOP_CONF_DIR environment variable needs to be set
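
HADOOP_CONF_DIR must point at the directory holding the cluster's client configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml). The path below is a common distribution default and only an assumption:

```shell
# Assumed config directory -- many Hadoop distributions use /etc/hadoop/conf.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```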

$ bin/spark-sql --master yarn

More options

--deploy-mode cluster
--driver-memory 4g
--driver-cores 1
--executor-memory 2g
--executor-cores 1
--num-executors 1
--queue thequeue
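
Put together, the options above might look like the following. Note that interactive shells such as spark-sql only run the driver in client mode, so --deploy-mode cluster is illustrated here with spark-submit and the SparkPi example that ships in the 2.4.0 tarball; the guard makes the snippet a no-op when the distribution is not on hand:

```shell
# Illustration only: submit the bundled SparkPi example to YARN in
# cluster mode with the resource options listed above.
if [ -x bin/spark-submit ]; then
  bin/spark-submit --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --driver-cores 1 \
    --executor-memory 2g \
    --executor-cores 1 \
    --num-executors 1 \
    --queue thequeue \
    --class org.apache.spark.examples.SparkPi \
    examples/jars/spark-examples_2.11-2.4.0.jar 100
fi
```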

4.2 Spark on Mesos

$ bin/spark-sql --master mesos://zk://172.19.28.186:2181,172.19.28.188:2181,172.19.28.190:2181/mesos

More options

--deploy-mode cluster
--supervise
--executor-memory 20G
--executor-cores 1
--total-executor-cores 100
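
A client-mode spark-sql invocation combining the resource limits above might look as follows (the ZooKeeper quorum is the one used earlier in this article; --supervise is omitted since it only applies to cluster-mode submissions, and the guard makes the snippet a no-op without a Spark distribution):

```shell
# Illustration only: spark-sql against Mesos with explicit resource caps.
if [ -x bin/spark-sql ]; then
  bin/spark-sql \
    --master mesos://zk://172.19.28.186:2181,172.19.28.188:2181,172.19.28.190:2181/mesos \
    --executor-memory 20G \
    --executor-cores 1 \
    --total-executor-cores 100
fi
```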

Note that there is no --num-executors option in Mesos mode; the executor count is configured indirectly: --num-executors = --total-executor-cores / --executor-cores

Executor memory: spark.executor.memory
Executor cores: spark.executor.cores
Number of executors: spark.cores.max/spark.executor.cores
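
For instance, with the values listed above (--total-executor-cores 100, --executor-cores 1), Mesos can launch up to 100 executors. A quick check of the arithmetic:

```shell
# Worked example of the indirect executor count on Mesos:
# number of executors = total executor cores / cores per executor.
total_executor_cores=100
executor_cores=1
num_executors=$((total_executor_cores / executor_cores))
echo "$num_executors"
```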

Note: Spark on YARN may fail at startup with

19/02/25 17:54:20 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!

The NodeManager log reveals the cause

2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
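
The 2.1 GB limit in that message is the container's 1 GB physical allocation multiplied by the default yarn.nodemanager.vmem-pmem-ratio of 2.1; the 2.5 GB of virtual memory actually in use exceeds it, so the NodeManager kills the container. The arithmetic:

```shell
# 1 GB physical * default ratio 2.1 = 2.1 GB virtual limit (exceeded);
# raising the ratio to 4 would allow 4 GB, comfortably above the 2.5 GB used.
awk 'BEGIN { printf "%.1f %.1f\n", 1 * 2.1, 1 * 4 }'
```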

This requires adjusting the yarn-site.xml configuration

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

or

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
