Kafka: ZK+Kafka+Spark Streaming Cluster Setup (3): Installing Spark 2.2.1
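For how to configure the CentOS virtual machines, see "Kafka: ZK+Kafka+Spark Streaming Cluster Setup (1): Installing four CentOS VMs in VMware, letting the host communicate with them, and giving the VMs Internet access".
For how to install Hadoop 2.9.0, see "Kafka: ZK+Kafka+Spark Streaming Cluster Setup (2): Installing Hadoop 2.9.0".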
Servers Spark is installed on:
192.168.0.120 master
192.168.0.121 slave1
192.168.0.122 slave2
192.168.0.123 slave3
Download the Spark package from the official site:
Download page: http://spark.apache.org/downloads.html
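Note: the previous article installed Hadoop 2.9.0, but Hadoop 2.9.0 is not among the Hadoop versions offered for the Spark download, so the only option is "Pre-built for Apache Hadoop 2.7 and later".
Several Spark releases are available; I chose "2.2.1 (Dec 01 2017)".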
With these choices, the package to download is spark-2.2.1-bin-hadoop2.7.tgz. Download it, upload it to /opt on master, and extract it:
[root@master opt]# tar -zxvf spark-2.2.1-bin-hadoop2.7.tgz
[root@master opt]# ls
hadoop-2.9.0 hadoop-2.9.0.tar.gz jdk1.8.0_171 jdk-8u171-linux-x64.tar.gz scala-2.11.0 scala-2.11.0.tgz spark-2.2.1-bin-hadoop2.7 spark-2.2.1-bin-hadoop2.7.tgz
[root@master opt]#
Configuring Spark
[root@master opt]# ls
hadoop-2.9.0 hadoop-2.9.0.tar.gz jdk1.8.0_171 jdk-8u171-linux-x64.tar.gz scala-2.11.0 scala-2.11.0.tgz spark-2.2.1-bin-hadoop2.7 spark-2.2.1-bin-hadoop2.7.tgz
[root@master opt]# cd spark-2.2.1-bin-hadoop2.7/conf/
[root@master conf]# ls
docker.properties.template metrics.properties.template spark-env.sh.template fairscheduler.xml.template slaves.template log4j.properties.template spark-defaults.conf.template
[root@master conf]# cp spark-env.sh.template spark-env.sh
[root@master conf]# ls
docker.properties.template metrics.properties.template spark-env.sh fairscheduler.xml.template slaves.template spark-env.sh.template log4j.properties.template spark-defaults.conf.template
[root@master conf]# vi spark-env.sh
Append the following to the end of spark-env.sh (this is my configuration; adjust it to your own environment):
export SCALA_HOME=/opt/scala-2.11.0
export JAVA_HOME=/opt/jdk1.8.0_171
export HADOOP_HOME=/opt/hadoop-2.9.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Spark 2.x start scripts read SPARK_MASTER_HOST (SPARK_MASTER_IP is deprecated)
SPARK_MASTER_HOST=master
SPARK_LOCAL_DIRS=/opt/spark-2.2.1-bin-hadoop2.7
SPARK_DRIVER_MEMORY=1G
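If you would rather not edit spark-env.sh by hand, the settings above can be appended by a small script. This is a minimal sketch, assuming the paths listed above; the function name and the marker comment `# >>> cluster settings >>>` are hypothetical conventions of this sketch, used only so that running it twice does not append the block twice. It writes SPARK_MASTER_HOST, the variable Spark 2.x's start scripts read, instead of SPARK_MASTER_IP.

```shell
#!/bin/sh
# Minimal sketch: append the cluster settings to spark-env.sh exactly once.
# The marker line is a hypothetical convention used only for idempotency.
SPARK_ENV_MARKER="# >>> cluster settings >>>"

append_spark_env() {
    conf_file="$1"
    # Start from the shipped template if spark-env.sh does not exist yet.
    if [ ! -f "$conf_file" ] && [ -f "$conf_file.template" ]; then
        cp "$conf_file.template" "$conf_file"
    fi
    # Do nothing if an earlier run already appended the block.
    if [ -f "$conf_file" ] && grep -qF "$SPARK_ENV_MARKER" "$conf_file"; then
        return 0
    fi
    printf '%s\n' "$SPARK_ENV_MARKER" >> "$conf_file"
    # Quoted heredoc: $HADOOP_HOME is written literally and only expanded
    # later, when Spark's scripts source spark-env.sh.
    cat >> "$conf_file" <<'EOF'
export SCALA_HOME=/opt/scala-2.11.0
export JAVA_HOME=/opt/jdk1.8.0_171
export HADOOP_HOME=/opt/hadoop-2.9.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_HOST=master
SPARK_LOCAL_DIRS=/opt/spark-2.2.1-bin-hadoop2.7
SPARK_DRIVER_MEMORY=1G
EOF
}

# Example (on master, after loading this function):
#   append_spark_env /opt/spark-2.2.1-bin-hadoop2.7/conf/spark-env.sh
```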
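Note: when setting the number of CPU cores and the amount of memory for the Worker processes (SPARK_WORKER_CORES and SPARK_WORKER_MEMORY in spark-env.sh), stay within the machine's actual hardware. If the configured values exceed what the Worker node has, the Worker process fails to start.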
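To pick values that stay within a node's hardware, one option is to derive them from the core count and total memory. This is a sketch under stated assumptions: the function names are hypothetical, and leaving one core and about 1 GB to the OS and the Hadoop daemons (DataNode, NodeManager) is an assumption of this example, not a Spark rule. Because the same spark-env.sh is copied to every slave, choose values that fit the smallest node.

```shell
#!/bin/sh
# Hypothetical helper: suggest SPARK_WORKER_CORES / SPARK_WORKER_MEMORY values
# that stay below a node's hardware. Leaving one core and ~1 GB to the OS and
# the Hadoop daemons is an assumption of this sketch, not a Spark rule.
suggest_worker_resources() {
    cores="$1"     # number of CPU cores on the node
    mem_mb="$2"    # total physical memory in MB

    worker_cores=$((cores - 1))
    if [ "$worker_cores" -lt 1 ]; then
        worker_cores=1
    fi

    if [ "$mem_mb" -gt 2048 ]; then
        worker_mem=$((mem_mb - 1024))
    else
        # Small VMs: give the worker half of the memory.
        worker_mem=$((mem_mb / 2))
    fi

    echo "SPARK_WORKER_CORES=$worker_cores"
    echo "SPARK_WORKER_MEMORY=${worker_mem}m"
}

# Read the current node's hardware (Linux: nproc and /proc/meminfo).
suggest_for_this_node() {
    cores=$(nproc)
    mem_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    suggest_worker_resources "$cores" "$mem_mb"
}

# Example: run suggest_for_this_node on each slave and compare the results.
```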
Configuring slaves
In the slaves file, list the slave hostnames:
[root@master conf]# cp slaves.template slaves
[root@master conf]# vi slaves
Its content:
#localhost
slave1
slave2
slave3
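The slaves file is simply one worker hostname per line; Spark's sbin/slaves.sh strips anything after # and skips blank lines, which is why the commented-out #localhost is harmless. A minimal sketch that writes the same file without an editor (the function name is hypothetical):

```shell
#!/bin/sh
# Minimal sketch: write a slaves file with one worker hostname per line.
# Usage: write_slaves_file <path> <host>...
write_slaves_file() {
    target="$1"
    shift
    {
        # Keep localhost commented out so master does not also run a Worker.
        echo "#localhost"
        for host in "$@"; do
            echo "$host"
        done
    } > "$target"
}

# Example (on master):
#   write_slaves_file /opt/spark-2.2.1-bin-hadoop2.7/conf/slaves slave1 slave2 slave3
```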
Distribute the configured spark-2.2.1-bin-hadoop2.7 directory to all slaves:
scp -r /opt/spark-2.2.1-bin-hadoop2.7 spark@slave1:/opt/
scp -r /opt/spark-2.2.1-bin-hadoop2.7 spark@slave2:/opt/
scp -r /opt/spark-2.2.1-bin-hadoop2.7 spark@slave3:/opt/
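Note: by default /opt/spark-2.2.1-bin-hadoop2.7 does not exist on slave1, slave2, and slave3, and the spark user cannot create directories under the root-owned /opt, so copying directly may fail with a permission error.
Fix: on each of slave1, slave2, and slave3, create spark-2.2.1-bin-hadoop2.7 under /opt and give it 777 permissions: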
[root@slave1 opt]# mkdir spark-2.2.1-bin-hadoop2.7
[root@slave1 opt]# chmod 777 spark-2.2.1-bin-hadoop2.7
[root@slave1 opt]#
After that, the copy succeeds.
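The per-slave steps can also be driven from master in one loop. This sketch assumes root can SSH to each slave and that a spark user and group exist there; it uses chown spark:spark instead of chmod 777, which gives the spark user write access without making the directory world-writable. The run helper and the DRY_RUN switch are hypothetical conveniences for previewing the commands before executing them.

```shell
#!/bin/sh
# Sketch: create the target directory on each slave (as root, owned by spark),
# then copy the configured Spark directory as the spark user.
# Assumes root SSH access to the slaves and an existing spark user/group.
SPARK_DIR=/opt/spark-2.2.1-bin-hadoop2.7

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

distribute_spark() {
    for host in "$@"; do
        run ssh "root@$host" "mkdir -p $SPARK_DIR && chown spark:spark $SPARK_DIR"
        run scp -r "$SPARK_DIR" "spark@$host:/opt/"
    done
}

# Example: preview first, then run for real.
#   DRY_RUN=1; distribute_spark slave1 slave2 slave3
#   DRY_RUN=0; distribute_spark slave1 slave2 slave3
```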
Starting Spark
Run the following command from the Spark installation directory (on master this is /opt/spark-2.2.1-bin-hadoop2.7):
sbin/start-all.sh
When I started Spark as a non-root account (the user named spark), Spark on master had no permission to write its logs:
[spark@master opt]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/start-all.sh
mkdir: cannot create directory ‘/opt/spark-2.2.1-bin-hadoop2.7/logs’: Permission denied
chown: cannot access ‘/opt/spark-2.2.1-bin-hadoop2.7/logs’: No such file or directory
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
/opt/spark-2.2.1-bin-hadoop2.7/sbin/spark-daemon.sh: line 128: /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out: No such file or directory
failed to launch: nice -n 0 /opt/spark-2.2.1-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.master.Master --host master --port 7077 --webui-port 8080
tail: cannot open ‘/opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out’ for reading: No such file or directory
full log in /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave3.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave2.out
[spark@master spark-2.2.1-bin-hadoop2.7]$ cd ..
[spark@master opt]$ su root
Password:
[root@master opt]# chmod 777 spark-2.2.1-bin-hadoop2.7
[root@master opt]# su spark
[spark@master opt]$ cd spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
slave2: org.apache.spark.deploy.worker.Worker running as process 3153. Stop it first.
slave3: org.apache.spark.deploy.worker.Worker running as process 3076. Stop it first.
slave1: org.apache.spark.deploy.worker.Worker running as process 3241. Stop it first.
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/stop-all.sh
slave1: stopping org.apache.spark.deploy.worker.Worker
slave3: stopping org.apache.spark.deploy.worker.Worker
slave2: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave3.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave2.out
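Fix: give the Spark installation directory on master 777 permissions as well. Because the Workers on the slaves had already started during the failed attempt, run sbin/stop-all.sh first and then sbin/start-all.sh again, as shown above.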
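Opening the whole installation directory with 777 works, but only the log directory actually needs to be writable by the user running start-all.sh. spark-daemon.sh writes its logs to $SPARK_LOG_DIR, which defaults to $SPARK_HOME/logs, so an alternative is to create that directory as root and chown it to spark (for example mkdir -p /opt/spark-2.2.1-bin-hadoop2.7/logs, then chown spark:spark on it), or to point SPARK_LOG_DIR in spark-env.sh at a directory spark owns. A minimal pre-flight check, with a hypothetical function name:

```shell
#!/bin/sh
# Minimal sketch: make sure the directory spark-daemon.sh logs to exists and is
# writable by the current (non-root) user before running sbin/start-all.sh.
# spark-daemon.sh uses $SPARK_LOG_DIR, defaulting to $SPARK_HOME/logs.
check_log_dir() {
    spark_home="$1"
    log_dir="${SPARK_LOG_DIR:-$spark_home/logs}"
    if ! mkdir -p "$log_dir" 2>/dev/null; then
        echo "cannot create $log_dir; create it as root and chown it to $(id -un)" >&2
        return 1
    fi
    if [ ! -w "$log_dir" ]; then
        echo "$log_dir is not writable by $(id -un)" >&2
        return 1
    fi
    # Print the directory that will be used.
    echo "$log_dir"
}

# Example (as the spark user, on master):
#   cd /opt/spark-2.2.1-bin-hadoop2.7 && check_log_dir "$PWD" && sbin/start-all.sh
```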
Verifying the Spark installation
Check with jps. On master, the following processes should be running:
$ jps
7949 Jps
7328 SecondaryNameNode
7805 Master
7137 NameNode
7475 ResourceManager
On each slave, the following processes should be running:
$ jps
3132 DataNode
3759 Worker
3858 Jps
3231 NodeManager
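Checking the process list by eye works, but the comparison can also be scripted. A minimal sketch (the check_daemons name is hypothetical): it reads jps output on stdin, where each line is a PID followed by the main class name, and reports any expected daemon that is missing.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and report which of the
# expected daemon names are missing. Returns non-zero if any is missing.
check_daemons() {
    jps_out=$(cat)
    missing=""
    for name in "$@"; do
        # jps prints "<pid> <MainClassName>"; match the name as the 2nd field.
        if ! printf '%s\n' "$jps_out" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Examples:
#   on master:  jps | check_daemons Master NameNode SecondaryNameNode ResourceManager
#   on a slave: jps | check_daemons Worker DataNode NodeManager
```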
Open the Spark web UI: http://192.168.0.120:8080
Running the examples
Local mode with two threads:
[spark@master spark-2.2.1-bin-hadoop2.7]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/run-example --master 'local[2]' SparkPi 10
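SparkPi prints its result as a line starting with "Pi is roughly", while Spark's own log messages go to stderr by default. A minimal sketch for pulling out the estimate and sanity-checking it; the function names are hypothetical and the 3.0 to 3.3 tolerance is an arbitrary assumption (with few slices the estimate is coarse).

```shell
#!/bin/sh
# Minimal sketch: extract the estimate from SparkPi's stdout.
extract_pi() {
    awk '/^Pi is roughly/ { print $4; exit }'
}

# Loose sanity check; the 3.0-3.3 range is an arbitrary assumption.
pi_looks_sane() {
    awk -v v="$1" 'BEGIN { exit !(v + 0 > 3.0 && v + 0 < 3.3) }'
}

# Example (from /opt/spark-2.2.1-bin-hadoop2.7):
#   pi=$(./bin/run-example --master 'local[2]' SparkPi 10 2>/dev/null | extract_pi)
#   pi_looks_sane "$pi" && echo "local run OK: $pi"
```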
Spark Standalone cluster mode
[spark@master spark-2.2.1-bin-hadoop2.7]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://master:7077 \
    examples/jars/spark-examples_2.11-2.2.1.jar \
    100
While the job runs, its progress can be followed in the Spark web UI (http://192.168.0.120:8080).
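The submissions in this section differ only in the master URL and deploy mode, so they can be wrapped in one function. A sketch with a hypothetical name (submit_sparkpi) and a hypothetical DRY_RUN switch that only prints the command; it assumes the install path and examples jar used above. For YARN it uses --master yarn with --deploy-mode, which is how Spark 2.x selects cluster or client mode.

```shell
#!/bin/sh
# Hypothetical wrapper: submit SparkPi in one of the modes used in this post.
# With DRY_RUN=1 it only prints the command line instead of running it.
SPARK_HOME=${SPARK_HOME:-/opt/spark-2.2.1-bin-hadoop2.7}
EXAMPLES_JAR="examples/jars/spark-examples_2.11-2.2.1.jar"

submit_sparkpi() {
    mode="$1"
    slices="${2:-10}"
    # Replace the positional parameters with the mode-specific options.
    case "$mode" in
        standalone)   set -- --master spark://master:7077 ;;
        yarn-cluster) set -- --master yarn --deploy-mode cluster ;;
        yarn-client)  set -- --master yarn --deploy-mode client ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo bin/spark-submit --class org.apache.spark.examples.SparkPi "$@" "$EXAMPLES_JAR" "$slices"
    else
        (cd "$SPARK_HOME" && ./bin/spark-submit --class org.apache.spark.examples.SparkPi "$@" "$EXAMPLES_JAR" "$slices")
    fi
}

# Example: submit_sparkpi standalone 100
```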
Spark on YARN in yarn-cluster mode
[spark@master spark-2.2.1-bin-hadoop2.7]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    /opt/spark-2.2.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.1.jar \
    10
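Note: Spark on YARN supports two modes, yarn-cluster and yarn-client (see this blog post for the details). In yarn-cluster mode the driver runs inside the YARN ApplicationMaster on the cluster; in yarn-client mode it runs in the local spark-submit process. Broadly speaking, yarn-cluster suits production, while yarn-client suits interactive use and debugging, when you want to see the application's output right away. In Spark 2.x the mode is chosen with --master yarn plus --deploy-mode cluster or --deploy-mode client; the older yarn-cluster and yarn-client master URLs are deprecated.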
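Because the driver runs inside YARN in cluster mode, the "Pi is roughly ..." line ends up in the driver container's log rather than on your terminal. One way to retrieve it, assuming YARN log aggregation is enabled (yarn.log-aggregation-enable=true in yarn-site.xml) so that yarn logs can fetch the logs after the application finishes: capture the application id from spark-submit's output and pass it to yarn logs -applicationId. The helper name is hypothetical; the pattern simply matches YARN's application_<timestamp>_<sequence> id format.

```shell
#!/bin/sh
# Hypothetical helper: return the first YARN application id found in
# spark-submit's log output (ids look like application_<timestamp>_<seq>).
extract_app_id() {
    grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Example (from /opt/spark-2.2.1-bin-hadoop2.7; spark-submit logs to stderr):
#   app_id=$(./bin/spark-submit --class org.apache.spark.examples.SparkPi \
#       --master yarn --deploy-mode cluster \
#       examples/jars/spark-examples_2.11-2.2.1.jar 10 2>&1 | extract_app_id)
#   yarn logs -applicationId "$app_id" | grep "Pi is roughly"
```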