Installing spark-2.2.0 on a cdh5.7.0 pseudo-distributed cluster
Base environment and software:
Software version | Package |
---|---|
centos-6.4 | |
JDK-1.8 | jdk-8u191-linux-x64.tar.gz |
hadoop-2.6.0 | hadoop-2.6.0-cdh5.7.0.tar.gz |
scala-2.11.8 | scala-2.11.8.tgz |
spark-2.2.0 | spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz |
Official download page for the installation packages: http://archive-primary.cloudera.com/cdh5/cdh/5/
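For example (a sketch, assuming the tarball names listed in the table above), the CDH Hadoop package can be pulled straight from that archive; the JDK, Scala, and the compiled Spark tarball come from their own download pages or from your own build:
# wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz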
Installing Scala
1. Upload the Scala package scala-2.11.8.tgz to the /usr/local/app directory on the virtual machine
2. Extract scala-2.11.8.tgz:
# tar -zxvf scala-2.11.8.tgz
3. Configure the Scala environment variables:
# vim ~/.bashrc
#set scala environment
export SCALA_HOME=/usr/local/app/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
# source ~/.bashrc
4. Verify that Scala is installed:
# scala -version
If output like the following appears, the installation succeeded.
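The exact wording depends on the build, but the version line should look roughly like this:
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL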
Installing Spark
Note: Hadoop must be installed beforehand; see the earlier article: https://blog.csdn.net/weixin_39689084/article/details/84548507
There are two ways to obtain Spark:
- Option 1: download the prebuilt binary tarball and extract it directly
- Option 2: download the source package, compile it, and then extract the resulting tarball
The second option is used here. How to compile Spark is not covered in detail in this article (search for it online; a sketch of a typical build command is given below). The installation steps are as follows:
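As a rough sketch only (run from the Spark 2.2.0 source directory; the -Phive, -Phive-thriftserver and -Pyarn profiles are optional and depend on your needs), a distribution matching the tarball name used below is typically produced with Spark's make-distribution script:
# ./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn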
Local mode setup
1. Upload the compiled Spark package spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz to the /usr/local/app directory on the virtual machine
2. Extract the Spark package:
# tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
3. Rename the Spark directory:
# mv spark-2.2.0-bin-2.6.0-cdh5.7.0/ spark-2.2.0
4. Configure the Spark environment variables:
# vim ~/.bashrc
#set spark environment
export SPARK_HOME=/usr/local/app/spark-2.2.0
export PATH=$PATH:$SPARK_HOME/bin
# source ~/.bashrc
5. Test:
# spark-shell --master local[2]
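Besides opening an interactive shell, a quick non-interactive sanity check (a minimal sketch; any small job will do) is to pipe a one-line job into spark-shell and confirm it prints 5050.0:
# echo 'println(sc.parallelize(1 to 100).sum())' | spark-shell --master local[2]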
Standalone mode setup
1. Go to the /usr/local/app/spark-2.2.0/conf directory:
# cd /usr/local/app/spark-2.2.0/conf
# cp spark-env.sh.template spark-env.sh
# vi spark-env.sh
Append the following configuration at the end of spark-env.sh:
SPARK_MASTER_HOST=hadoop000
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
Explanation of the settings:
- SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
- SPARK_WORKER_CORES, to set the number of cores to use on this machine
- SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
- SPARK_WORKER_INSTANCES, to set the number of worker processes per node
Start the cluster:
# cd /usr/local/app/spark-2.2.0/sbin
# ./start-all.sh
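If the daemons came up correctly, jps should list a Master and a Worker process (the PIDs below are only illustrative):
# jps
3072 Master
3174 Worker
3301 Jps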
Test:
# spark-shell --master spark://hadoop000:7077
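Again, a quick non-interactive check (a sketch; the hostname must match the SPARK_MASTER_HOST configured above) confirms that jobs actually run against the standalone master:
# echo 'println(sc.master); println(sc.parallelize(1 to 100).sum())' | spark-shell --master spark://hadoop000:7077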
Access the web UI: the standalone master UI listens on port 8080 by default, e.g. http://hadoop000:8080
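From the command line, the same page can be fetched to confirm the master UI is responding (assuming the default web UI port 8080 has not been changed):
# curl http://hadoop000:8080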