
Installing spark-2.2.0 on a cdh5.7.0 pseudo-distributed cluster

Basic environment and software:

Software version   Package
CentOS 6.4         (base OS)
JDK 1.8            jdk-8u191-linux-x64.tar.gz
Hadoop 2.6.0       hadoop-2.6.0-cdh5.7.0.tar.gz
Scala 2.11.8       scala-2.11.8.tgz
Spark 2.2.0        spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz

Official download site for the packages: http://archive-primary.cloudera.com/cdh5/cdh/5/
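As a sketch, the Hadoop tarball can be pulled straight from that archive with wget; the Scala URL below is an assumption based on the usual Lightbend mirror, and the Spark tarball is built locally in the steps that follow rather than downloaded:

```shell
# Fetch the tarballs into the install directory used throughout this guide
cd /usr/local/app
wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
# assumed mirror URL for the Scala 2.11.8 tarball
wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
```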

Installing Scala

1. Upload the Scala package scala-2.11.8.tgz to /usr/local/app on the virtual machine.

2. Extract scala-2.11.8.tgz:

# tar -zxvf scala-2.11.8.tgz

3. Configure the Scala environment variables:

# vim ~/.bashrc
# set scala environment
export SCALA_HOME=/usr/local/app/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
# source ~/.bashrc

4. Check that Scala was installed successfully:

# scala -version

If the command prints the version banner, the installation succeeded:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Installing Spark

Note: Hadoop must be installed first; see this companion post: https://blog.csdn.net/weixin_39689084/article/details/84548507

There are two ways to obtain Spark:

  • Option 1: download a prebuilt binary tarball and simply extract it
  • Option 2: download the source package, compile it, then extract the result

This guide uses the second option. How to compile is not covered in detail here (search for it yourself); the installation steps follow below:
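Since compilation itself is skipped above, here is a rough sketch of how such a CDH-matched tarball is typically produced with Spark's make-distribution script, run inside the spark-2.2.0 source tree; the exact profiles below are assumptions chosen to match hadoop-2.6.0-cdh5.7.0 and may need adjusting:

```shell
# Produces spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz in the source root
./dev/make-distribution.sh \
  --name 2.6.0-cdh5.7.0 --tgz \
  -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 \
  -Phive -Phive-thriftserver -Pyarn
```

Note that the Cloudera Maven repository may need to be added to the build's pom.xml so the CDH Hadoop artifacts resolve.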

Local mode setup

1. Upload the compiled Spark package spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz to /usr/local/app on the virtual machine.
2. Extract the Spark package:

# tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz

3. Rename the Spark directory:

# mv spark-2.2.0-bin-2.6.0-cdh5.7.0/ spark-2.2.0

4. Set the Spark environment variables:

# vim ~/.bashrc
# set spark environment
export SPARK_HOME=/usr/local/app/spark-2.2.0
export PATH=$PATH:$SPARK_HOME/bin
# source ~/.bashrc

5. Test that the shell starts:

# spark-shell --master local[2]
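As a quick smoke test once the shell is up (a sketch; any small job works), type a one-line Scala expression at the scala> prompt, or pipe it in non-interactively:

```shell
# Sums 1..100 on the local[2] master; the shell should report
# something like: res0: Double = 5050.0
echo 'sc.parallelize(1 to 100).sum' | spark-shell --master local[2]
```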

 

Standalone mode setup

1. Go to the /usr/local/app/spark-2.2.0/conf directory, copy the template, and edit spark-env.sh:

# cd /usr/local/app/spark-2.2.0/conf

# cp spark-env.sh.template spark-env.sh

# vi spark-env.sh

Add the following configuration at the end of spark-env.sh:

SPARK_MASTER_HOST=hadoop000
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1

What each setting means (from the spark-env.sh.template comments):

SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
SPARK_WORKER_INSTANCES, to set the number of worker processes per node

Start the cluster:

# cd /usr/local/app/spark-2.2.0/sbin

# ./start-all.sh 
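To confirm both daemons came up (a sketch; jps ships with the JDK), list the running JVMs and check the Spark logs if anything is missing:

```shell
# A healthy setup shows one Master and one Worker (SPARK_WORKER_INSTANCES=1),
# alongside any Hadoop daemons already running on the box
jps
# startup logs land here if a daemon fails to appear
ls /usr/local/app/spark-2.2.0/logs/
```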

 

Test (the master URL must match the SPARK_MASTER_HOST configured above, hadoop000 here):

# spark-shell --master spark://hadoop000:7077

Web UI: once the master is up, check the cluster status at http://hadoop000:8080 (the standalone master's default web UI port).