
03. Setting up a Spark Cluster (CentOS7 + Spark2.1.1 + Hadoop2.8.0)


I. Download and install Scala

1. Download from the official site

2. Create the /opt/scala directory on both spark01 and spark02, then extract the archive: tar -zxvf scala-2.12.8.tgz
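  A minimal sketch of this step, to run on both nodes (the archive location /root is an assumption; download it from the scala-lang.org download page first):

mkdir -p /opt/scala
cd /opt/scala
tar -zxvf /root/scala-2.12.8.tgz    # extracts to /opt/scala/scala-2.12.8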

3. Configure environment variables

  vi /etc/profile and add the line:

  export SCALA_HOME=/opt/scala/scala-2.12.8

  Also add the Hadoop environment variables at the same time; the full version is:

export JAVA_HOME=/opt/java/jdk1.8.0_191
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SCALA_HOME=/opt/scala/scala-2.12.8

export CLASSPATH=.:${CLASSPATH}:${JAVA_HOME}/lib/
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SCALA_HOME}/bin:$PATH

  Then run: source /etc/profile

4. Verify

  scala -version

5. Sync the configuration file to spark02

  scp /etc/profile spark02:/etc
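  scp only copies the file; /etc/profile takes effect at the next login. A quick check that spark02 picked up the change (assuming passwordless ssh is already configured for Hadoop):

ssh spark02 "source /etc/profile && scala -version"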

II. Download and install Spark

1. Download and extract, same as for Scala; create the /opt/spark directory first
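  As above, a minimal sketch of this step (the archive's source path under /root is an assumption):

mkdir -p /opt/spark
cd /opt/spark
tar -zxvf /root/spark-2.4.0-bin-hadoop2.7.tgz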

2. Configure environment variables

export SPARK_HOME=/opt/spark/spark-2.4.0-bin-hadoop2.7

The updated full version:

export JAVA_HOME=/opt/java/jdk1.8.0_191
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SCALA_HOME=/opt/scala/scala-2.12.8
export SPARK_HOME=/opt/spark/spark-2.4.0-bin-hadoop2.7

export CLASSPATH=.:${CLASSPATH}:${JAVA_HOME}/lib/
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${SCALA_HOME}/bin:$PATH

source /etc/profile

scp /etc/profile spark02:/etc
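A quick sanity check on spark02 (single quotes so the variable expands remotely):

ssh spark02 'source /etc/profile; echo $SPARK_HOME'    # should print /opt/spark/spark-2.4.0-bin-hadoop2.7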

3. Configure the files under conf (run the following inside ${SPARK_HOME}/conf)

cp spark-env.sh.template spark-env.sh

cp slaves.template slaves

vi spark-env.sh

export SCALA_HOME=/opt/scala/scala-2.12.8
export JAVA_HOME=/opt/java/jdk1.8.0_191
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark-2.4.0-bin-hadoop2.7
export SPARK_MASTER_IP=spark01
export SPARK_EXECUTOR_MEMORY=2G
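Note: in Spark 2.x SPARK_MASTER_IP is deprecated (the master logs a warning and the start scripts use SPARK_MASTER_HOST), so it may be safer to also, or instead, set:

export SPARK_MASTER_HOST=spark01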

vi slaves

spark02
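With only spark02 listed, the master node runs no Worker. If you also want a Worker on spark01 (optional), list both hosts in slaves:

spark01
spark02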

Sync to spark02:

scp /opt/spark/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh spark02:/opt/spark/spark-2.4.0-bin-hadoop2.7/conf/
scp /opt/spark/spark-2.4.0-bin-hadoop2.7/conf/slaves spark02:/opt/spark/spark-2.4.0-bin-hadoop2.7/conf/

III. Test Spark

  Because Spark depends on the distributed file system provided by Hadoop, make sure Hadoop is running normally before starting Spark.
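  One quick way to confirm HDFS is up before continuing:

${HADOOP_HOME}/bin/hdfs dfsadmin -report    # should list the live datanode(s)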

  With Hadoop running normally, execute the following command on spark01 (which is both the Hadoop namenode and the Spark master node):

  cd /opt/spark/spark-2.4.0-bin-hadoop2.7/sbin

  Run the startup script: ./start-all.sh
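  If the start succeeds, jps (bundled with the JDK) should show the new Spark daemons alongside the Hadoop ones:

jps                # on spark01: a Master process
ssh spark02 jps    # on spark02: a Worker process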

  Visit the Master machine in a browser. In my Spark cluster the Master machine is spark01, IP address 192.168.2.245; the web UI listens on port 8080, so the URL is: http://192.168.2.245:8080/

  Run a demo that estimates pi in local mode, following the steps below.

  First, change to the Spark root directory and run the following script:

  ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.0.jar
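  SparkPi prints its estimate among the run logs; filtering the output makes it easier to spot:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.0.jar 2>&1 | grep "Pi is roughly"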

  yarn-client mode (note: since Spark 2.0 the master value yarn-client is deprecated; the equivalent form is --master yarn --deploy-mode client):

  ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client examples/jars/spark-examples_2.11-2.4.0.jar
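  Since this chapter builds a standalone cluster, it is also worth submitting against the standalone master itself (7077 is Spark's default master port; a sketch assuming default settings):

  ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark01:7077 examples/jars/spark-examples_2.11-2.4.0.jar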
