Big Data: Setting Up a Spark Cluster

Create the spark user group with group ID 1000.

groupadd -g 1000 spark

Create the spark user with user ID 2000 in the spark group.

useradd -u 2000 -g spark spark

Set the password.

passwd spark

Grant sudo privileges to the spark user.

chmod u+w /etc/sudoers

vi /etc/sudoers

Find the line

root ALL=(ALL) ALL

Add the following line below it:

spark ALL=(ALL) ALL

Create an /app directory to hold the Spark software environment (JDK, Scala, Spark).

mkdir /app

Change the owner and group of this directory.

chown -R spark:spark /app

Create the soft directory.

mkdir /app/soft

Create the spark directory.

mkdir /app/spark

Create /home/spark/work.

mkdir -p /home/spark/work

Change the owner and group of /home/spark/work.

chown -R spark:spark /home/spark/work
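
The directory steps above can be collected into one repeatable function. This is only a minimal sketch, not the author's script: the name make_spark_layout and its optional prefix argument are hypothetical, added so the layout can be tried in a scratch directory before touching the real filesystem.

```shell
# Create the directory layout used in this guide under an optional prefix.
# make_spark_layout and its prefix argument are illustrative names only.
make_spark_layout() {
    local prefix="${1:-}"   # an empty prefix means the real filesystem root
    # -p creates missing parents and does not fail if a directory exists
    mkdir -p "$prefix/app/soft" "$prefix/app/spark" "$prefix/home/spark/work"
}

# Trial run in a scratch directory (no root privileges needed):
#   make_spark_layout "$(mktemp -d)"
# Real run as root, followed by the ownership change from the steps above:
#   make_spark_layout && chown -R spark:spark /app /home/spark/work
```

Running chown -R only after every subdirectory exists means /app/soft and /app/spark end up owned by spark as well, instead of staying owned by root.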

Switch user.

su root

Extract the JDK.

cd /tmp/

tar zxvf jdk-8u192-linux-x64.tar.gz -C /app/soft/

If you lack permission to read the archive, first change the permissions with chmod 777 -R /tmp.

cd /app/soft/

ll -a

Configure /etc/profile.

sudo vi /etc/profile, then add the required settings:

JAVA_HOME=/app/soft/jdk1.8.0_192

PATH=$JAVA_HOME/bin:$PATH:$HOME/bin

export JAVA_HOME PATH

Apply the configuration:

source /etc/profile
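
Editing /etc/profile by hand several times (as this guide does for Java, Scala, and Spark) makes it easy to add the same line twice. The following is a minimal sketch of an idempotent alternative; append_once and add_java_to_profile are hypothetical helper names, and the target file is passed in so the functions can be tried on a scratch file first.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name used for illustration.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the Java settings from this guide to the given profile file.
add_java_to_profile() {
    local file="$1"
    # Single quotes keep $JAVA_HOME, $PATH and $HOME unexpanded in the file
    append_once 'JAVA_HOME=/app/soft/jdk1.8.0_192' "$file"
    append_once 'PATH=$JAVA_HOME/bin:$PATH:$HOME/bin' "$file"
    # Exporting JAVA_HOME too makes it visible to child processes
    append_once 'export JAVA_HOME PATH' "$file"
}

# Trial run:  add_java_to_profile /tmp/profile.test
# Real run (as root):  add_java_to_profile /etc/profile
```

Calling add_java_to_profile twice on the same file leaves exactly one copy of each line.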

Install Scala:

tar zxvf /tmp/scala-2.11.12.tgz -C /app/soft/

Configure the environment variables.

sudo vi /etc/profile

JAVA_HOME=/app/soft/jdk1.8.0_192

SCALA_HOME=/app/soft/scala-2.11.12/

PATH=$JAVA_HOME/bin:$PATH:$HOME/bin:$SCALA_HOME/bin

export JAVA_HOME SCALA_HOME PATH

Configure passwordless SSH login.

ssh-keygen -t rsa

cd ~/

cd .ssh/

Rename the public key on each node.

On master: mv id_rsa.pub authorized_keys_master.pub

On slave1: mv id_rsa.pub authorized_keys_slave1.pub

On slave2: mv id_rsa.pub authorized_keys_slave2.pub

Copy the public keys of slave1 and slave2 to master.

On slave1: scp authorized_keys_slave1.pub [email protected]:/home/spark/.ssh/

On slave2: scp authorized_keys_slave2.pub [email protected]:/home/spark/.ssh/

On master, combine the public keys of all three nodes into one file.

cat authorized_keys_master.pub >> authorized_keys

cat authorized_keys_slave1.pub >> authorized_keys

cat authorized_keys_slave2.pub >> authorized_keys

Inspect the combined public key file.

vi authorized_keys

Copy the combined authorized_keys file to slave1 and slave2.

scp authorized_keys [email protected]:/home/spark/.ssh

scp authorized_keys [email protected]:/home/spark/.ssh

Change the permissions on authorized_keys. This must be done on all three nodes.

chmod 400 authorized_keys
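
The three cat commands and the permission change can be folded into one step. The sketch below is an illustration rather than the author's method: merge_pubkeys is a hypothetical helper name, and it assumes all *.pub files have already been copied into the same directory.

```shell
# Merge every *.pub file in a directory into authorized_keys, dropping
# duplicate lines, then restrict access to the owner.
# merge_pubkeys is a hypothetical helper name.
merge_pubkeys() {
    local dir="$1"
    local out="$dir/authorized_keys"
    # Start from the existing file (if any) so keys already present are kept
    [ -e "$out" ] || : > "$out"
    # sort -u removes duplicates, so running this twice is harmless
    cat "$out" "$dir"/*.pub | sort -u > "$out.tmp"
    mv -f "$out.tmp" "$out"
    # 600 is the conventional mode for authorized_keys; the stricter 400
    # used in the step above also works with sshd
    chmod 600 "$out"
    chmod 700 "$dir"
}

# On master, after the slave keys have arrived:
#   merge_pubkeys /home/spark/.ssh
```

Unlike repeated use of >>, this does not grow the file when it is run again with the same keys.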

Verify that passwordless login works.

ssh master

ssh slave1

ssh slave2

ssh master

Install Spark:

tar -zxf /tmp/spark-2.1.0-bin-hadoop2.6.gz -C /app/spark/

cd /app/spark/

ls

cd spark-2.1.0-bin-hadoop2.6/

Configure the environment variables:

vi /etc/profile

JAVA_HOME=/app/soft/jdk1.8.0_192

SCALA_HOME=/app/soft/scala-2.11.12/

SPARK_HOME=/app/spark/spark-2.1.0-bin-hadoop2.6

PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$JAVA_HOME/bin:$PATH:$HOME/bin:$SCALA_HOME/bin

export JAVA_HOME SCALA_HOME SPARK_HOME PATH

Configure Spark's core files:

cd /app/spark/spark-2.1.0-bin-hadoop2.6/

cd conf/

Configure slaves.

mv slaves.template slaves

vi slaves, then add the three nodes:

master

slave1

slave2

Configure spark-env.sh.

cp spark-env.sh.template spark-env.sh

vi spark-env.sh

export JAVA_HOME=/app/soft/jdk1.8.0_192

export SCALA_HOME=/app/soft/scala-2.11.12

export SPARK_MASTER_IP=master

export SPARK_MASTER_PORT=7077

export SPARK_EXECUTOR_INSTANCES=1

export SPARK_WORKER_INSTANCES=1

export SPARK_WORKER_CORES=1

export SPARK_WORKER_MEMORY=1024M

export SPARK_MASTER_WEBUI_PORT=8080

export SPARK_CONF_DIR=/app/spark/spark-2.1.0-bin-hadoop2.6/conf/
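
Instead of typing these lines into vi on every rebuild, the file can be generated. The following is a minimal sketch using the values listed above; write_spark_env is a hypothetical helper name, and the conf directory is passed as an argument so the output can be checked in a scratch directory first.

```shell
# Write a spark-env.sh containing the settings from this guide into the
# given conf directory. write_spark_env is a hypothetical helper name.
write_spark_env() {
    local conf_dir="$1"
    # The quoted 'EOF' keeps the heredoc literal (no variable expansion)
    cat > "$conf_dir/spark-env.sh" <<'EOF'
export JAVA_HOME=/app/soft/jdk1.8.0_192
export SCALA_HOME=/app/soft/scala-2.11.12
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1024M
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_CONF_DIR=/app/spark/spark-2.1.0-bin-hadoop2.6/conf/
EOF
}

# Trial run:  write_spark_env "$(mktemp -d)"
# Real run:   write_spark_env /app/spark/spark-2.1.0-bin-hadoop2.6/conf
```

The web UI port is written as SPARK_MASTER_WEBUI_PORT, which is the name Spark's standalone scripts read for that setting.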

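Set the permissions of /app/soft and /app/spark to 777 on every node by running chmod 777 -R /app/soft and chmod 777 -R /app/spark on each one, then copy both directories from master to the other nodes: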

scp -r /app/spark/ [email protected]:/app/

scp -r /app/soft/ [email protected]:/app/
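
With more than one worker, the copy step is easier to repeat as a loop. The sketch below is an assumption-laden illustration: distribute_app is a hypothetical helper name, and the spark user together with the slave1 and slave2 host names are inferred from the rest of this guide, not taken from the commands above. By default it only prints the commands it would run.

```shell
# Print (or run) the scp commands that copy /app/soft and /app/spark to
# each host given as an argument. distribute_app is a hypothetical helper;
# the spark user and the host names passed in are assumptions.
distribute_app() {
    local run="${RUN:-0}" host dir
    for host in "$@"; do
        for dir in /app/soft /app/spark; do
            if [ "$run" = "1" ]; then
                scp -r "$dir" "spark@$host:/app/"
            else
                # Dry run: show the command without executing it
                echo "scp -r $dir spark@$host:/app/"
            fi
        done
    done
}

# Preview:  distribute_app slave1 slave2
# Execute:  RUN=1 distribute_app slave1 slave2
```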

The Spark cluster is now set up.

Start the cluster: start-all.sh

Running jps then shows the following processes:

master node:

3617 Worker

3507 Master

4156 Jps

slave1 node:

3361 Worker

3702 Jps

slave2 node:

3319 Worker

3647 Jps
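
Checking the jps listings by eye works for three nodes, but the comparison can also be scripted. This is a minimal sketch: has_daemons is a hypothetical helper name, and it only inspects text on standard input, so it works equally on a live jps call or on saved output.

```shell
# Read jps output on stdin and report whether every expected daemon name
# (given as arguments) is listed. has_daemons is a hypothetical helper.
has_daemons() {
    local out name
    out="$(cat)"
    for name in "$@"; do
        # jps prints "<pid> <name>"; compare the second column exactly
        if ! printf '%s\n' "$out" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "missing: $name"
            return 1
        fi
    done
    echo "ok"
}

# On master:   jps | has_daemons Master Worker
# On a slave:  jps | has_daemons Worker
```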

Start spark-shell to verify:

spark-shell --master spark://master:7077 --executor-memory 1024m --driver-memory 1024m

After startup, the following output is displayed:

18/11/29 16:13:46 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

18/11/29 16:13:47 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

Spark context Web UI available at http://192.168.0.10:4040

Spark context available as 'sc' (master = spark://master:7077, app id = app-20181129161336-0000).

Spark session available as 'spark'.

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_192)

Type in expressions to have them evaluated.

Type :help for more information.

scala>

You can now write Spark code after the scala> prompt:

scala> sc.textFile("/app/spark/spark-2.1.0-bin-hadoop2.6/README.md").flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_).map(x=>(x._2,x._1)).sortByKey(false).map(x=>(x._2,x._1)).take(10)

res0: Array[(String, Int)] = Array(("",71), (the,24), (to,17), (Spark,16), (for,12), (and,9), (##,8), (a,8), (can,7), (run,7))
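
As a rough cross-check of this result outside Spark, the same kind of count can be made with standard Unix tools. This is a sketch under stated assumptions, not an exact equivalent: top_words is a hypothetical helper name, and unlike the split(" ") call above it also splits on tabs and skips empty tokens, so the ("",71) entry has no counterpart in its output.

```shell
# Count whitespace-separated words in a file and print the N most frequent
# as "word count" pairs. top_words is a hypothetical helper name.
top_words() {
    local file="$1" n="${2:-10}"
    # one word per line -> drop empty lines -> count -> sort by count, desc
    tr -s ' \t' '\n' < "$file" |
        grep -v '^$' |
        sort |
        uniq -c |
        sort -k1,1nr |
        head -n "$n" |
        awk '{ print $2, $1 }'
}

# Example:
#   top_words /app/spark/spark-2.1.0-bin-hadoop2.6/README.md 10
```

Words with equal counts may appear in a different order than in the Spark output, since neither approach defines an order for ties.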

scala>