
Setting Up a Spark 2.4 Standalone Cluster

This article follows the official documentation: http://spark.apache.org/docs/latest/spark-standalone.html

1. Set up a three-node Hadoop cluster in advance

SERVER         VERSION INFO
192.168.1.10   RHEL 6.8 & Hadoop 2.7.3
192.168.1.11   RHEL 6.8 & Hadoop 2.7.3
192.168.1.12   RHEL 6.8 & Hadoop 2.7.3
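The rest of this guide addresses the nodes by hostname (hadoop01 for the master, oggserver01 and oggserver02 for the workers). A minimal /etc/hosts sketch, assuming this IP-to-hostname mapping matches your cluster:

# /etc/hosts on every node -- the mapping below is an assumption,
# adjust it to your actual machines.
192.168.1.10  hadoop01
192.168.1.11  oggserver01
192.168.1.12  oggserver02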

2. Install Spark on every node

For the installation steps, see https://blog.csdn.net/chenxu_0209/article/details/84948302
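For reference, a minimal install sketch; the install path matches the one that appears in the logs later in this guide, and the download URL is the Apache archive:

# Run on every node as the spark user.
wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
tar -xzf spark-2.4.0-bin-hadoop2.7.tgz -C /home/spark/

# Export SPARK_HOME so the bin/ and sbin/ scripts are on the PATH.
echo 'export SPARK_HOME=/home/spark/spark-2.4.0-bin-hadoop2.7' >> ~/.bash_profile
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bash_profile
source ~/.bash_profile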

3. Configure SSH keys between the nodes

Generate an RSA key pair:
[spark@hadoop01 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/spark/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/spark/.ssh/id_rsa.
Your public key has been saved in /home/spark/.ssh/id_rsa.pub.
The key fingerprint is:
65:34:15:20:57:35:cf:c7:bf:8b:1c:94:46:d4:7f:34 spark@hadoop01
The key's randomart image is:
+--[ RSA 2048]----+
|        . =++++  |
|       + .. E.   |
|      o .     .B |
|      o .  .   = |
|       S    + o  |
|            o .  |
|       . .       |
|      . o .      |
|       o .       |
+-----------------+
Copy the generated public key of every node into ~/.ssh/authorized_keys, and save that file on each node.
Note: do not create authorized_keys by typing it out by hand; just paste in the contents of each node's id_rsa.pub (or use ssh-copy-id, as sketched after the key listing below).
[spark@hadoop01 .ssh]$ vi authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1EYdPds/v/1Qh8w5tBlpUcWMJJVBlBTzZK3Q/OhqGERdKmUu+9qw29VRB9+wtYX1vPl+t02zIGIYfZ8IjCaO56g1xc34NRF7Xe+w1H3EU3k5jwzsuqS8/BDz56QCia7gIZKJJAO3Xf+U9oJcin1paSWg2FmnXFbuyNEXaPptYVyjpSUJeZZvB50gqA46VOD3h3O1fGZ+d7WZ6aK6OvTgJdMaz8m0H1yCcF5vz08jKuDpVdBZX01nL8cFDz711FifFwZTMnSG5QnimrQ3FfCcyQwkQJQSqJ76v2H+CbWW5goA77AeV9GAs0Lqkk76eOj5/A1is0Yl45y2EZey0YClww== spark@hadoop01

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwKotX+y0FC4byiI9ItDR0jyjD/oPQHJpbTtFg4LUyGk0v9AaMo5b9hKWrX7nc989oKLz+lQYyoTEtJJ2zQ0JwNnVp1pBciQT92BIeRunvGuWnRKV19GZfLXy8xX8cTf+YYVSfkUtCxKrFgflVInRkNn2KeIsfTIg/dLVICkBEGXs0d0JfNgJBjA/dPV+1L3GAyOT0zJj62L3gE6a+1TcORmMQepHV5UZkW2RrF8rZxHULVoK3pcdHoMYQhhzJJBl+6ZcXRFKXAAElEKlcn6z7fGTqh1pvihuxYwUcjTZYDgNgdBwobZ/H5OP0ERoOgkbGaRCdq8pQBisNc6cj76oaw== spark@oggserver01

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA1O9CfvVCXkE+5dQvsEuRfDvSf1h9xUQhk+LOMLS1BlKdxNmwhbcCb2E06ADjtOwSzldwFnZUxKnFIOyK5vJivKSzGlcOzVByEG58DNtfxqNQTSCxRsphAl8ZZA4sF1K5tYrFYca7iSJbJmdgw95Rmixty94tn8BJT8h2oePmnYgoARythj5BLmf2D6sXXGJuDLrjE9VabgPRUfpJOIr42XsdsnNZsbxLxiMP54xgXr4kpqdhjGvKOq61vbFLmIU3Wpbt+4IPONLolK5YdcM8mvS+JpQKTGslM0dkkqhAEBlizAr58OKQp0dmI9iTiFj82xRsrgP3lY1mlxaO45R4iw== spark@oggserver02
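Instead of pasting the keys by hand, ssh-copy-id appends them and sets the file permissions correctly; a minimal sketch using this cluster's hostnames:

# Run on each of the three nodes as the spark user; appends the local
# public key to ~/.ssh/authorized_keys on every host.
for host in hadoop01 oggserver01 oggserver02; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done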
Test passwordless SSH:
[spark@hadoop01 .ssh]$ ssh oggserver01 date
Fri Dec 21 05:15:04 PST 2018
[spark@hadoop01 .ssh]$ ssh oggserver02 date
Fri Dec 21 05:15:06 PST 2018

4. Edit $SPARK_HOME/conf/spark-env.sh

Configure the master and worker settings:
SPARK_MASTER_HOST=hadoop01
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080

SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=512m
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1
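spark-env.sh must be present on every node; one simple way is to push it out from the master:

# Copy the master's config to both workers.
for host in oggserver01 oggserver02; do
  scp $SPARK_HOME/conf/spark-env.sh "$host:$SPARK_HOME/conf/"
done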
Configure the slave node list. Note that Spark's sbin scripts read the file conf/slaves (no extension), so rename the template to exactly that:
[spark@hadoop01 conf]$ mv slaves.template slaves
[spark@hadoop01 conf]$ vi slaves
oggserver01
oggserver02

5. Start the master node

[spark@hadoop01 sbin]$ ./start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/spark/spark-2.4.0-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-hadoop01.out
The master web UI is now reachable at http://hadoop01:8080/
Note that the worker list is empty at this point, because no workers have been started yet.

(screenshot: master web UI at http://hadoop01:8080 with an empty worker list)
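As a quick shell-side check (the log file name comes from the start-master.sh output above), the master logs the URL that workers should connect to:

# Confirm the master is up and see its bind address in the log:
grep "Starting Spark master at" \
  /home/spark/spark-2.4.0-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-hadoop01.out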

6. Start the worker (slave) nodes

[spark@oggserver01 sbin]$ ./start-slave.sh spark://hadoop01:7077
starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark-2.4.0-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-oggserver01.out
[spark@oggserver02 sbin]$ ./start-slave.sh spark://hadoop01:7077
starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark-2.4.0-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-oggserver02.out
The worker list is now visible in the web UI.

(screenshot: master web UI showing both registered workers)
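Alternatively, because conf/slaves lists every worker, all of them can be started from the master in one step:

# Uses conf/slaves plus the passwordless SSH configured in step 3.
$SPARK_HOME/sbin/start-slaves.sh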

7. Connect spark-shell to the standalone cluster

[spark@hadoop01 bin]$ ./spark-shell --master spark://hadoop01:7077
2018-12-21 05:55:51 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop01:4040
Spark context available as 'sc' (master = spark://hadoop01:7077, app id = app-20181221055610-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
Write a test program (a simple word count):
scala> val rdd = sc.textFile("/user/hadoop/worddir/word.txt");
rdd: org.apache.spark.rdd.RDD[String] = /user/hadoop/worddir/word.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val tupleRDD = rdd.flatMap(line => {line.split(" ")
     |         .toList.map(word => (word.trim,1))
     |     });
tupleRDD: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[2] at flatMap at <console>:25

scala> val resultRDD :org.apache.spark.rdd.RDD[(String,Int)] =tupleRDD.reduceByKey((a,b)=> a + b);
resultRDD: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:25

scala> resultRDD.foreach(elm => println(elm._1+"="+elm._2));
Observe the output in the web UI. Because foreach runs on the executors, the println output lands in each worker's stdout log, which can be opened from the application page in the web UI rather than in the driver console.

(screenshots: the running application in the Spark web UI and the word-count output in a worker's stdout log)
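Besides spark-shell, the bundled SparkPi example gives a quick end-to-end smoke test of the cluster; the jar path below matches the Spark 2.4.0 binary distribution installed above:

# Submit the example application to the standalone master.
$SPARK_HOME/bin/spark-submit \
  --master spark://hadoop01:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 100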

If you encounter the following error:

2018-12-21 06:04:49 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

check in the web UI that the workers are registered, and increase the worker memory from 512m to 1g, as sketched below.
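A minimal sketch of the fix (edit on each worker, then restart):

# In $SPARK_HOME/conf/spark-env.sh on every worker node:
SPARK_WORKER_MEMORY=1g

# Restart the workers from the master so the new limit takes effect:
$SPARK_HOME/sbin/stop-slaves.sh
$SPARK_HOME/sbin/start-slaves.sh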