
Study Notes: Learning Big Data from Scratch - 12. Spark Installation and Deployment

For convenience in teaching, an all-in-one setup is used: the entire training environment is built on a single virtual machine, so Spark is deployed in pseudo-distributed mode.

Environment:

  hadoop2.6.0-cdh5.15.1

  jdk1.8

  centos7 64-bit

1. Install the Scala environment

The version used is scala-2.12.7; download it from the official site: http://www.scala-lang.org/download/

scala-2.12.7.tgz 
 
tar -zxvf scala-2.12.7.tgz
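If the archive was extracted somewhere else, move the resulting directory so it matches the SCALA_HOME path used below (a minimal sketch, assuming the archive was unpacked in the current working directory):

# assumption: scala-2.12.7/ was unpacked in the current directory
mv scala-2.12.7 /home/linbin/software/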

nano /etc/profile

export SCALA_HOME=/home/linbin/software/scala-2.12.7
export PATH=$PATH:$SCALA_HOME/bin
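After saving /etc/profile, reload it in the current shell so the new variables take effect:

source /etc/profile
echo $SCALA_HOME    # should print /home/linbin/software/scala-2.12.7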

[root@centos7 bin]# scala -version
Scala code runner version 2.12.7 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

2. Download Spark
wget http://archive.cloudera.com/cdh5/cdh/5/spark1.6.0-cdh5.15.1.tar.gz

3. Extract it: tar -zxvf spark1.6.0-cdh5.15.1.tar.gz
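The extracted directory can then be moved under /home/linbin/software alongside Hadoop and Scala (a sketch; the unpacked directory name is assumed from the tarball name, and the notes below refer to it only as sparkXX):

# assumption: the tarball unpacks to a directory with the same base name
mv spark1.6.0-cdh5.15.1 /home/linbin/software/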

4. Go into the spark/conf directory, rename spark-env.sh.template to spark-env.sh, and append the following configuration to the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64
export SCALA_HOME=/home/linbin/software/scala-2.12.7
export SPARK_MASTER_IP=centos7
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/linbin/software/hadoop-2.6.0-cdh5.15.1/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/home/linbin/software/hadoop-2.6.0-cdh5.15.1/bin/hadoop classpath)
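The SPARK_DIST_CLASSPATH line makes Spark pick up the CDH Hadoop jars at runtime. Before starting Spark it is worth confirming that the embedded command really prints a classpath:

# should print a long list of Hadoop directories and jars
/home/linbin/software/hadoop-2.6.0-cdh5.15.1/bin/hadoop classpath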

5. Rename slaves.template to slaves. The file's default content is localhost; delete localhost and add the local hostname centos7 (commands sketched below).
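The same change expressed as commands (a minimal sketch, run from the Spark installation directory):

cd sparkXX/conf
mv slaves.template slaves
# replace the default localhost entry with the local hostname
echo centos7 > slaves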

6. Startup complains that a log4j class cannot be found. Locate slf4j-api-1.7.5.jar under one of the other installed components and copy it to
sparkXX/lib/slf4j-api-1.7.5.jar
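A hedged example of the copy; the source path below is only an assumption, since the jar may live under any of the installed CDH components:

# assumed source location -- adjust to wherever slf4j-api-1.7.5.jar is actually found
cp /home/linbin/software/hadoop-2.6.0-cdh5.15.1/share/hadoop/common/lib/slf4j-api-1.7.5.jar sparkXX/lib/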

7. The Worker fails to start with the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
Solution:
Copy the following jars from the /home/linbin/software/hadoop-2.6.0-cdh5.15.1/share/hadoop/mapreduce1/lib/ directory
jackson-databind-2.2.3.jar 
jackson-core-2.2.3.jar 
jackson-annotations-2.2.3.jar
into the /home/linbin/software/hadoop-2.6.0-cdh5.15.1/share/hadoop/common/lib directory.
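Written out as commands (paths taken from the description above):

cd /home/linbin/software/hadoop-2.6.0-cdh5.15.1/share/hadoop
cp mapreduce1/lib/jackson-databind-2.2.3.jar common/lib/
cp mapreduce1/lib/jackson-core-2.2.3.jar common/lib/
cp mapreduce1/lib/jackson-annotations-2.2.3.jar common/lib/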

8. Start everything

start-dfs.sh

start-yarn.sh
sparkXX/sbin/start-all.sh    // do not confuse Spark's start-all.sh with the Hadoop/HDFS start-all.sh

Seeing the Master and Worker processes means Spark has started successfully (a quick check is sketched below).
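A quick check is jps, which on this single-node setup should list the Spark daemons alongside the Hadoop ones:

jps
# expected Spark processes: Master, Worker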
9. Start spark-shell
This throws java.lang.NoClassDefFoundError: parquet/hadoop/ParquetOutputCommitter
Solution:
In the sqoop directory, find
sqoop2-1.99.5-cdh5.15.1/server/webapps/sqoop/WEB-INF/lib/parquet-hadoop-1.5.0-cdh5.15.1.jar
and copy it to the /home/linbin/software/hadoop-2.6.0-cdh5.15.1/share/hadoop/common/lib directory (see the command below).
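As a single command (assuming the sqoop directory sits under /home/linbin/software like the other components):

cp /home/linbin/software/sqoop2-1.99.5-cdh5.15.1/server/webapps/sqoop/WEB-INF/lib/parquet-hadoop-1.5.0-cdh5.15.1.jar \
   /home/linbin/software/hadoop-2.6.0-cdh5.15.1/share/hadoop/common/lib/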

Finally, spark-shell starts successfully.

10. Monitor the Spark Master and Worker over HTTP
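In a standalone deployment the Master web UI defaults to port 8080 and each Worker's UI to 8081, so on this machine they should be reachable at (assuming the defaults were not changed in spark-env.sh):

http://centos7:8080    # Master web UI
http://centos7:8081    # Worker web UI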

 

A useful reference article:
https://www.jianshu.com/p/a0c38dc46b89   (pitfalls encountered when setting up Spark)