Quick Deployment of a Spark Cluster and Spark HA (Spark Study Series, Season 1)
阿新 · Published: 2019-02-08
1. Spark Deployment
Tags: spark
0 Apache Spark project architecture
Spark SQL -- Spark Streaming -- MLlib -- GraphX
0.1 Quick Hadoop setup (mainly for the HDFS storage layer)
Download hadoop-2.6.0, extract it, and change into the etc/hadoop/ directory.
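For a from-scratch setup, the download and extraction step can be sketched as follows (the archive.apache.org mirror URL is an assumption; any Apache mirror carrying 2.6.0 works):

```shell
# Download and unpack Hadoop 2.6.0 under /opt (mirror URL is an assumption).
cd /opt
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar zxf hadoop-2.6.0.tar.gz
cd hadoop-2.6.0/etc/hadoop   # the configuration files edited below live here
```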
0.2 Quick configuration files
cat core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://worker1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.6.0/tmp</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
</configuration>
cat hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>worker1:50090</value>
    <description>The secondary namenode http server address and port.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///opt/hadoop-2.6.0/dfs/namesecondary</value>
    <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
  </property>
</configuration>
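The local directories referenced in these two files do not exist in a fresh tarball; a quick sketch for creating them (paths mirror the configuration above; run on every node, keeping only the directories that node's role needs):

```shell
# Create the local directories referenced by core-site.xml / hdfs-site.xml.
mkdir -p /opt/hadoop-2.6.0/tmp                   # hadoop.tmp.dir
mkdir -p /opt/hadoop-2.6.0/dfs/name              # dfs.namenode.name.dir (NameNode)
mkdir -p /opt/hadoop-2.6.0/dfs/data              # dfs.datanode.data.dir (DataNodes)
mkdir -p /opt/hadoop-2.6.0/dfs/namesecondary     # dfs.namenode.checkpoint.dir
```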
cat hadoop-env.sh
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
cat mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
cat yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>worker1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
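Before starting any daemons, the merged configuration can be sanity-checked with `hdfs getconf` (run from the install directory; the key names are the ones configured above):

```shell
cd /opt/hadoop-2.6.0
bin/hdfs getconf -confKey fs.defaultFS        # expect hdfs://worker1:9000
bin/hdfs getconf -confKey dfs.replication     # expect 2
bin/hdfs getconf -confKey yarn.resourcemanager.hostname
```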
0.3 Quick HDFS start-up test
$ sbin/start-dfs.sh  # start the HDFS daemons
$ jps
5212 NameNode
5493 SecondaryNameNode
5909 Jps
5336 DataNode
# If DataNode is missing, check the most recent start-up log under logs/.
This is usually caused by the hostname not being set to worker1; it can recur every time the VM is restarted.
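A sketch of the hostname fix on EL6-era systems (the IP address is a placeholder; substitute the node's real address):

```shell
# Set the hostname for the current session and make it persistent (EL6 style).
hostname worker1
sed -i 's/^HOSTNAME=.*/HOSTNAME=worker1/' /etc/sysconfig/network
# Map the name to this node's IP so the daemons can resolve and bind it.
echo "192.168.56.101  worker1" >> /etc/hosts   # placeholder IP
```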
If the NameNode still does not come up when you run sbin/start-dfs.sh again:
$ bin/hdfs namenode -format  # format HDFS (wipes any existing NameNode metadata), then restart with sbin/start-dfs.sh
Finally, check worker1:50070 in a browser.
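Beyond the web UI, cluster health can also be checked from the command line; a minimal round-trip test, assuming the layout above:

```shell
cd /opt/hadoop-2.6.0
bin/hdfs dfsadmin -report                         # live/dead DataNodes, capacity
bin/hdfs dfs -mkdir -p /tmp/smoke
bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/smoke/
bin/hdfs dfs -ls /tmp/smoke                       # write/list round trip
```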
1. Spark Installation and Configuration
1.1 Runtime environment configuration
A. Download JDK, Scala, sbt, and Maven into /opt
JDK jdk-7u79-linux-x64.gz
Scala http://downloads.typesafe.com/scala/2.10.5/scala-2.10.5.tgz
Maven apache-maven-3.2.5-bin.tar.gz
SBT sbt-0.13.7.tgz
Extract: tar zxf jdk-7u79-linux-x64.gz
tar zxf scala-2.10.5.tgz
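The profile below uses JAVA_HOME=/opt/jdk, while the archive unpacks to a versioned directory (jdk1.7.0_79 is the usual name for this release, but verify what tar actually produced); a symlink bridges the two:

```shell
# /opt/jdk1.7.0_79 is an assumption -- adjust to the directory tar created.
ln -s /opt/jdk1.7.0_79 /opt/jdk
ls -l /opt/jdk   # should point at the extracted JDK
```

The same trick keeps JAVA_HOME stable across future JDK upgrades.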
B. Configuration
vi ~/.bash_profile  ## or vi /etc/profile -- substitute accordingly below
export JAVA_HOME=/opt/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SCALA_HOME=/opt/scala-2.10.5
export PATH=$PATH:$SCALA_HOME/bin
$ source /etc/profile  # or: source ~/.bash_profile
C. Test
$ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
$ scala -version
Scala code runner version 2.10.5 -- Copyright 2002-2013, LAMP/EPFL
D. Maven and sbt configuration
export MAVEN_HOME=/opt/apache-maven-3.2.5
export SBT_HOME=/opt/sbt
export PATH=$PATH:$SCALA_HOME/bin:$MAVEN_HOME/bin:$SBT_HOME/bin
$source /etc/profile
$ mvn --version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
Maven home: /opt/apache-maven-3.2.5
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /opt/jdk
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"
$ sbt --version  # note the '--'
sbt launcher version 0.13.7
1.2 Spark configuration
A. Download Hadoop and Spark
$ tar zxf spark-1.4.0-bin-hadoop2.6.tgz
$ tar zxf hadoop-2.6.0.tar.gz
$ ll  # check the extracted directories
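Once extracted, the bundled standalone examples give a quick smoke test of the Spark distribution before any further configuration:

```shell
# Run the bundled SparkPi example in local mode (no cluster needed yet).
cd /opt/spark-1.4.0-bin-hadoop2.6
./bin/run-example SparkPi 10   # prints an approximation of Pi on success
```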
B. Configure the Hadoop and Spark installation directories
vi ~/.bash_profile
export JAVA_HOME=/opt/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SCALA_HOME=/opt/scala-2.10.5
export SPARK_HOME=/opt/spark-1.4.0-bin-hadoop2.6
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_CONF_DIR=/opt/hadoop-2.6.0/etc/hadoop
export MAVEN_HOME=/opt/apache-maven-3.2.5