
Setting up a Spark cluster development environment on virtual machines

1. Prepare the installation packages

Windows 10 64-bit (host OS)

VMware 10

CentOS 6.4

jdk-7u80-linux-x64.rpm

hadoop-2.7.1.tar.gz

scala-2.11.6.tgz

spark-2.0.1-bin-hadoop2.7.tgz

2. Install VMware Workstation and create a new virtual machine named master, accepting the defaults throughout

3. Install the JDK

3.1. sudo rpm -ivh jdk-7u80-linux-x64.rpm

3.2. Set the Java environment variables

sudo gedit /etc/profile

Append the following at the end of the file:

#set java environment

export JAVA_HOME=/usr/java/jdk1.7.0_80  # adjust the path if you installed a different JDK version

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin
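Before editing /etc/profile for real, the three lines above can be sanity-checked in a throwaway subshell; a minimal sketch, assuming the JDK landed in the RPM's default path:

```shell
# Evaluate the profile additions inside a subshell so the current shell
# is untouched, then print the expanded CLASSPATH to eyeball the result
(
  export JAVA_HOME=/usr/java/jdk1.7.0_80
  export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
  export PATH=$PATH:$JAVA_HOME/bin
  echo "$CLASSPATH"
)
```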

3.3. Verify the Java environment variables

source /etc/profile

echo $JAVA_HOME

4. Install Hadoop

4.1. Extract the archive

tar -zxvf /usr/mywork/package/hadoop-2.7.1.tar.gz -C /usr/mywork/software

4.2. Configure the environment variables

sudo gedit /etc/profile

# set hadoop environment

export HADOOP_HOME=/usr/mywork/software/hadoop-2.7.1

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_LOG_DIR=$HADOOP_HOME/logs  # create the logs folder first

export YARN_LOG_DIR=$HADOOP_LOG_DIR

Apply the changes: source /etc/profile

4.3. Verify the environment variables

echo $HADOOP_HOME

4.4. Create the folders (under the Hadoop home directory)

mkdir -p dfs/name dfs/data tmp logs
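As a self-contained sketch of the resulting layout (a temporary directory stands in for the real Hadoop home, so this can be tried anywhere):

```shell
# Illustration only: build the same dfs/name, dfs/data, tmp and logs layout
# under a throwaway directory instead of /usr/mywork/software/hadoop-2.7.1
DEMO_HOME=$(mktemp -d)
mkdir -p "$DEMO_HOME"/dfs/name "$DEMO_HOME"/dfs/data "$DEMO_HOME"/tmp "$DEMO_HOME"/logs
find "$DEMO_HOME" -mindepth 1 -type d | sort
```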

4.5. Edit the configuration files

4.5.1. Edit hadoop-env.sh, yarn-env.sh, and mapred-env.sh, adding:

export JAVA_HOME=/usr/java/jdk1.7.0_80

4.5.2. Edit core-site.xml:

<configuration>

  <property>

    <name>fs.defaultFS</name>

    <value>hdfs://pmaster:9000</value>

  </property>

  <property>

    <name>hadoop.tmp.dir</name>

    <value>/usr/mywork/software/hadoop-2.7.1/tmp</value>

  </property>

</configuration>

4.5.3. Edit hdfs-site.xml:

<configuration>

  <property>

    <name>dfs.replication</name>

    <value>2</value>

  </property>

  <property>

    <name>dfs.namenode.name.dir</name>

    <value>/usr/mywork/software/hadoop-2.7.1/dfs/name</value>

  </property>

  <property>

    <name>dfs.datanode.data.dir</name>

    <value>/usr/mywork/software/hadoop-2.7.1/dfs/data</value>

  </property>

</configuration>

4.5.4. Edit mapred-site.xml:

<configuration>

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

  </property>

</configuration>

4.5.5. Edit yarn-site.xml:

<configuration>

<!-- Site specific YARN configuration properties -->

  <property>

    <name>yarn.resourcemanager.hostname</name>

    <value>pmaster</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

  </property>

</configuration>

4.5.6. Edit the slaves file ($HADOOP_HOME/etc/hadoop/slaves):

pa

pb

4.6. Change the hostname

sudo hostname pmaster # changes the hostname immediately, but the change is lost on reboot

sudo gedit /etc/sysconfig/network # open the file and set HOSTNAME=pmaster for a permanent change

reboot # a reboot is needed for the permanent change to take effect if you skipped the first command

4.7. Bind hostnames to IP addresses

sudo gedit /etc/hosts

# Open the file and add the entries below; placeholder addresses are fine at first, update them once the other VMs have obtained their IPs

192.168.184.129   pmaster

192.168.184.130   pa

192.168.184.131   pb

Exit the editor and verify with: ping pmaster
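A sketch of staging the entries in a temp file before appending them (the 192.168.184.x addresses are just this guide's examples; substitute the addresses your VMs actually received):

```shell
# Stage the three host entries in a temp file first
cat > /tmp/cluster-hosts <<'EOF'
192.168.184.129   pmaster
192.168.184.130   pa
192.168.184.131   pb
EOF
# then append them with: sudo sh -c 'cat /tmp/cluster-hosts >> /etc/hosts'
wc -l < /tmp/cluster-hosts
```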

4.8. Disable the firewall

sudo service iptables stop # stop the running firewall

sudo service iptables status # check the firewall status to confirm it is stopped

sudo chkconfig iptables off # disable the firewall permanently

sudo chkconfig --list | grep iptables # confirm it is disabled at every runlevel

5. Clone the virtual machine to create pa and pb, change their hostnames to pa and pb as described above, and update the IP addresses in /etc/hosts

6. Configure passwordless SSH (with all three VMs running)

6.1. Log in to the master VM with AbsoluteTelnet

6.2. Generate a key pair with an empty passphrase

ssh-keygen -t rsa -P '' -f /home/zls/.ssh/id_rsa

6.3. Generate authorized_keys on the local machine and verify passwordless SSH to it

cd /home/zls/.ssh # as the current user; no need to switch to root

cat id_rsa.pub >> authorized_keys # append the public key to the authorized keys

chmod 600 authorized_keys # fix the file permissions

ssh localhost # a login without a password prompt means the key works

Note: if SSH still asks for a password after the steps above, delete the files under the .ssh folder and regenerate the key pair.

6.4. Log in to the pa VM with AbsoluteTelnet and generate a key pair as above

ssh-copy-id -i id_rsa.pub pmaster # copy the public key to pmaster and add it to pmaster's authorized keys

6.5. Perform the same steps as 6.4 on pb

6.6. Log in to the pmaster VM

scp authorized_keys pa:/home/zls/.ssh/; scp authorized_keys pb:/home/zls/.ssh/

6.7. Verify passwordless logins between the cluster VMs via AbsoluteTelnet

ssh pmaster;

ssh pa;

ssh pb;

ssh pmaster;

ssh pa;

ssh pb;

ssh pa;

ssh pmaster;

7. Format the Hadoop file system and test

7.1. Log in to the pmaster account via telnet

7.2. cd /usr/mywork/software/hadoop-2.7.1/bin

hadoop namenode -format # format HDFS

7.3. start-dfs.sh

(starts the NameNode and SecondaryNameNode on the master node and a DataNode on each worker node)

7.4. start-yarn.sh

(starts the ResourceManager on the master node and a NodeManager on each worker node)

7.5. mr-jobhistory-daemon.sh start historyserver

(starts the JobHistoryServer on that node)

(stop it with: mr-jobhistory-daemon.sh stop historyserver)

8. Install Scala

8.1. Download scala-2.11.6.tgz

8.2. Extract: sudo tar -zxf scala-2.11.6.tgz -C /usr/mywork/software

8.3. sudo gedit /etc/profile and append:

#set scala environment

export SCALA_HOME=/usr/mywork/software/scala-2.11.6

export PATH=$PATH:$SCALA_HOME/bin

8.4. source /etc/profile

8.5. scala -version

9. Install Spark

9.1. Extract: sudo tar -zxf spark-2.0.1-bin-hadoop2.7.tgz -C /usr/mywork/software

9.2. Append the paths at the end of the profile file

sudo gedit /etc/profile

#set spark environment

export SPARK_HOME=/usr/mywork/software/spark-2.0.1-bin-hadoop2.7

export PATH=$SPARK_HOME/bin:$PATH

source /etc/profile

9.3. In Spark's conf directory, edit the slaves file

cp slaves.template slaves

sudo vi slaves # open the file, comment out localhost, and add:

pa

pb

9.4. Edit the spark-env.sh file

cp spark-env.sh.template spark-env.sh

sudo gedit spark-env.sh # open the file and append:

export JAVA_HOME=/usr/java/jdk1.7.0_80

export SCALA_HOME=/usr/mywork/software/scala-2.11.6

export HADOOP_HOME=/usr/mywork/software/hadoop-2.7.1

export HADOOP_CONF_DIR=/usr/mywork/software/hadoop-2.7.1/etc/hadoop

export SPARK_MASTER_IP=pmaster

export SPARK_WORKER_MEMORY=1g

10. Copy the Scala installation directory, the Spark installation directory, and /etc/profile to nodes pa and pb

scp -r scala-2.11.6/ pb:/usr/mywork/software

scp -r spark-2.0.1-bin-hadoop2.7/ pb:/usr/mywork/software/

scp -r scala-2.11.6/ pa:/usr/mywork/software

scp -r spark-2.0.1-bin-hadoop2.7/ pa:/usr/mywork/software/

sudo scp -r /etc/profile pa:/etc/profile # then on pa, run source /etc/profile

sudo scp -r /etc/profile pb:/etc/profile # then on pb, run source /etc/profile

11. Start the Spark cluster

start-dfs.sh

start-yarn.sh

./start-all.sh # run this from Spark's sbin directory, otherwise it may resolve to Hadoop's start-all.sh

12. Test Spark

spark-shell

hadoop fs -put README.md /test # run from the Spark home directory

val file = sc.textFile("hdfs://pmaster:9000/test/README.md")

val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

count.collect
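What the three spark-shell lines compute can be cross-checked on a small local file with plain shell tools (the sample file below is made up for illustration):

```shell
# Local word count: split on spaces into one word per line,
# then count duplicates with sort | uniq -c
printf 'apache spark\napache hadoop\n' > /tmp/wc-sample.txt
tr ' ' '\n' < /tmp/wc-sample.txt | sort | uniq -c | sort -rn
# "apache" counts 2; "spark" and "hadoop" count 1 each
```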

13. Install IDEA on the pmaster VM

13.1. tar -zxf ideaIC-2016.2.4.tar.gz

13.2. mv idea-IC-162.2032.8/ /usr/mywork/software/

13.3. Start IDEA and install the plugin scala-intellij-bin-2016.2.1.zip

14. Create a new scala-sbt project. Dependency downloads through sbt can be very slow; to speed them up, edit

~/.IdeaIC2016.2/config/plugins/Scala/launcher/sbt-launch.jar/sbt/sbt.boot.properties

[repositories]

  local

  oschina: http://maven.oschina.net/content/groups/public/

  jcenter: http://jcenter.bintray.com/

  typesafe-ivy-releases: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly

After saving, click the sync button in IDEA's SBT Project panel to download the dependency packages.