
Big Data Basics (5): Installing and configuring Hadoop 2.7.2 + Spark 2.0.0 on Ubuntu 16.04 from scratch

raw to spark


0 install ubuntu 14.04.01 desktop x64


1 Basic system configuration
(everything below is done as root)


1.3 root password
sudo passwd root


1.5 Root login option
a. In a terminal, run:
vi /usr/share/lightdm/lightdm.conf.d/50-ubuntu.conf
b. Contents as follows:
[SeatDefaults]
autologin-user=root #optional; leave it out if you want to pick the account at login
user-session=ubuntu
greeter-show-manual-login=true
c. In a terminal, run
# gedit /root/.profile
d. Reboot with the reboot command
e. If an error dialog about /root/.profile pops up,
replace mesg n with
tty -s && mesg n


1.6 Permit root SSH login
$ sudo vi /etc/ssh/sshd_config
Find the PermitRootLogin line and change it to PermitRootLogin yes
Restart the openssh server
$ sudo service ssh restart


1.1 install openssh-server
apt-get -y install openssh-server
reboot
Disable the firewall
ufw disable


1.2 vim
apt-get -y install vim-gtk






1.5 Static IP
Best done in the GUI; after changing it from the command line there may be no usable network option left in the indicator.
GUI: method 1
edit network
manual
ip 192.168.10.121
255.255.255.0
gateway 192.168.10.2
dns 192.168.10.2
Leave the rest unset and reboot.


Command line: method 2
vi /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
# The loopback network interface
auto lo
iface lo inet loopback


# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.10.121
netmask 255.255.255.0
gateway 192.168.10.2


DNS
vi /etc/resolv.conf
nameserver 192.168.10.2


To keep it from disappearing after a reboot:
vi /etc/resolvconf/resolv.conf.d/base
nameserver 192.168.10.2
Run
/etc/init.d/networking restart
and the network should work.


xx1.6 hosts
vi /etc/hosts
192.168.10.121  spark01
192.168.10.122  spark02
192.168.10.123  spark03
/etc/hostname
spark01
/etc/init.d/networking restart


1.4 teamviewer
dpkg -i teamviewer.xxx
dependencies:
apt-get install -f
start teamviewer
$teamviewer


Screen lock
System Settings -> Brightness & Lock -> turn the screen lock off


xx1.7 Passwordless SSH login
su root
a.

root@spark01:/home/py# ssh-keygen -t rsa -P ''
b. root@spark01:~# ssh-copy-id -i "root@spark02"
Check that the copy is complete: vi /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDiMzpM0xIinRKMmC3HRxV91kl2cNpSkEFwJEm/P9NGRKdhqCtErA8Lo9rt+oI/7db0FrGgw15hJQnzr0Nht0raSFthb1HYttG0bpcfJJhp9BZmxNBSRlaWLHGe1B1NQL2micXaTa+SXCBFedrUIpNFSaJ7TCkCkTMJdsoWlrF8iE/IMCazK71jhD2k+MaomzdVfuAKR68tu2CK/D79+q9Apy8MusLhkrYmOPBPXtt72x1rVG7BqkCwz7AYqH39IJJCj0VSxdYSXnEMrnNzsA8kyAfnqz6hzyuerZfG7sp/u+4hcvgzCtO+pGfoy+m0lOGn+SJ0PBAhjiAZquI+ncGr
[email protected]

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqH4QiuqYmA92JLE500L/02xiKUACbw1iBTZFGpthEYsAi31sWPWt6cE6ydEB7qklyMXX6fMkQ1/RhRrLVEuNho8YSwCMyoioLyXg2iue540/Ft12pifa30Buu+V1tTSwlpYBuQuyM9qhmXJ91OMGDochaj0E7MtOddLAqWxlxlsMeo+Bln/QzMPe0F99QasUHNUKAXWf77XOLGR4CMYhV/pVpoCuCLiO3sK/8yv6wJa61DrRtX9+/ANW2J4dXM7Iv4OebYlDdr0POSA0Qsu/pE71Wk2BKF52RLXGxsSAak/UgsjT4Ye3r73ZS7SCUWtRleI3NLZMM/3pQWLY7uKHH
[email protected]

You should see the other machine's hostname and username in it.
[Important: on the VMs, the local machine must also copy its own key to itself, otherwise logging into the local machine may fail!!!]
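As a sketch (not from the original notes), the copy step can be looped over every node, including the local machine itself; hostnames spark01-spark03 are assumed as in the hosts file above:
# run on each node after ssh-keygen
for host in spark01 spark02 spark03; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$host"
done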


2 Spark base environment




2.1 jdk 1.8
The file downloaded from the official site came as jdk-8u91-linux-x64.gz; rename it to jdk-8u91-linux-x64.tar.gz
http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz?AuthParam=1461460538_c9fec92cd12aba54d9b6cdefeb14a986
mkdir /usr/lib/java
tar -xvf jdk-8u91-linux-x64.tar.gz
(The exports below assume the extracted directory is jdk1.8.0_101; adjust JAVA_HOME to match the JDK version you actually unpacked.)
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
java -version


2.2 scala

tar vxf scala-2.11.8.tgz
mkdir /usr/lib/scala
mv scala-2.11.8 /usr/lib/scala/
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export PATH=$SCALA_HOME/bin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
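Assuming the exports above go into ~/.bashrc (as is done for Hadoop in section 3.1), a quick check:
source ~/.bashrc
scala -version    # should report Scala 2.11.8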


2.3 python
Anaconda2 (Python 2.7)
bash Anaconda...
install prefix: /server/anaconda2
source ~/.bashrc


2.4 sbt?
http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

xx2.5 zookeeper
tar xvzf zookeeper.xxx, then move the extracted directory to /server/zookeeper/
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
root@spark01:/server/zookeeper# mkdir data
root@spark01:/server/zookeeper# mkdir logs


Configure the myid file (each server's unique id)
root@spark01:/server/zookeeper/conf# echo 1 > /server/zookeeper/data/myid
root@spark02:/server/zookeeper/conf# echo 2 > /server/zookeeper/data/myid
root@spark03:/server/zookeeper/conf# echo 3 > /server/zookeeper/data/myid




root@spark01:/server/zookeeper/conf# cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/server/zookeeper/data
dataLogDir=/server/zookeeper/logs
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888

Change the log location [by default the logs land in whatever directory you start from, which gets in the way]
Edit zkEnv.sh under $ZOOKEEPER_HOME/bin and point ZOO_LOG_DIR at the directory you want the logs in;
you can also change ZOO_LOG4J_PROP to use the INFO,ROLLINGFILE log appender.
ZOO_LOG_DIR="/server/zookeeper/logs"
Reference: http://www.programgo.com/article/8705462646/




3 Hadoop installation


3.1 hadoop 2.7.2


http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar xvzf hadoop...gz
Environment variables
vi ~/.bashrc 
export HADOOP_HOME=/server/hadoop
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
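A quick sanity check (a sketch, not from the original notes):
source ~/.bashrc
hadoop version    # should report Hadoop 2.7.2
which hadoop      # should resolve to /server/hadoop/bin/hadoop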
Configuration:
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh:
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
Edit core-site.xml
mkdir /server/hadoop/tmp
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/server/hadoop/tmp</value>
    </property>
    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
    </property>
</configuration>
Edit hdfs-site.xml
mkdir -p /server/hadoop/dfs/name
mkdir -p /server/hadoop/dfs/data
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>spark01:50090</value>
                <description>The secondary namenode http server address and port.</description>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/server/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/server/hadoop/dfs/data</value>
        </property>
        <property>
                <name>dfs.namenode.checkpoint.dir</name>
                <value>file:///server/hadoop/dfs/namesecondary</value>
                <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
     </property>
</configuration>


Edit yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>


Edit mapred-site.xml (copy it from mapred-site.xml.template first if it does not exist yet)
<configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
</configuration>
Edit slaves (this file lists the worker nodes; it may contain a default localhost entry, replace it with the hostnames below)
vi slaves
spark01
spark02
spark03


At this point the Hadoop (+HA) configuration files are all in place; what remains is passwordless SSH plus formatting the Hadoop filesystem.
We will set up passwordless SSH and format Hadoop after all the software (Zookeeper + HBase) is installed and the machines have been cloned. After cloning, each node's hostname in /etc/hostname also has to be changed, and on master2 the yarn.resourcemanager.ha.id property in $HADOOP_HOME/etc/hadoop/yarn-site.xml has to be set to rm2.




4 spark


4.1 spark-1.6.2-bin-hadoop2.6.tgz (there is no prebuilt Spark binary matching Hadoop 2.7 for this release, so use this one for now; building spark-1.6.2.tgz yourself with sbt is far too slow, on a slow connection it can run for hours without finishing.)
tar xvzf spark-1.6.2-bin-hadoop2.6.tgz
move to /server/spark
vi ~/.bashrc
export SPARK_MASTER_IP=192.168.10.121
export SPARK_WORKER_MEMORY=1g
export SPARK_HOME=/server/spark
export HADOOP_HOME=/server/hadoop
export HADOOP_CONF_DIR=/server/hadoop/etc/hadoop
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
# If not running interactively, don't do anything
Edit slaves
root@spark01:/server/spark/conf# cp slaves.template slaves
Delete localhost and list the nodes:
spark01
spark02
spark03
Test that it works
spark-shell --version
1.6.2


[4.1.1 Upgrade to Spark 2.0.0]
mv /server/spark/ /server/spark-1.6.2/ 
cd /server
tar xvzf spark-2.0.0-bin-hadoop2.7.tgz
mv /server/spark-2.0.0-bin-hadoop2.7 /server/spark
cd spark/conf
cp slaves.template slaves
vi slaves
spark01
spark02
spark03


4.2 Install on the other machines, or copy the image to the other VMs


Copy the Ubuntu image to two new VMs, or install the environment above on two more servers.


Change the IP and hostname on the new machines; nothing else in the configuration needs to change.
Regenerate and redistribute the SSH keys on the new machines, then format Hadoop.
Check /root/.ssh/authorized_keys (vi /root/.ssh/authorized_keys); you should find every machine's key in it.
If any are missing, make sure they all get copied!!!
!!! Note:
On cloned VMs, different machines can end up with the same hostname inside id_rsa.pub even though /etc/hosts and /etc/hostname are correct; this is a leftover from copying the image. Correct the hostname in id_rsa.pub, then copy the key out again.
!!! Note:
On the VMs, each machine must also ssh-copy-id to itself, otherwise it will still fail.
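One way to do the regenerate-and-redistribute step on a cloned machine, as a sketch (hostnames spark01-spark03 assumed):
rm -f /root/.ssh/id_rsa /root/.ssh/id_rsa.pub    # drop the key cloned from the image
ssh-keygen -t rsa -P ''                          # regenerate with the correct hostname in the key comment
for host in spark01 spark02 spark03; do          # include the local machine itself
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$host"
done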


4.3 Format HDFS
On spark01 (the master), run hdfs namenode -format to format the namenode.




5 Startup


5.1 Start the zookeeper cluster


### Note: follow the steps below in exactly this order


Start zookeeper on the Master, Worker1, and Worker2
Master spark01
root@spark01:~# cd /server/zookeeper/bin/
root@spark01:/server/zookeeper/bin# ./zkServer.sh start


Worker1 spark02
root@spark02:~# cd /server/zookeeper/bin/
root@spark02:/server/zookeeper/bin# ./zkServer.sh start


Worker2 spark03
root@spark03:~# cd /server/zookeeper/bin/
root@spark03:/server/zookeeper/bin# ./zkServer.sh start


# Check status: one leader, two followers
root@spark01:/server/zookeeper/bin# ./zkServer.sh status
root@spark02:/server/zookeeper/bin# ./zkServer.sh status
root@spark03:/server/zookeeper/bin# ./zkServer.sh status
[Note: the earlier "not running" error was because myid had not been configured]


5.2 Start the Hadoop cluster
Sections 5.2 and 5.3 are both run on the master node, spark01.
5.2.1 Start DFS
cd /server/hadoop/sbin/
root@spark01:/server/hadoop/sbin# ./start-dfs.sh
SSH problem: it still asks for a password???
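One common cause (an assumption, not something diagnosed in the original notes) is wrong permissions on /root/.ssh; a quick check on each node:
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys
ssh root@spark02 hostname    # should print spark02 without asking for a password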
5.2.2 Start YARN
root@spark01:/server/hadoop/sbin# ./start-yarn.sh
5.2.3 Start the job history server [optional]


5.3 Start the Spark cluster
cd /server/spark/sbin
root@spark01:/server/spark/sbin# ./start-all.sh
Start the history server [optional]
root@spark01:/server/spark/sbin# ./start-history-server.sh


5.4 Verification
5.4.1 jps
spark01: 6 required processes
quorumpeermain (zookeeper)
namenode
master
resourcemanager
secondarynamenode
historyserver


root@spark01:/server/spark/sbin# jps
4194 Main
3314 ResourceManager
2678 NameNode
4199 Jps
3961 Master
3468 NodeManager
4093 Worker
2830 DataNode
2415 QuorumPeerMain
3039 SecondaryNameNode






spark02: 4 required processes
datanode
nodemanager
worker
quorumpeermain




root@spark02:/server/zookeeper/bin# jps
2640 DataNode
3169 Jps
2486 QuorumPeerMain
3048 Worker
2798 NodeManager


root@spark03:/server/zookeeper/bin# jps
2817 NodeManager
2659 DataNode
2505 QuorumPeerMain
3194 Jps
3067 Worker




5.4.2
To open the URLs below from Windows, add the IPs and hostnames of spark01/02/03 to C:\Windows\System32\drivers\etc\hosts
192.168.10.121      spark01
192.168.10.122      spark02
192.168.10.123      spark03
[If the file cannot be saved in place, save it somewhere else first and then copy it over to replace the original.]
hadoop
http://spark01:50070/
http://spark01:8088
spark
http://spark01:8080/
http://spark01:18080/


5.4.3 spark submit
root@spark01:/server/spark/sbin# spark-submit --master yarn-cluster --class org.apache.spark.examples.SparkLR --name SparkLR /server/spark/lib/spark-examples-1.6.2-hadoop2.6.0.jar
Result:
16/07/25 06:22:32 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.10.121
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1469452521987
     final status: SUCCEEDED
     tracking URL: http://spark01:8088/proxy/application_1469451821296_0002/
     user: root
16/07/25 06:22:32 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1469451821296_0002
16/07/25 06:22:33 INFO util.ShutdownHookManager: Shutdown hook called
16/07/25 06:22:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0a4ae85e-0e77-4e57-bd46-a2371a6a20ee
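As an additional smoke test (not in the original notes), the same examples jar can be run in local mode, which leaves YARN out of the picture:
spark-submit --master local[2] --class org.apache.spark.examples.SparkPi /server/spark/lib/spark-examples-1.6.2-hadoop2.6.0.jar 10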


5.4.4 hadoop submit
root@spark01:/home/alex# vi words
java
c++
hello
hello
python
java
java
java
c
c


root@spark01:/home/alex# hadoop fs -mkdir /data
root@spark01:/home/alex# hadoop fs -put words /data
root@spark01:/home/alex# hadoop jar /server/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /data /output
root@spark01:/home/alex# hadoop fs -cat /output/part-r-00000
c   2
c++ 1
hello   2
java    4
python  1






6 Install Ubuntu 16.04.01 desktop x64
6.0 Linux configuration: see sections 1.1-1.7
ip 192.168.10.124  host spark04
6.1 Install Anaconda2
On 124:
bash Anaconda2-4.1.1-Linux-x86_64.sh
6.2 Passwordless SSH
On 121:
ssh-copy-id -i "root@192.168.10.124"
On 124:
ssh-copy-id -i "root@192.168.10.121"
ssh-copy-id -i "root@192.168.10.122"
ssh-copy-id -i "root@192.168.10.123"
ssh-copy-id -i "root@192.168.10.124"
On 122:
ssh-copy-id -i "root@192.168.10.124"
On 123:
ssh-copy-id -i "root@192.168.10.124"
6.3 Copy the already-configured programs (all binary distributions, nothing needs recompiling)
On 121:
scp -r /usr/lib/java root@192.168.10.124:/usr/lib/java
scp -r /usr/lib/scala root@192.168.10.124:/usr/lib/scala
scp -r /server/zookeeper/ root@192.168.10.124:/server/zookeeper/
scp -r /server/hadoop/ root@192.168.10.124:/server/hadoop/
scp -r /server/spark/ root@192.168.10.124:/server/spark/
# optional: scp -r /server/spark-1.6.2/ root@192.168.10.124:/server/spark-1.6.2/
6.4 Update the configuration
6.4.1 ~/.bashrc
Edit on 124, or copy it over with scp
export SPARK_MASTER_IP=192.168.10.121
export SPARK_WORKER_MEMORY=1g
export SPARK_HOME=/server/spark
export HADOOP_HOME=/server/hadoop
export HADOOP_CONF_DIR=/server/hadoop/etc/hadoop
export ZOOKEEPER_HOME=/server/zookeeper
export JAVA_HOME=/usr/lib/java/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$SCALA_HOME/bin:${JAVA_HOME}/bin:$PATH
root@spark04:/server# vi ~/.bashrc
root@spark04:/server# source ~/.bashrc
6.4.2 hosts
On 121:
vi /etc/hosts
192.168.10.121  spark01
192.168.10.122  spark02
192.168.10.123  spark03
192.168.10.124  spark04
scp /etc/hosts root@192.168.10.122:/etc/hosts
scp /etc/hosts root@192.168.10.123:/etc/hosts
scp /etc/hosts root@192.168.10.124:/etc/hosts
6.4.3 zookeeper
On 121:
vi /server/zookeeper/conf/zoo.cfg
dataDir=/server/zookeeper/data
dataLogDir=/server/zookeeper/logs
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888
server.4=spark04:2888:3888
scp /server/zookeeper/conf/zoo.cfg root@192.168.10.122:/server/zookeeper/conf/zoo.cfg
scp /server/zookeeper/conf/zoo.cfg root@192.168.10.123:/server/zookeeper/conf/zoo.cfg
scp /server/zookeeper/conf/zoo.cfg root@192.168.10.124:/server/zookeeper/conf/zoo.cfg
On 124:
vi /server/zookeeper/data/myid
change the 1 to a 4
6.4.4 hadoop slaves
On 121:
vi /server/hadoop/etc/hadoop/slaves 
spark01
spark02
spark03
spark04
Method 1:
vi copy.sh
destfile=('root@192.168.10.122' 'root@192.168.10.123' 'root@192.168.10.124')
for dest in ${destfile[@]}; do
    scp /server/hadoop/etc/hadoop/slaves ${dest}:/server/hadoop/etc/hadoop/slaves
done
chmod +x copy.sh
bash copy.sh
Method 2:
scp /server/hadoop/etc/hadoop/slaves root@192.168.10.122:/server/hadoop/etc/hadoop/slaves
scp /server/hadoop/etc/hadoop/slaves root@192.168.10.123:/server/hadoop/etc/hadoop/slaves
scp /server/hadoop/etc/hadoop/slaves root@192.168.10.124:/server/hadoop/etc/hadoop/slaves
6.4.5 Format the namenode for the new node
Adding a slave does not require re-formatting the existing namenode.
To format on the new node:
On 124:
cd $HADOOP_HOME/bin
./hdfs namenode -format
If prompted with Re-format, choose Y to re-format.
http://www.cnblogs.com/simplestupid/p/4695644.html
On 121:
Run start-balancer.sh under $HADOOP_HOME/sbin/
See the appendix at the end of this post for details.


6.4.6 spark slaves
On 121:
vi /server/spark/conf/slaves
spark01
spark02
spark03
spark04
scp /server/spark/conf/slaves root@192.168.10.122:/server/spark/conf/slaves
scp /server/spark/conf/slaves root@192.168.10.123:/server/spark/conf/slaves
scp /server/spark/conf/slaves root@192.168.10.124:/server/spark/conf/slaves
6.4.7
On Windows, add to C:\Windows\System32\drivers\etc\hosts:
192.168.10.124      spark04
###########################
Optional 1:
vi /server/spark-1.6.2/conf/slaves
spark01
spark02
spark03
spark04
scp /server/spark-1.6.2/conf/slaves root@192.168.10.122:/server/spark-1.6.2/conf/slaves
scp /server/spark-1.6.2/conf/slaves root@192.168.10.123:/server/spark-1.6.2/conf/slaves
scp /server/spark-1.6.2/conf/slaves root@192.168.10.124:/server/spark-1.6.2/conf/slaves




7 Shutdown
7.1 zookeeper
cd /server/zookeeper/bin/
root@spark01:/server/zookeeper/bin# ./zkServer.sh stop
cd /server/zookeeper/bin/
root@spark02:/server/zookeeper/bin# ./zkServer.sh stop
cd /server/zookeeper/bin/
root@spark03:/server/zookeeper/bin# ./zkServer.sh stop


7.2 spark
root@spark01:/server/spark/sbin# ./stop-all.sh


7.3 hadoop
root@spark01:/server/hadoop/sbin# ./stop-all.sh


############################
References
http://www.aboutyun.com/thread-17546-1-1.html
http://blog.csdn.net/onepiecehuiyu/article/details/45271493 (time sync, HBase, etc.)
http://blog.csdn.net/yeruby/article/details/49805121
http://my.oschina.net/amui/blog/610288
http://blog.chinaunix.net/uid-20682147-id-4220311.html
http://www.cnblogs.com/simplestupid/p/4695644.html (installing a new node and checking load)
http://ribbonchen.blog.163.com/blog/static/118316505201421824512391/ (adding and removing nodes)


############################
If SSH to the Ubuntu 16 machine does not work:
On 124: sudo service ssh restart
First run ssh spark04 from 121-123 and answer yes once; after that, starting hadoop will no longer show the yes/no SSH prompt.


If some worker did not start, first check whether the slaves file on the master contains that worker.


############################
Load balancing after adding nodes
Hadoop load balancing
4. Balance blocks
Reposted from: http://ribbonchen.blog.163.com/blog/static/118316505201421824512391/
http://www.cnblogs.com/simplestupid/p/4695644.html
# ./bin/start-balancer.sh
1) Without balancing, the cluster puts all new data on the new node, which lowers MapReduce efficiency.
2) Set the balancing threshold; the default is 10%. The lower the value, the more evenly balanced the nodes, but the longer it takes.
# ./bin/start-balancer.sh -threshold 5
3) Set the balancing bandwidth; the default is only 1 MB/s:
<property>
  <name>dfs.balance.bandwidthPerSec</name>  
  <value>1048576</value>  
  <description>  
    Specifies the maximum amount of bandwidth that each datanode   
    can utilize for the balancing purpose in term of   
    the number of bytes per second.   
  </description> 
</property>

####################################
Spark log error on the physical machine
It appeared when starting the history-server.
Either start without the history server, or pass a log directory path when starting it.
The relevant settings are in conf/spark-defaults.conf.
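A minimal sketch of the relevant conf/spark-defaults.conf entries, assuming the event logs go to an HDFS directory such as hdfs://spark01:9000/spark-logs (the path is an assumption; create it first):
hadoop fs -mkdir /spark-logs
# conf/spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://spark01:9000/spark-logs
spark.history.fs.logDirectory    hdfs://spark01:9000/spark-logs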