Setting Up a Hadoop 2.8.0 Distributed Cluster (HA Architecture)
1、Preparation
①、Cluster plan:

| Hostname | User | IP Address | Installed Software | Running Processes |
| --- | --- | --- | --- | --- |
| centos71 | hzq | 192.168.1.201 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) |
| centos72 | hzq | 192.168.1.202 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) |
| centos73 | hzq | 192.168.1.203 | jdk, hadoop | ResourceManager |
| centos74 | hzq | 192.168.1.204 | jdk, hadoop | ResourceManager |
| centos75 | hzq | 192.168.1.205 | jdk, hadoop | DataNode, NodeManager, JournalNode |
| centos76 | hzq | 192.168.1.206 | jdk, hadoop | DataNode, NodeManager, JournalNode |
| centos77 | hzq | 192.168.1.207 | jdk, hadoop | DataNode, NodeManager, JournalNode |
| centos78 | hzq | 192.168.1.205 | jdk, zookeeper | QuorumPeerMain |
| centos79 | hzq | 192.168.1.206 | jdk, zookeeper | QuorumPeerMain |
| centos710 | hzq | 192.168.1.207 | jdk, zookeeper | QuorumPeerMain |
②、Set up passwordless SSH login between every pair of hosts; see "SSH Passwordless Login".
③、Install jdk1.8.0_131 on every host; for installation and configuration see "Installing JDK on Linux".
④、Edit the "/etc/hosts" file as follows:
192.168.31.128 centos71
192.168.31.129 centos72
192.168.31.130 centos73
192.168.31.131 centos74
192.168.31.133 centos75
192.168.31.132 centos76
192.168.31.137 centos77
192.168.31.134 centos78
192.168.31.135 centos79
192.168.31.136 centos710
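Since the same mappings must exist on every node, the entries can be appended with a short script instead of being typed by hand. This is only a sketch: `HOSTS_FILE` points at a scratch file here; on a real node it would be /etc/hosts (written as root), and the IPs are copied verbatim from the list above.

```shell
# Append the cluster's host mappings in one go.
# HOSTS_FILE is a scratch file for this sketch; use /etc/hosts on a real node.
HOSTS_FILE=$(mktemp)
cat >> "$HOSTS_FILE" <<'EOF'
192.168.31.128 centos71
192.168.31.129 centos72
192.168.31.130 centos73
192.168.31.131 centos74
192.168.31.133 centos75
192.168.31.132 centos76
192.168.31.137 centos77
192.168.31.134 centos78
192.168.31.135 centos79
192.168.31.136 centos710
EOF
# Sanity check: every hostname should appear exactly once.
for h in centos71 centos72 centos73 centos74 centos75 centos76 centos77 centos78 centos79 centos710; do
  [ "$(grep -cw "$h" "$HOSTS_FILE")" -eq 1 ] && echo "$h ok"
done
```

The `grep -cw` check also guards against a hostname being mapped twice, which would make name resolution ambiguous.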
⑤、Prepare the Hadoop package: hadoop-2.8.0.tar.gz
⑥、Turn off the firewall
2、Installing Hadoop:
①、Create a "hadoop" directory under "/home/hzq/software/"
②、Create a "data" directory under "hadoop" to hold Hadoop's runtime files
③、Extract "hadoop-2.8.0.tar.gz" into the hadoop directory
tar -zxvf ../package/hadoop-2.8.0.tar.gz -C /home/hzq/software/hadoop/
④、Delete the doc directory under "hadoop-2.8.0/share" to speed up the later scp copies:
rm -rf hadoop-2.8.0/share/doc
3、Configuring Hadoop:
①、Edit hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/home/hzq/software/jdk1.8.0_131
②、Edit core-site.xml
<!-- Logical name of the HDFS nameservice (defined in hdfs-site.xml) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hzqnns/</value>
</property>
<!-- Base directory for Hadoop's runtime files -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hzq/software/hadoop/data</value>
</property>
<!-- ZooKeeper ensemble used for HA coordination -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>centos78:2181,centos79:2181,centos710:2181</value>
</property>
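The ha.zookeeper.quorum value is a comma-separated host:port list. Because a ZooKeeper ensemble needs a strict majority to elect a leader, it should contain an odd number of servers; a quick shell sanity check (the quorum string is copied from the value above):

```shell
# Count the servers in the quorum string and confirm the ensemble is odd-sized.
quorum="centos78:2181,centos79:2181,centos710:2181"
count=$(echo "$quorum" | tr ',' '\n' | grep -c .)
echo "servers: $count"
[ $((count % 2)) -eq 1 ] && echo "odd-sized ensemble: OK"
```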
③、Edit hdfs-site.xml
<!-- Number of block replicas -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<!-- HDFS block size -->
<property>
  <name>dfs.block.size</name>
  <value>64M</value>
</property>
<!-- Logical name of the nameservice; must match fs.defaultFS -->
<property>
  <name>dfs.nameservices</name>
  <value>hzqnns</value>
</property>
<!-- The two NameNodes under the hzqnns nameservice -->
<property>
  <name>dfs.ha.namenodes.hzqnns</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hzqnns.nn1</name>
  <value>centos71:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.hzqnns.nn1</name>
  <value>centos71:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hzqnns.nn2</name>
  <value>centos72:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.hzqnns.nn2</name>
  <value>centos72:50070</value>
</property>
<!-- JournalNodes that hold the shared edit log -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://centos75:8485;centos76:8485;centos77:8485/hzqnns</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hzq/software/hadoop/data/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- Proxy provider that clients use to locate the active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.hzqnns</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods, one per line; shell(/bin/true) is a fallback
     (a custom shell script can be referenced here instead) -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
sshfence
shell(/bin/true)
  </value>
</property>
<!-- The sshfence method requires passwordless SSH -->
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hzq/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
④、Edit mapred-site.xml
First rename "mapred-site.xml.template":
mv mapred-site.xml.template mapred-site.xml
Then edit mapred-site.xml:
<!-- Run MapReduce on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
⑤、Edit yarn-site.xml
<!-- Enable ResourceManager HA -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<!-- Cluster ID shared by the RM pair -->
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yrc</value>
</property>
<!-- Logical IDs of the two ResourceManagers -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>centos73</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>centos74</value>
</property>
<!-- Auxiliary service required by the MapReduce shuffle -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- ZooKeeper ensemble used by the ResourceManagers -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>centos78:2181,centos79:2181,centos710:2181</value>
</property>
⑥、Configure the DataNode hosts by editing the slaves file:
centos75
centos76
centos77
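The slaves file can also be generated from the DataNode list. A small sketch — `SLAVES` points at a scratch file here; on the cluster it is etc/hadoop/slaves under the Hadoop install directory:

```shell
# Write one DataNode hostname per line, as the slaves file expects.
SLAVES=$(mktemp)
printf '%s\n' centos75 centos76 centos77 > "$SLAVES"
cat "$SLAVES"
```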
⑦、Copy the configured Hadoop directory to the other six hosts:
scp -r hadoop/ centos72:/home/hzq/software/
scp -r hadoop/ centos73:/home/hzq/software/
scp -r hadoop/ centos74:/home/hzq/software/
scp -r hadoop/ centos75:/home/hzq/software/
scp -r hadoop/ centos76:/home/hzq/software/
scp -r hadoop/ centos77:/home/hzq/software/
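The six copies can be collapsed into a loop. The sketch below is a dry run that only prints each command; removing the leading echo performs the actual copies (it assumes the same hzq user and /home/hzq/software path on every host):

```shell
# Print (dry run) one scp command per target host; drop "echo" to execute.
for h in centos72 centos73 centos74 centos75 centos76 centos77; do
  echo scp -r hadoop/ "$h":/home/hzq/software/
done
```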
4、Starting Hadoop (on the first startup the steps must be executed in this order)
①、Check whether the ZooKeeper cluster is up; if not, start it first.
- Start zookeeper on centos78, centos79, and centos710:
zkServer.sh start
- Check the status: there should be one leader and two followers
zkServer.sh status
②、Start the JournalNodes (run on centos75, centos76, and centos77):
hadoop-daemon.sh start journalnode
Note: run jps to verify; on success, centos75, centos76, and centos77 each show an extra JournalNode process.
③、Format HDFS on centos71:
hdfs namenode -format
④、Keep the two NameNodes' data consistent by copying the data directory from centos71 to centos72:
scp -r data/ centos72:/home/hzq/software/hadoop/data
⑤、Format ZKFC on centos71:
hdfs zkfc -formatZK
⑥、Start HDFS on centos71:
start-dfs.sh
⑦、Start the ResourceManager and NodeManagers from centos73:
start-yarn.sh
⑧、Start the standby ResourceManager on centos74:
yarn-daemon.sh start resourcemanager
5、Verifying the startup:
①、Run jps on each host to check its processes.
②、HDFS web UI: http://centos71:50070 or http://centos72:50070
③、YARN (MapReduce) web UI: http://centos73:8088 or http://centos74:8088
6、Common commands:
- Show status information for all HDFS nodes
hdfs dfsadmin -report
- Get the HA state of a NameNode
hdfs haadmin -getServiceState nn1
- Start a single NameNode process
hadoop-daemon.sh start namenode
- Start a single zkfc process
hadoop-daemon.sh start zkfc
- Start a single ResourceManager process
yarn-daemon.sh start resourcemanager
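After a failover test it is useful to query every HA role at once. A dry-run sketch that prints the state-query commands rather than running them (nn1/nn2 and rm1/rm2 match the IDs configured above; `yarn rmadmin -getServiceState` is the YARN-side counterpart of `hdfs haadmin -getServiceState`):

```shell
# Print the state-query command for each HA role (dry run; drop "echo" to execute).
for nn in nn1 nn2; do echo hdfs haadmin -getServiceState "$nn"; done
for rm in rm1 rm2; do echo yarn rmadmin -getServiceState "$rm"; done
```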
7、Summary
1、This setup is purely for learning; no tuning or hardening has been done.
2、Corrections and suggestions from readers are very welcome.