
Installing a Highly Available Hadoop HA Cluster Based on ZooKeeper

1. Overview of Hadoop Cluster Deployment Modes

    1.1 The namenode + secondarynamenode mode, supported by both hadoop1.x and hadoop2.x


         Advantages: simple to set up, well suited to debugging programs in development mode

         Disadvantages: the namenode is a critical service and a single point of failure; if the namenode goes down, the whole cluster becomes unavailable

    1.2 The active namenode + standby namenode mode, supported only by hadoop2.x



       Advantages: created to eliminate the namenode single point of failure of 1.x, it fully safeguards the high availability of the Hadoop cluster

       Disadvantages: requires at least 3 zookeeper nodes and at least 3 journalnodes, and currently supports at most 2 namenodes; roles can share machines, but that is not recommended

    1.3 Cluster modes as described on the Hadoop official site

        1) Single-node Hadoop setup

        2) Cluster modes

            Cluster mode 1 (the namenode + secondarynamenode mode, supported by both hadoop1.x and hadoop2.x)

            Cluster mode 2 (the active namenode + standby namenode mode, supported only by hadoop2.x, also called the Hadoop HA mode); the official documentation splits it into HDFS HA and YARN HA and covers each separately

        Production environments mostly use HDFS (zookeeper + journalnode) (active NameNode + standby NameNode + JournalNode + DFSZKFailoverController + DataNode) + YARN (zookeeper) (active ResourceManager + standby ResourceManager + NodeManager). What I cover here is the ZooKeeper-based Hadoop HA cluster mode supported only by hadoop2.x, the mode mainly used in production.

2. Installing a ZooKeeper-Based Hadoop HA Cluster

    2.1 Installation environment

        Five CentOS 7 machines named hadoop31 through hadoop35 (192.168.185.31 to 192.168.185.35), administered through a dedicated hadoop user. Software used below: JDK 1.7.0_65, zookeeper-3.4.6 and hadoop-2.7.1. hadoop31 and hadoop32 carry the active/standby namenode and resourcemanager pairs; zookeeper, journalnode, datanode and nodemanager run on all five nodes.

    2.2 Pre-installation preparation

        1) Disable the firewall

#CentOS 7 firewall operations
#start firewalld
systemctl start firewalld.service
#restart firewalld
systemctl restart firewalld.service
#stop firewalld
systemctl stop firewalld.service 
#keep firewalld from starting at boot
systemctl disable firewalld.service 
#check the firewall state
firewall-cmd --state
#to open individual ports under the legacy iptables service instead, edit /etc/sysconfig/iptables
#(the rules below are only an example; substitute the ports your services actually use)
vi /etc/sysconfig/iptables
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6379 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6380 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6381 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16379 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16380 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16381 -j ACCEPT

         Here I simply disable the firewall; as root, run the following commands:

systemctl stop firewalld.service 
systemctl disable firewalld.service
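
        If you would rather keep firewalld running than disable it, the following is a minimal sketch that opens the ports used later in this guide (the port list is inferred from the configuration files below; adjust it to your own deployment):

#run as root on every node
#2181 zookeeper clients, 2888/3888 zookeeper quorum/election, 8485 journalnode RPC,
#9000 namenode RPC, 50070 namenode web UI, 8088 resourcemanager web UI
for port in 2181 2888 3888 8485 9000 50070 8088; do
    firewall-cmd --permanent --add-port=${port}/tcp
done
firewall-cmd --reload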

        2) Disable SELinux

        Purpose: the Hadoop master manages its worker nodes over SSH; with SELinux enforcing this can fail, because SELinux may block passwordless SSH logins.

        Edit /etc/selinux/config. Before the change:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected. 
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

         After the change:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected. 
# mls - Multi Level Security protection.
#SELINUXTYPE=targeted

         Run the following command so the change takes effect immediately (setenforce 0 switches SELinux to permissive mode for the current session; the disabled setting applies from the next reboot):

setenforce 0
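
        To confirm, getenforce should now print Permissive (and Disabled after the next reboot):

getenforce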

        3) Hostname configuration

        Purpose: machine IPs in a Hadoop cluster may change and interrupt services between nodes, so it is best to reference machines by hostname in the Hadoop configuration.

        Edit /etc/hosts on each machine to map hostnames to IPs as follows:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.185.31 hadoop31
192.168.185.32 hadoop32
192.168.185.33 hadoop33
192.168.185.34 hadoop34
192.168.185.35 hadoop35

         On CentOS 7 the hostname itself is set in the file /etc/hostname on each machine; taking node hadoop31 as an example, it contains:

#localdomain
hadoop31
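
        On CentOS 7 you can also apply the hostname immediately without a reboot; a quick sketch for hadoop31 (run as root, substituting each node's own name):

hostnamectl set-hostname hadoop31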

        4) Create the hadoop user and group

        Purpose: the cluster will be managed exclusively by a dedicated hadoop user, which prevents other users from accidentally shutting the Hadoop cluster down.

#as root, create the hadoop user and group 
groupadd hadoop 
useradd -g hadoop hadoop 
#set the user's password
passwd hadoop

        5) Passwordless SSH login for the hadoop user

        Purpose: the Hadoop master manages slave nodes by logging into them over SSH, and an ordinary SSH login requires a password. So that the master can conveniently manage hundreds or thousands of slaves, we copy the master's public key to the slaves to enable passwordless SSH; here I set up passwordless login between all master and slave machines.

#first switch to the hadoop user created above; I am working on hadoop31 
ssh hadoop31
su hadoop 
#generate the RSA key pair; run this on every node in the cluster, pressing Enter at every prompt 
ssh-keygen -t rsa 
#when you ssh into a remote machine, the remote side validates you against the
#authorized_keys in your home directory, here /home/hadoop/.ssh/authorized_keys,
#which will collect the public keys (/home/hadoop/.ssh/id_rsa.pub) from all machines;
#the commands below, run only on the master, give all nodes passwordless SSH to each other 
cd /home/hadoop/.ssh/ 
#first append the master node's own public key to authorized_keys 
cat id_rsa.pub >> authorized_keys 
#then append each slave node's public key to authorized_keys; again I am on hadoop31 
ssh hadoop@hadoop32 cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys 
ssh hadoop@hadoop33 cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys 
ssh hadoop@hadoop34 cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys 
ssh hadoop@hadoop35 cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys 
#the permissions on /home/hadoop/.ssh/authorized_keys must be tightened 
chmod 600 /home/hadoop/.ssh/authorized_keys 
#finally distribute the master's authorized_keys to the slave nodes 
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop32:/home/hadoop/.ssh/ 
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop33:/home/hadoop/.ssh/ 
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop34:/home/hadoop/.ssh/ 
scp /home/hadoop/.ssh/authorized_keys hadoop@hadoop35:/home/hadoop/.ssh/
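
        Once the keys are distributed, a quick sanity check, run from each node in turn: every command below should print the remote hostname without asking for a password (answer yes to the one-time host-key prompt on the first connection):

for h in hadoop31 hadoop32 hadoop33 hadoop34 hadoop35; do
    ssh hadoop@$h hostname
done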

        6) JDK installation

        Purpose: Hadoop needs a Java environment, and Hadoop 2.7.1 requires at least Java 1.7. Install it as follows:

#switch to the hadoop user
su hadoop
#download jdk-7u65-linux-x64.gz into /home/hadoop/java and extract it
cd /home/hadoop/java
tar -zxvf jdk-7u65-linux-x64.gz
#edit /home/hadoop/.bashrc with vi and append the following at the end of the file
export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65 
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 
export PATH=$PATH:$JAVA_HOME/bin 
#reload /home/hadoop/.bashrc so the settings take effect
source /home/hadoop/.bashrc

         Many people configure this globally in /etc/profile; I advise against that, because if someone downgrades or removes the Java environment there, everything that depends on it breaks. Instead, set the environment variables in the .bashrc of the user that manages the Hadoop cluster.
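
        After sourcing .bashrc, verify that the hadoop user picks up the intended JDK:

#should print java version "1.7.0_65"
java -version
#should point into /home/hadoop/java/jdk1.7.0_65/bin
which java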

        7) ZooKeeper installation

#1 as the hadoop user, download and extract zookeeper-3.4.6
su hadoop
cd /home/hadoop 
tar -zxvf zookeeper-3.4.6.tar.gz 

#2 configure /etc/hosts on every node in the cluster with the following content:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.185.31 hadoop31 
192.168.185.32 hadoop32 
192.168.185.33 hadoop33 
192.168.185.34 hadoop34 
192.168.185.35 hadoop35

#3 create the zookeeper data directory on every node in the cluster
ssh hadoop31
cd /home/hadoop 
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper 
ssh hadoop32
cd /home/hadoop 
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper 
ssh hadoop33
cd /home/hadoop 
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper 
ssh hadoop34
cd /home/hadoop 
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper 
ssh hadoop35
cd /home/hadoop 
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper 

#4 configure zoo.cfg
ssh hadoop31
cd /home/hadoop/zookeeper-3.4.6/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
#contents as follows (tickTime=2000 is already present from zoo_sample.cfg; note
#that zoo.cfg does not allow trailing comments on the same line as a value)
tickTime=2000
initLimit=10 
syncLimit=5 
dataDir=/opt/hadoop/zookeeper 
clientPort=2181
#keep only the 3 most recent snapshots in the data directory; by default all
#snapshots are kept, which over time consumes a lot of disk space
autopurge.snapRetainCount=3
#purge interval in hours: clean up old snapshot data once per hour
autopurge.purgeInterval=1
server.1=hadoop31:2888:3888 
server.2=hadoop32:2888:3888 
server.3=hadoop33:2888:3888
server.4=hadoop34:2888:3888 
server.5=hadoop35:2888:3888 
#5 from hadoop31, copy the installation files out to the other nodes
scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop32:/home/hadoop/ 
scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop33:/home/hadoop/ 
scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop34:/home/hadoop/ 
scp -r /home/hadoop/zookeeper-3.4.6 hadoop@hadoop35:/home/hadoop/ 

#6 on every node set myid; it must be a number matching the server.N entry in zoo.cfg 
ssh hadoop31 
echo "1" > /opt/hadoop/zookeeper/myid 
ssh hadoop32 
echo "2" > /opt/hadoop/zookeeper/myid 
ssh hadoop33 
echo "3" > /opt/hadoop/zookeeper/myid 
ssh hadoop34 
echo "4" > /opt/hadoop/zookeeper/myid 
ssh hadoop35 
echo "5" > /opt/hadoop/zookeeper/myid 

#7 starting zookeeper on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

#8 stopping zookeeper on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop 

#9 checking zookeeper status on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status 

#10 browsing zookeeper's directory data through the client
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181
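
        After starting zookeeper on all five nodes, a quick way to confirm the ensemble has formed, run from hadoop31 (this relies on the passwordless SSH set up in step 5): expect "Mode: leader" on exactly one node and "Mode: follower" on the rest.

for h in hadoop31 hadoop32 hadoop33 hadoop34 hadoop35; do
    ssh hadoop@$h /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status
done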

    2.3 Hadoop HA installation

        1)hadoop-2.7.1.tar.gz

#download hadoop-2.7.1.tar.gz into /home/hadoop and extract it; I am working on hadoop31
ssh hadoop31
su hadoop
cd /home/hadoop
tar -zxvf hadoop-2.7.1.tar.gz

        2)core-site.xml

        Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>  
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
<configuration>  
	<!-- Enable the HDFS trash: deleted files first move to the trash and are kept there for at most one day (1440 minutes) before being purged --> 
	<property>
		<name>fs.trash.interval</name>
		<value>1440</value>
	</property>
	<!-- NameNode access URI for the HA deployment; bigdatacluster-ha is a nameservice ID you can choose yourself, and hdfs-site.xml below refers back to it --> 
	<property>
		<name>fs.defaultFS</name>  
		<value>hdfs://bigdatacluster-ha</value>
	</property>
	<!-- All of Hadoop's file I/O goes through its common I/O code, so io.file.buffer.size is widely used to set the read/write buffer size for SequenceFiles. Larger buffers give higher throughput for both disk and network operations, at the cost of more memory and latency. The value should be a multiple of the system page size, in bytes; the default is 4 KB, 64 KB (65536 bytes) is common, and here it is set to 128 KB -->  
	<property>  
		<name>io.file.buffer.size</name>  
		<value>131072</value>  
	</property> 
	<!-- Hadoop temporary directory --> 
	<property> 
		<name>hadoop.tmp.dir</name> 
		<value>/opt/hadoop/tmp</value> 
	</property> 
	<!-- ZooKeeper quorum addresses --> 
	<property> 
		<name>ha.zookeeper.quorum</name> 
		<value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value> 
	</property> 
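	<!-- ZooKeeper session timeout in milliseconds (5 minutes here) -->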
	<property> 
		<name>ha.zookeeper.session-timeout.ms</name> 
		<value>300000</value> 
	</property>
	<!-- Compression codec. The binary package from the Apache site is not built with snappy support; you have to compile Hadoop yourself (see http://aperise.iteye.com/blog/2254487 for how). If you do not use snappy, this property can be omitted --> 
	<property>  
		<name>io.compression.codecs</name>  
		<value>org.apache.hadoop.io.compress.SnappyCodec</value>  
	</property>  
</configuration>

        3)hdfs-site.xml

        Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
	<!-- Set the HDFS nameservice to bigdatacluster-ha; must match core-site.xml --> 
	<property> 
		<name>dfs.nameservices</name> 
		<value>bigdatacluster-ha</value> 
	</property> 
	<!-- Disk space to reserve per volume, in bytes (here 100 GB), so the disk never fills up completely --> 
	<property>
		<name>dfs.datanode.du.reserved</name>
		<value>107374182400</value>
	</property>
	<!-- bigdatacluster-ha has two NameNodes: namenode1 and namenode2 --> 
	<property> 
		<name>dfs.ha.namenodes.bigdatacluster-ha</name> 
		<value>namenode1,namenode2</value> 
	</property> 
	<!-- RPC address of namenode1, reached by resolving the nameservice named in core-site.xml's fs.defaultFS --> 
	<property> 
		<name>dfs.namenode.rpc-address.bigdatacluster-ha.namenode1</name> 
		<value>hadoop31:9000</value> 
	</property> 
	<!-- HTTP address of namenode1 --> 
	<property> 
		<name>dfs.namenode.http-address.bigdatacluster-ha.namenode1</name> 
		<value>hadoop31:50070</value> 
	</property> 
	<!-- RPC address of namenode2, reached by resolving the nameservice named in core-site.xml's fs.defaultFS --> 
	<property> 
		<name>dfs.namenode.rpc-address.bigdatacluster-ha.namenode2</name> 
		<value>hadoop32:9000</value> 
	</property> 
	<!-- HTTP address of namenode2 --> 
	<property> 
		<name>dfs.namenode.http-address.bigdatacluster-ha.namenode2</name> 
		<value>hadoop32:50070</value> 
	</property> 

	<!-- Shared edits directory: where the NameNode's edit log is written on the JournalNodes --> 
	<property> 
		<name>dfs.namenode.shared.edits.dir</name> 
		<value>qjournal://hadoop31:8485;hadoop32:8485;hadoop33:8485;hadoop34:8485;hadoop35:8485/bigdatacluster-ha</value> 
	</property> 

	<!-- Failover proxy provider: how clients locate the currently active NameNode --> 
	<property> 
		<name>dfs.client.failover.proxy.provider.bigdatacluster-ha</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property> 

	<!-- Fencing method used during failover; sshfence logs into the other machine over SSH to make sure the old active NameNode is really down --> 
	<property> 
		<name>dfs.ha.fencing.methods</name> 
		<value>sshfence</value> 
	</property> 
	<!-- sshfence needs passwordless SSH; path to the private key --> 
	<property> 
		<name>dfs.ha.fencing.ssh.private-key-files</name> 
		<value>/home/hadoop/.ssh/id_rsa</value> 
	</property> 

	<!-- Local directory where each JournalNode stores the edit log --> 
	<property> 
		<name>dfs.journalnode.edits.dir</name> 
		<value>/opt/hadoop/journal</value> 
	</property> 

	<!-- Enable automatic failover --> 
	<property> 
		<name>dfs.ha.automatic-failover.enabled</name> 
		<value>true</value> 
	</property> 

	<!-- Local storage path for the NameNode namespace (fsimage) --> 
	<property> 
		<name>dfs.namenode.name.dir</name>    
		<value>file:/opt/hadoop/hdfs/name</value> 
	</property> 

	<!-- Local storage path for DataNode blocks --> 
	<property> 
		<name>dfs.datanode.data.dir</name> 
		<value>file:/opt/hadoop/hdfs/data</value> 
	</property> 

	<!-- Replication factor --> 
	<property> 
		<name>dfs.replication</name> 
		<value>3</value> 
	</property> 

	<!-- Allow access to HDFS over WebHDFS --> 
	<property> 
		<name>dfs.webhdfs.enabled</name> 
		<value>true</value> 
	</property> 

	<property> 
		<name>ha.zookeeper.quorum</name> 
		<value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value> 
	</property> 

	
	<property>
		<name>dfs.namenode.handler.count</name>
		<value>600</value>
		<description>The number of server threads for the namenode.</description>
	</property>
	<property>
		<name>dfs.datanode.handler.count</name>
		<value>600</value>
		<description>The number of server threads for the datanode.</description>
	</property>
	<property>
		<name>dfs.client.socket-timeout</name>
		<value>600000</value>
	</property>
	<property>  
		<!-- Maximum number of transfer threads a DataNode may use; the default is 4096, and if it is too low you get "xcievers exceeded" errors -->  
		<name>dfs.datanode.max.transfer.threads</name>  
		<value>409600</value>  
	</property>   
</configuration>
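
        With this configuration, clients address HDFS by the logical nameservice rather than a specific namenode host, and the failover proxy provider resolves whichever namenode is currently active. Once the cluster is running (section 2.4), the two commands below are therefore equivalent, since fs.defaultFS already points at the nameservice:

/home/hadoop/hadoop-2.7.1/bin/hdfs dfs -ls hdfs://bigdatacluster-ha/
/home/hadoop/hadoop-2.7.1/bin/hdfs dfs -ls /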

        4)mapred-site.xml

        Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml (if it does not exist yet, copy it from mapred-site.xml.template in the same directory)

<?xml version="1.0"?>   
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>   
<configuration>   
    <!-- Run MapReduce on YARN -->   
    <property>   
        <name>mapreduce.framework.name</name>   
        <value>yarn</value>   
    </property>    
    <property>  
        <name>mapreduce.job.maps</name>  
        <value>12</value>  
    </property>  
    <property>  
        <name>mapreduce.job.reduces</name>  
        <value>12</value>  
    </property>  
  
    <!-- Compression settings. The binary package from the Apache site is not built with snappy support; you have to compile Hadoop yourself (see http://aperise.iteye.com/blog/2254487). If you do not use snappy, these properties can be omitted -->   
    <property>  
        <name>mapreduce.output.fileoutputformat.compress</name>  
        <value>true</value>  
        <description>Should the job outputs be compressed?  
        </description>  
    </property>  
    <property>  
        <name>mapreduce.output.fileoutputformat.compress.type</name>  
        <value>RECORD</value>  
        <description>If the job outputs are to compressed as SequenceFiles, how should  
               they be compressed? Should be one of NONE, RECORD or BLOCK.  
        </description>  
    </property>  
    <property>  
        <name>mapreduce.output.fileoutputformat.compress.codec</name>  
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>  
        <description>If the job outputs are compressed, how should they be compressed?  
        </description>  
    </property>  
    <property>  
        <name>mapreduce.map.output.compress</name>  
        <value>true</value>  
        <description>Should the outputs of the maps be compressed before being  
               sent across the network. Uses SequenceFile compression.  
        </description>  
    </property>  
    <property>  
        <name>mapreduce.map.output.compress.codec</name>  
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>  
        <description>If the map outputs are compressed, how should they be   
               compressed?  
        </description>  
    </property>    
</configuration> 

        5)yarn-site.xml

        Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-site.xml

<?xml version="1.0"?> 
<configuration> 
	<!--log aggregation (yarn.log) start------------------------------------------------------------------------>  
	<property> 
		<name>yarn.log-aggregation-enable</name> 
		<value>true</value> 
	</property> 
	<!--maximum time, in seconds, to retain aggregated logs on HDFS (3 days)-->  
	<property> 
		<name>yarn.log-aggregation.retain-seconds</name> 
		<value>259200</value> 
	</property> 
	<!--log aggregation (yarn.log) end-------------------------------------------------------------------------->  

	<!--retry interval (ms) for reconnecting to the resourcemanager after losing contact-->  
	<property>  
		<name>yarn.resourcemanager.connect.retry-interval.ms</name>
		<value>2000</value>  
	</property> 

	<!--resourcemanager configuration start------------------------------------------------------------------------->
	<property> 
		<name>yarn.resourcemanager.zk-address</name> 
		<value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>  
	</property> 
	<property>  
		<name>yarn.resourcemanager.cluster-id</name>  
		<value>besttonecluster-yarn</value>  
	</property>  
	<!--enable resourcemanager HA (default false)-->  
	<property>  
		<name>yarn.resourcemanager.ha.enabled</name>  
		<value>true</value>  
	</property>  
	<property> 
		<name>yarn.resourcemanager.ha.rm-ids</name> 
		<value>rm1,rm2</value> 
	</property> 
	<property> 
		<name>yarn.resourcemanager.hostname.rm1</name> 
		<value>hadoop31</value> 
	</property>     
	<property> 
		<name>yarn.resourcemanager.hostname.rm2</name> 
		<value>hadoop32</value> 
	</property> 
	<!--rm1 web UI--> 
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>hadoop31:8088</value>
	</property>
	<!--rm2 web UI-->  
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>hadoop32:8088</value>
	</property>
	<!--enable automatic failover-->  
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
		<value>/yarn-leader-election</value>
	</property>

	<!--enable resourcemanager state recovery-->  
	<property> 
		<name>yarn.resourcemanager.recovery.enabled</name>  
		<value>true</value>  
	</property> 
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
	<!--resourcemanager configuration end--------------------------------------------------------------------------->

	<!--nodemanager configuration start----------------------------------------------------------------------------->
	<property>  
		<name>yarn.nodemanager.aux-services</name>  
		<value>mapreduce_shuffle</value>  
	</property>  
	<property>  
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>  
	</property>  
	<!--nodemanager configuration end------------------------------------------------------------------------------->
</configuration>

        6)slaves

        Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/slaves

hadoop31
hadoop32
hadoop33
hadoop34
hadoop35

        7) hadoop-env.sh and yarn-env.sh

        Set JAVA_HOME in both /home/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-env.sh:

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65

        8)bashrc

        These settings are for the hadoop user only; append the following to /home/hadoop/.bashrc:

export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=${HADOOP_HOME}/bin:${PATH}

        9) Distribute the installation files to the other machines

#again I am working on hadoop31
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop32:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop33:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop34:/home/hadoop/ 
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop35:/home/hadoop/

    2.4 First-time startup of Hadoop HA

        1) Start zookeeper

ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop32
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop33
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop34
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop35
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

         #run jps and check that the QuorumPeerMain process is present

        #check zookeeper status: /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status

        #stop zookeeper: /home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop

        2) Format the hadoop-ha znode on zookeeper

/home/hadoop/hadoop-2.7.1/bin/hdfs zkfc -formatZK
#you can check that the Hadoop HA znode now exists on zookeeper as follows
# /home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181 
#ls /

        3) Start the journalnode edit-log synchronization service

ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
ssh hadoop32
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
ssh hadoop33
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
ssh hadoop34
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode 
ssh hadoop35
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode

        4) Format the namenode

#run this step on only one of the two namenode hosts, hadoop31 or hadoop32
ssh hadoop31
/home/hadoop/hadoop-2.7.1/bin/hdfs namenode -format

        5) Start the namenode, then sync and start the standby namenode

#start the active namenode
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode 
#copy the metadata over to the standby namenode, then start it
ssh hadoop32
/home/hadoop/hadoop-2.7.1/bin/hdfs namenode -bootstrapStandby 
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode

        6) Start the DFSZKFailoverController

ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start zkfc 
ssh hadoop32
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start zkfc

        7) Start the datanodes

#note: hadoop-daemons.sh start datanode starts a datanode on every slave, while hadoop-daemon.sh start datanode starts only the local one
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemons.sh start datanode

        8) Start yarn

#start the resourcemanager on hadoop31 and the nodemanagers on hadoop31 through hadoop35
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/start-yarn.sh 
#start the standby resourcemanager on hadoop32
ssh hadoop32
/home/hadoop/hadoop-2.7.1/sbin/yarn-daemon.sh start resourcemanager
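
        Before moving on, a verification sketch using the standard Hadoop 2.x admin tools (the examples jar ships with the distribution):

#one of each pair should report "active", the other "standby"
/home/hadoop/hadoop-2.7.1/bin/hdfs haadmin -getServiceState namenode1
/home/hadoop/hadoop-2.7.1/bin/hdfs haadmin -getServiceState namenode2
/home/hadoop/hadoop-2.7.1/bin/yarn rmadmin -getServiceState rm1
/home/hadoop/hadoop-2.7.1/bin/yarn rmadmin -getServiceState rm2
#jps on each node should show, according to the node's roles: QuorumPeerMain,
#JournalNode, NameNode, DFSZKFailoverController, DataNode, ResourceManager, NodeManager
jps
#submit a small trial job to exercise HDFS and YARN end to end
/home/hadoop/hadoop-2.7.1/bin/hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 5 10
#web UIs: http://hadoop31:50070 (HDFS) and http://hadoop31:8088 (YARN)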

         At this point, the ZooKeeper-based highly available Hadoop cluster has been installed successfully and is up and running.