Installing a Highly Available Hadoop HA Cluster Based on ZooKeeper
1. Hadoop cluster deployment modes
1.1 namenode + secondarynamenode (supported by both hadoop1.x and hadoop2.x)
Pros: simple to set up, well suited to debugging programs in development.
Cons: the namenode is a critical service with a single point of failure; if the namenode goes down, the whole cluster becomes unavailable.
1.2 active namenode + standby namenode (hadoop2.x only)
Pros: designed to eliminate the 1.x namenode single point of failure, fully ensuring high availability of the Hadoop cluster.
Cons: requires at least 3 zookeeper nodes and at least 3 journalnode nodes; currently at most 2 namenodes are supported. Roles can share nodes, but this is not recommended.
1.3 Cluster deployment modes in the official Hadoop documentation
1) Single-node Hadoop setup
2) Cluster modes
Cluster mode 1 (namenode + secondarynamenode, supported by both hadoop1.x and hadoop2.x)
Cluster mode 2 (active namenode + standby namenode, hadoop2.x only, also called Hadoop HA). The documentation covers HDFS HA and YARN HA separately.
Production environments mostly use HDFS (zookeeper + journalnode) (active NameNode + standby NameNode + JournalNode + DFSZKFailoverController + DataNode) plus YARN (zookeeper) (active ResourceManager + standby ResourceManager + NodeManager).
2. Installing a Hadoop HA cluster based on ZooKeeper
2.1 Installation environment
2.2 Preparation before installing
1) Disable the firewall
Overview of firewall operations on CentOS 7:
#centos7: start firewalld
systemctl start firewalld.service
#centos7: restart firewalld
systemctl restart firewalld.service
#centos7: stop firewalld
systemctl stop firewalld.service
#centos7: disable firewalld at boot
systemctl disable firewalld.service
#centos7: check firewall status
firewall-cmd --state
#Open firewall ports (example rules; adjust the ports to your services)
vi /etc/sysconfig/iptables-config
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6379 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6380 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 6381 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16379 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16380 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16381 -j ACCEPT
Here I simply disable the firewall; as root, run:
systemctl disable firewalld.service
2) Disable SELinux
Why: the Hadoop master manages the worker nodes over SSH. This does not work with SELinux enabled, because SELinux restricts passwordless SSH login.
Edit /etc/selinux/config. Before the change:
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
After the change:
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
#SELINUXTYPE=targeted
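The edit above can be scripted with sed. This is a hedged sketch that operates on a sample copy of the file, so the /tmp path and the sample content are illustrative; on a real node you would target /etc/selinux/config:

```shell
# Work on a sample copy; on a real node you would target /etc/selinux/config.
cat > /tmp/selinux-config <<'EOF'
SELINUX=enforcing
SELINUXTYPE=targeted
EOF
# Comment out the old line and set SELINUX=disabled (GNU sed syntax).
sed -i 's/^SELINUX=enforcing/#SELINUX=enforcing\nSELINUX=disabled/' /tmp/selinux-config
grep '^SELINUX=' /tmp/selinux-config   # prints: SELINUX=disabled
```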
Run the following command to make the SELinux change take effect immediately:
setenforce 0
3) Hostname configuration
Why: machine IPs in a Hadoop cluster may change and break services across the cluster, so Hadoop is best configured using hostnames.
On each machine, edit /etc/hosts and configure the hostname mappings as follows:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.185.31 hadoop31
192.168.185.32 hadoop32
192.168.185.33 hadoop33
192.168.185.34 hadoop34
192.168.185.35 hadoop35
On CentOS 7 the hostname itself is set in /etc/hostname on each machine. Taking node hadoop31 as an example, the file contains:
hadoop31
4) Create the hadoop user and group
Why: the Hadoop cluster will be managed exclusively by the hadoop user, which prevents other users from accidentally shutting the cluster down.
#As root, create the hadoop user and group
groupadd hadoop
useradd -g hadoop hadoop
#Set the user's password
passwd hadoop
5) Passwordless SSH login for the hadoop user
Why: the Hadoop master manages worker nodes by logging in to them over SSH, and a normal SSH login requires a password. To let the master conveniently manage potentially hundreds of workers, we copy the master's public key to the workers to enable passwordless SSH. Here I set up passwordless login between all master and worker machines.
#First switch to the hadoop user created above; I am working on hadoop31
ssh hadoop31
su hadoop
#Generate the public/private key pair. This must be run on every node in the cluster; just keep pressing Enter
ssh-keygen -t rsa
#When logging in to a remote machine over ssh, the remote side validates against .ssh/authorized_keys in the target user's home directory, here /home/hadoop/.ssh/authorized_keys, which holds the public keys (from /home/hadoop/.ssh/id_rsa.pub on the other machines). The following commands, run only on the master node, enable passwordless SSH between all master and worker nodes
cd /home/hadoop/.ssh/
#First append the Master node's own public key to authorized_keys
cat id_rsa.pub>>authorized_keys
#Then append the worker (slave) nodes' public keys to authorized_keys; I am running this on hadoop31
ssh [email protected] cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys
ssh [email protected] cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys
ssh [email protected] cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys
ssh [email protected] cat /home/hadoop/.ssh/id_rsa.pub>> authorized_keys
#The permissions of /home/hadoop/.ssh/authorized_keys must be restricted
chmod 600 /home/hadoop/.ssh/authorized_keys
#Distribute the Master node's authorized_keys to the other slave nodes
scp -r /home/hadoop/.ssh/authorized_keys [email protected]:/home/hadoop/.ssh/
scp -r /home/hadoop/.ssh/authorized_keys [email protected]:/home/hadoop/.ssh/
scp -r /home/hadoop/.ssh/authorized_keys [email protected]:/home/hadoop/.ssh/
scp -r /home/hadoop/.ssh/authorized_keys [email protected]:/home/hadoop/.ssh/
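The repeated key-gathering and distribution commands above can be written as a loop. This sketch is a dry run that only prints the commands (the hostnames are the ones used in this article); remove the echo to actually execute them:

```shell
# Worker nodes as named in this article.
slaves="hadoop32 hadoop33 hadoop34 hadoop35"
# Collect each worker's public key, then push the merged file back out (dry run).
for h in $slaves; do
  echo "ssh hadoop@$h cat /home/hadoop/.ssh/id_rsa.pub >> authorized_keys"
done
for h in $slaves; do
  echo "scp /home/hadoop/.ssh/authorized_keys hadoop@$h:/home/hadoop/.ssh/"
done
```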
6) Install the JDK
Why: Hadoop needs a Java runtime, and Hadoop 2.7.1 requires at least Java 1.7. Install as follows:
#Log in as the hadoop user
su hadoop
#Download jdk-7u65-linux-x64.gz into /home/hadoop/java and extract it
cd /home/hadoop/java
tar -zxvf jdk-7u65-linux-x64.gz
#Edit /home/hadoop/.bashrc and append the following at the end of the file
export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#Make the /home/hadoop/.bashrc settings take effect
source /home/hadoop/.bashrc
Many people configure this globally in /etc/profile. I advise against that: if someone downgrades or removes the Java environment there, things break. Instead, configure the environment variables only in the .bashrc of the user that manages the Hadoop cluster.
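To confirm the JDK meets the 1.7 minimum, here is a hedged sketch that parses a version string. The string is hard-coded to the version installed above; on a live node you would capture it from java -version:

```shell
# Illustrative version string; on a real node:
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')
ver="1.7.0_65"
# The second dotted field is the Java major version (1.7 -> 7).
minor=$(echo "$ver" | cut -d. -f2)
if [ "$minor" -ge 7 ]; then
  echo "JDK $ver is new enough for Hadoop 2.7.1"
else
  echo "JDK $ver is too old; 1.7+ required"
fi
```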
7) Install zookeeper
#1. Log in as the hadoop user, download zookeeper-3.4.6 and extract it
su hadoop
cd /home/hadoop
tar -zxvf zookeeper-3.4.6.tar.gz
#2. On every node in the cluster, configure /etc/hosts as follows:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.185.31 hadoop31
192.168.185.32 hadoop32
192.168.185.33 hadoop33
192.168.185.34 hadoop34
192.168.185.35 hadoop35
#3. Create the zookeeper data directory on every node in the cluster
ssh hadoop31
cd /home/hadoop
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper
ssh hadoop32
cd /home/hadoop
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper
ssh hadoop33
cd /home/hadoop
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper
ssh hadoop34
cd /home/hadoop
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper
ssh hadoop35
cd /home/hadoop
#zookeeper data directory
mkdir -p /opt/hadoop/zookeeper
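The five identical blocks above can be collapsed into one loop. A dry-run sketch (hostnames are the ones used in this article; remove the echo to execute):

```shell
for h in hadoop31 hadoop32 hadoop33 hadoop34 hadoop35; do
  # Would create the zookeeper data directory on each node over ssh.
  echo "ssh $h mkdir -p /opt/hadoop/zookeeper"
done
```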
#4. Configure zoo.cfg
ssh hadoop31
cd /home/hadoop/zookeeper-3.4.6/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
#with the following content
initLimit=10
syncLimit=5
dataDir=/opt/hadoop/zookeeper
clientPort=2181
#Keep only the 3 most recent snapshots; by default all are kept, which over time uses a lot of disk space
autopurge.snapRetainCount=3
#In hours: purge snapshot data once every hour
autopurge.purgeInterval=1
server.1=hadoop31:2888:3888
server.2=hadoop32:2888:3888
server.3=hadoop33:2888:3888
server.4=hadoop34:2888:3888
server.5=hadoop35:2888:3888
#5. From hadoop31, copy the installation files to the other nodes
scp -r /home/hadoop/zookeeper-3.4.6 [email protected]:/home/hadoop/
scp -r /home/hadoop/zookeeper-3.4.6 [email protected]:/home/hadoop/
scp -r /home/hadoop/zookeeper-3.4.6 [email protected]:/home/hadoop/
scp -r /home/hadoop/zookeeper-3.4.6 [email protected]:/home/hadoop/
#6. On every node, set myid; it must be a number and must match the server.N entries in zoo.cfg
ssh hadoop31
echo "1" > /opt/hadoop/zookeeper/myid
ssh hadoop32
echo "2" > /opt/hadoop/zookeeper/myid
ssh hadoop33
echo "3" > /opt/hadoop/zookeeper/myid
ssh hadoop34
echo "4" > /opt/hadoop/zookeeper/myid
ssh hadoop35
echo "5" > /opt/hadoop/zookeeper/myid
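Each node's myid must equal the N of its own server.N line in zoo.cfg. A sketch that derives the id from a sample, inlined server list; on a real node you would read /home/hadoop/zookeeper-3.4.6/conf/zoo.cfg and use $(hostname):

```shell
# Sample server list, matching the zoo.cfg above.
cat > /tmp/zoo-servers.cfg <<'EOF'
server.1=hadoop31:2888:3888
server.2=hadoop32:2888:3888
server.3=hadoop33:2888:3888
server.4=hadoop34:2888:3888
server.5=hadoop35:2888:3888
EOF
host=hadoop34                    # illustrative; normally: host=$(hostname)
# Extract the N from "server.N=<host>:..." for this host.
id=$(grep "=${host}:" /tmp/zoo-servers.cfg | cut -d. -f2 | cut -d= -f1)
echo "$id"                       # prints: 4
# The real command would then be: echo "$id" > /opt/hadoop/zookeeper/myid
```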
#7. Starting zookeeper on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
#8. Stopping zookeeper on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop
#9. Checking zookeeper status on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status
#10. Browsing the data in zookeeper from a client on a node
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181
2.3 Installing Hadoop HA
1) hadoop-2.7.1.tar.gz
#Download hadoop-2.7.1.tar.gz into /home/hadoop and extract it; I am working on hadoop31
ssh hadoop31
su hadoop
cd /home/hadoop
tar -zxvf hadoop-2.7.1.tar.gz
2) core-site.xml
Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Enable the HDFS trash. Deleted files first go to the trash, where they are kept for at most 1 day (1440 minutes) before being removed -->
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <!-- The namenode access address in Hadoop HA mode. bigdatacluster-ha is a freely chosen name; hdfs-site.xml refers to it later -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdatacluster-ha</value>
  </property>
  <!-- All Hadoop file I/O goes through a common code path, so io.file.buffer.size is widely used to size read/write buffers, e.g. for SequenceFiles. For both disk and network operations, a larger buffer gives higher throughput at the cost of more memory and latency. The value should be a multiple of the system page size, in bytes; the default is 4KB and 64KB (65536 bytes) is common; here we use 128KB -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
  <!-- zookeeper quorum address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>
  </property>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>300000</value>
  </property>
  <!-- Hadoop compression codec. The packages downloaded from the Apache site do not support snappy; you have to build it yourself (I explain how at http://aperise.iteye.com/blog/2254487). If you do not use snappy, omit this property -->
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
3) hdfs-site.xml
Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- The HDFS nameservice is bigdatacluster-ha; it must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>bigdatacluster-ha</value>
  </property>
  <!-- Reserved disk space in bytes, so the datanode cannot fill the disk completely -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>107374182400</value>
  </property>
  <!-- bigdatacluster-ha has two NameNodes: namenode1 and namenode2 -->
  <property>
    <name>dfs.ha.namenodes.bigdatacluster-ha</name>
    <value>namenode1,namenode2</value>
  </property>
  <!-- RPC address of namenode1 -->
  <property>
    <name>dfs.namenode.rpc-address.bigdatacluster-ha.namenode1</name>
    <value>hadoop31:9000</value>
  </property>
  <!-- HTTP address of namenode1 -->
  <property>
    <name>dfs.namenode.http-address.bigdatacluster-ha.namenode1</name>
    <value>hadoop31:50070</value>
  </property>
  <!-- RPC address of namenode2 -->
  <property>
    <name>dfs.namenode.rpc-address.bigdatacluster-ha.namenode2</name>
    <value>hadoop32:9000</value>
  </property>
  <!-- HTTP address of namenode2 -->
  <property>
    <name>dfs.namenode.http-address.bigdatacluster-ha.namenode2</name>
    <value>hadoop32:50070</value>
  </property>
  <!-- Where the NameNode's shared edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop31:8485;hadoop32:8485;hadoop33:8485;hadoop34:8485;hadoop35:8485/bigdatacluster-ha</value>
  </property>
  <!-- How clients locate the active NameNode after a failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.bigdatacluster-ha</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing method, used to remotely manage services on the other NameNode host -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- sshfence requires passwordless ssh login -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- Local directory where each JournalNode stores the edit log -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop/journal</value>
  </property>
  <!-- Enable automatic failover for high availability -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Where the namenode stores the namespace -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/hdfs/name</value>
  </property>
  <!-- Where the datanode stores its data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/hdfs/data</value>
  </property>
  <!-- Number of data replicas -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Allow HDFS directories to be accessed over the web (webhdfs) -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>600</value>
    <description>The number of server threads for the namenode.</description>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>600</value>
    <description>The number of server threads for the datanode.</description>
  </property>
  <property>
    <name>dfs.client.socket-timeout</name>
    <value>600000</value>
  </property>
  <property>
    <!-- Maximum number of transfer threads a datanode may open; the default is 4096, and without raising it you may see "xcievers exceeded" errors -->
    <name>dfs.datanode.max.transfer.threads</name>
    <value>409600</value>
  </property>
</configuration>
4) mapred-site.xml
Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce on yarn -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.maps</name>
<value>12</value>
</property>
<property>
<name>mapreduce.job.reduces</name>
<value>12</value>
</property>
<!-- Hadoop compression codec. The packages downloaded from the Apache site do not support snappy; you have to build it yourself (I explain how at http://aperise.iteye.com/blog/2254487). If you do not use snappy, omit these properties -->
<property>
<name>mapreduce.output.fileoutputformat.compress</name>
<value>true</value>
<description>Should the job outputs be compressed?
</description>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.type</name>
<value>RECORD</value>
<description>If the job outputs are to compressed as SequenceFiles, how should
they be compressed? Should be one of NONE, RECORD or BLOCK.
</description>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
<description>If the job outputs are compressed, how should they be compressed?
</description>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
<description>Should the outputs of the maps be compressed before being
sent across the network. Uses SequenceFile compression.
</description>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
<description>If the map outputs are compressed, how should they be
compressed?
</description>
</property>
</configuration>
5) yarn-site.xml
Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!--log aggregation (yarn.log) start------------------------------------------------------------------------>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!--How long aggregated logs are kept on HDFS, in seconds: 3 days-->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>259200</value>
</property>
<!--log aggregation (yarn.log) end-------------------------------------------------------------------------->
<!--Retry interval for reconnecting after losing contact with the resourcemanager-->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<!--resourcemanager configuration start------------------------------------------------------------------------->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>besttonecluster-yarn</value>
</property>
<!--Enable resourcemanager HA; default is false-->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop31</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop32</value>
</property>
<!--rm1 settings-->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop31:8088</value>
</property>
<!--rm2 settings-->
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop32:8088</value>
</property>
<!--Enable automatic failover-->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
<!--Enable automatic recovery-->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!--resourcemanager configuration end--------------------------------------------------------------------------->
<!--nodemanager configuration start----------------------------------------------------------------------------->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!--nodemanager configuration end------------------------------------------------------------------------------->
</configuration>
6) slaves
Edit the configuration file /home/hadoop/hadoop-2.7.1/etc/hadoop/slaves
hadoop31
hadoop32
hadoop33
hadoop34
hadoop35
7) hadoop-env.sh and yarn-env.sh
Configure JAVA_HOME in /home/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-env.sh
export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65
8) bashrc
Effective for the current hadoop user; add the following to /home/hadoop/.bashrc in the user's home directory
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=${HADOOP_HOME}/bin:${PATH}
9) Distribute the installation files to the other machines
#I am running this on hadoop31
scp -r /home/hadoop/hadoop-2.7.1 [email protected]:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 [email protected]:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 [email protected]:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.1 [email protected]:/home/hadoop/
2.4 First start of Hadoop HA
1) Start zookeeper
ssh hadoop31
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop32
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop33
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop34
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop35
/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start
#Use jps to check that the QuorumPeerMain process is running
#/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh status shows the zookeeper status
#/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh stop stops zookeeper
2) Format the hadoop-ha znode in zookeeper
/home/hadoop/hadoop-2.7.1/bin/hdfs zkfc -formatZK
#You can check that the Hadoop HA znode now exists in zookeeper as follows
# /home/hadoop/zookeeper-3.4.6/bin/zkCli.sh -server hadoop31:2181,hadoop32:2181,hadoop33:2181,hadoop34:2181,hadoop35:2181
#ls /
3) Start the namenode edit-log sync service, journalnode
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode
ssh hadoop32
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode
ssh hadoop33
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode
ssh hadoop34
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode
ssh hadoop35
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start journalnode
4) Format the namenode
#This step must be run on only one of the namenode hosts, hadoop31 or hadoop32
ssh hadoop31
/home/hadoop/hadoop-2.7.1/bin/hdfs namenode -format
5) Start the namenode; sync and start the standby namenode
#Start the namenode
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode
#Sync the standby namenode, then start it
ssh hadoop32
/home/hadoop/hadoop-2.7.1/bin/hdfs namenode -bootstrapStandby
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start namenode
6) Start the DFSZKFailoverController
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start zkfc
ssh hadoop32
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemon.sh start zkfc
7) Start the datanodes
#Note: hadoop-daemons.sh start datanode starts all datanodes, whereas hadoop-daemon.sh start datanode starts a single datanode
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/hadoop-daemons.sh start datanode
8) Start yarn
#This starts the resourcemanager on hadoop31 and the nodemanagers on hadoop31, hadoop32, hadoop33, hadoop34 and hadoop35
ssh hadoop31
/home/hadoop/hadoop-2.7.1/sbin/start-yarn.sh
#Start the standby resourcemanager on hadoop32
ssh hadoop32
/home/hadoop/hadoop-2.7.1/sbin/yarn-daemon.sh start resourcemanager
With this, the ZooKeeper-based highly available Hadoop cluster is installed and running.
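As a final check, run jps on each node. Based on the role assignment in this article (NameNode, DFSZKFailoverController and ResourceManager on hadoop31 and hadoop32; JournalNode, DataNode, NodeManager and QuorumPeerMain on all five nodes), the expected process list is roughly the following; this sketch just prints it as a reference table:

```shell
# Expected jps processes per node, derived from the configuration in this article.
cat > /tmp/expected-jps.txt <<'EOF'
hadoop31: NameNode DFSZKFailoverController ResourceManager JournalNode DataNode NodeManager QuorumPeerMain
hadoop32: NameNode DFSZKFailoverController ResourceManager JournalNode DataNode NodeManager QuorumPeerMain
hadoop33: JournalNode DataNode NodeManager QuorumPeerMain
hadoop34: JournalNode DataNode NodeManager QuorumPeerMain
hadoop35: JournalNode DataNode NodeManager QuorumPeerMain
EOF
cat /tmp/expected-jps.txt
```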