
Hadoop HDFS High Availability: Installation and Testing ZooKeeper Automatic Failover

Installation
Based on CentOS 7, not a minimal install: select some of the Server services plus the Development Tools group. The root user is used throughout, because OS permissions and security at daemon startup behave differently under other users.
Step 1: Download Hadoop from hadoop.apache.org
Pick one of the recommended download mirrors:
https://hadoop.apache.org/releases.html
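If you prefer the command line, the 2.9.2 tarball can also be fetched directly; the archive.apache.org path below is one assumed mirror location:
# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz -P /root/Download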

Step 2: Download the JDK
http://www.oracle.com/technetwork/pt/java/javase/downloads/jdk8-downloads-2133151.html

Step 4: Extract the downloaded files
Extract the JDK:
Command ## tar -zxvf /root/Download/jdk-8u192-linux-x64.tar.gz -C /opt
Extract Hadoop:
Command ## tar -zxvf /root/Download/hadoop-2.9.2.tar.gz -C /opt
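The JDK tarball extracts to a versioned directory (assumed here to be jdk1.8.0_192, matching the 8u192 download), while later steps set JAVA_HOME=/opt/jdk; a symlink bridges the two:
# ln -s /opt/jdk1.8.0_192 /opt/jdk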

Step 5: Install JSVC
Command ## rpm -ivh apache-commons-daemon-jsvc-1.0.13-7.el7.x86_64.rpm

Step 6: Modify hostnames
Command ## vi /etc/hosts
Add aliases for all servers involved:
192.168.209.131 jacksun01.com
192.168.209.132 jacksun02.com
192.168.209.133 jacksun03.com
Set this host's own name:
Command ## vi /etc/hostname
jacksun01.com
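On CentOS 7 the hostname can also be set in one step (persisted by systemd, no reboot needed):
# hostnamectl set-hostname jacksun01.com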

Step 7: SSH mutual trust (passwordless login)
Note that the root user is configured here, so the home directory below is /root.

If you configure some other user xxxx instead, the home directory is /home/xxxx/.

# Run the following on the master node:
# ssh-keygen -t rsa -P '' # press Enter at every prompt until the key pair is generated
Command # ssh-keygen -t rsa; cd /root/.ssh; ssh-copy-id jacksun01.com; ssh-copy-id jacksun02.com; ssh-copy-id jacksun03.com
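A quick check that the trust works; each command should print the remote date without prompting for a password:
# ssh jacksun01.com date; ssh jacksun02.com date; ssh jacksun03.com date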


Step 8: Add environment variables
Command # vi /root/.bash_profile

PATH=/usr/local/webserver/mysql/bin:/usr/python/bin:/opt/hadoop-2.9.2/etc/hadoop:/opt/jdk/bin:/opt/hadoop-2.9.2/bin:/opt/hadoop-2.9.2/sbin:$PATH:$HOME/bin:/opt/spark/bin:/opt/spark/sbin:/opt/hive/bin:/opt/flume/bin:/opt/kafka/bin
export PATH
JAVA_HOME=/opt/jdk
export JAVA_HOME
export HADOOP_HOME=/opt/hadoop-2.9.2
export LD_LIBRARY_PATH=/usr/local/lib:/usr/python/lib:/usr/local/webserver/mysql/lib

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
export YARN_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export SQOOP_HOME=/opt/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export FLUME_HOME=/opt/flume
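Reload the profile and sanity-check the two most important tools (assuming the /opt/jdk symlink from Step 4):
# source /root/.bash_profile
# java -version
# hadoop version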

Step 9: Modify vi /opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh
Add:

export JAVA_HOME=/opt/jdk
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export JSVC_HOME=/usr/bin

Step 10: Modify vi /opt/hadoop-2.9.2/etc/hadoop/core-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Default NameNode (filesystem) URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Storage path for the JournalNode edits files -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop-2.9.2/jndata</value>
</property>
<!-- Storage path for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-2.9.2/tmp</value>
</property>
</configuration>
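The jndata and tmp directories above are assumed not to exist yet; creating them up front on every node avoids startup failures:
# mkdir -p /opt/hadoop-2.9.2/jndata /opt/hadoop-2.9.2/tmp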

Step 11: Modify vi /opt/hadoop-2.9.2/etc/hadoop/hdfs-site.xml

<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Nameservice (namespace) ID -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- NameNode service IDs within the nameservice -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- NameNode RPC addresses -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>jacksun01.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>jacksun02.com:8020</value>
</property>
<!-- NameNode HTTP addresses -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>jacksun01.com:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>jacksun02.com:50070</value>
</property>
<!-- Shared edits (JournalNode) quorum -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://jacksun01.com:8485;jacksun02.com:8485;jacksun03.com:8485/mycluster</value>
</property>
<!-- Java class HDFS clients use to contact the Active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- Path to the SSH private key used for fencing -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>

<!-- NameNode storage path -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop-2.9.2/name</value>
</property>
<!-- DataNode storage path -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop-2.9.2/data</value>
</property>
</configuration>
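Likewise, pre-create the NameNode and DataNode directories on every node:
# mkdir -p /opt/hadoop-2.9.2/name /opt/hadoop-2.9.2/data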

Step 12: Modify vi /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml
If the file does not exist, copy it from the template:
# cp /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml

<configuration>
<!-- Tell the framework to run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-2.9.2</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-2.9.2</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>jacksun01.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>jacksun01.com:19888</value>
</property>
</configuration>
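The file above sets HADOOP_MAPRED_HOME for the ApplicationMaster and map tasks; reduce tasks usually get the same treatment. A hedged addition inside the same <configuration> block:
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/hadoop-2.9.2</value>
</property>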

Step 13: Modify vi /opt/hadoop-2.9.2/etc/hadoop/yarn-site.xml
<configuration>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation directory -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/opt/hadoop-2.9.2/logs</value>
</property>
<property>
<!-- Node that runs the ResourceManager -->
<name>yarn.resourcemanager.hostname</name>
<value>jacksun01.com</value>
</property>
<property>
<!-- Node that serves yarn.log.server.url -->
<name>yarn.log.server.url</name>
<value>http://jacksun01.com:19888/jobhistory/logs</value>
</property>
</configuration>


Step 14: Modify vi /opt/hadoop-2.9.2/etc/hadoop/slaves # list the DataNode hosts
jacksun01.com
jacksun02.com
jacksun03.com
## Built by cloning a single VM into a cluster, this is effectively a pseudo-distributed setup; with multiple real machines configured the same way and mutual SSH trust between them, it becomes a truly distributed one.

Step 15: Modify vi /opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

Step 16: Clone the server
Shut down Linux: halt or init 0 (reboot or init 6)
A) Copy the VM files to create servers jacksun02.com and jacksun03.com
B) On each clone, modify ## vi /etc/hostname
C) Re-establish SSH mutual trust (passwordless login)
Step 17: Start the JournalNode daemons on the set of machines
# cd /opt/hadoop-2.9.2; ./sbin/hadoop-daemon.sh start journalnode
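Run this on all three nodes; jps should then show a JournalNode process on each:
# jps | grep JournalNode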
Step 18: Format Hadoop
# cd /opt/hadoop-2.9.2/etc/hadoop/
# hdfs namenode -format
Format only once: repeated formatting can leave the DataNodes unable to register (cluster ID mismatch). If you really need to reformat, delete the data directories first.
A)If you are setting up a fresh HDFS cluster, you should first run the format command (hdfs namenode -format) on one of NameNodes.

B)If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command “hdfs namenode -bootstrapStandby” on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes.

C)If you are converting a non-HA NameNode to be HA, you should run the command “hdfs namenode -initializeSharedEdits”, which will initialize the JournalNodes with the edits data from the local NameNode edits directories.
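For this two-NameNode setup, a minimal sketch of the order of operations (nn1 on jacksun01.com, nn2 on jacksun02.com):
# On jacksun01.com (nn1):
# hdfs namenode -format
# ./sbin/hadoop-daemon.sh start namenode
# On jacksun02.com (nn2), with nn1 running:
# hdfs namenode -bootstrapStandby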

Step 19: Start HDFS and YARN on their respective nodes:
sbin/start-dfs.sh
sbin/start-yarn.sh
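After both scripts finish, jps on jacksun01.com should show roughly the daemons below (the exact set depends on which roles a node hosts):
# jps
NameNode
DataNode
JournalNode
ResourceManager
NodeManager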

Step 20: Check that the installation succeeded

hdfs haadmin -getAllServiceState;
hdfs haadmin -transitionToActive nn1;

Usage: haadmin
[-transitionToActive <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-getAllServiceState]
[-checkHealth <serviceId>]
[-help <command>]

Step 22: Upload a file to test
# cd ~
# vi helloworld.txt
# hdfs dfs -put helloworld.txt helloworld.txt
#cd /opt/hadoop-2.9.2;bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /user/jacksun/input/core-site.xml output2
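If the job succeeds, the word counts land in the output2 directory (relative to the user's HDFS home); assuming the default output file name:
# hdfs dfs -cat output2/part-r-00000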

========================ZooKeeper Automatic Failover=======================================================================
Automatic failover relies on ZooKeeper for failure detection and active NameNode election, and on the ZKFailoverController (ZKFC) for health monitoring, ZooKeeper session management, and ZooKeeper-based election.

Step 1: Time synchronization
A) Install the NTP packages
Check whether the ntp packages are installed; if not, install them with rpm or yum, which is quick and simple.
[root@jacksun01 ~]# rpm -qa | grep ntp
ntpdate-4.2.6p5-1.el6.x86_64
fontpackages-filesystem-1.41-1.1.el6.noarch
ntp-4.2.6p5-1.el6.x86_64
B) Configure vi /etc/ntp.conf
# Add or modify the local-network restriction
# Hosts on local network are less restricted.
restrict 192.168.209.131 mask 255.255.255.0 nomodify notrap

# Comment out the upstream servers
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

# Add (or uncomment) the local clock as the time source
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10

C) Add to the vi /etc/sysconfig/ntpd file:
SYNC_HWCLOCK=yes

D) On the other nodes, edit crontab -e and add:
0-59/10 * * * * /usr/sbin/ntpdate jacksun01.com
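The crontab entries assume ntpd is actually running on jacksun01.com; on CentOS 7 start and enable it with:
# systemctl start ntpd
# systemctl enable ntpd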

Step 2: Configuring automatic failover
hdfs-site.xml (ha.zookeeper.quorum is conventionally placed in core-site.xml, but both files are loaded, so it also works here):

<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- ZooKeeper quorum servers -->
<property>
<name>ha.zookeeper.quorum</name>
<value>jacksun01.com:2181,jacksun02.com:2181,jacksun03.com:2181</value>
</property>

Step 3: Sync the configuration to the other nodes
cd /opt/hadoop-2.9.2/etc/hadoop/; scp core-site.xml hdfs-site.xml yarn-site.xml root@jacksun02.com:/opt/hadoop-2.9.2/etc/hadoop/ ; scp core-site.xml hdfs-site.xml yarn-site.xml root@jacksun03.com:/opt/hadoop-2.9.2/etc/hadoop/

Step 4: Stop all Hadoop daemons
cd /opt/hadoop-2.9.2;./sbin/stop-all.sh

Step 5: Start ZooKeeper
cd /opt/zookeeper;./bin/zkServer.sh start
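ZooKeeper must be running on all three nodes before the next step; verify the ensemble (one node should report leader, the others follower):
# /opt/zookeeper/bin/zkServer.sh status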


Step 6: Initializing HA state in ZooKeeper

cd /opt/hadoop-2.9.2;./bin/hdfs zkfc -formatZK
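Then restart the cluster; with automatic failover enabled, start-dfs.sh also starts a DFSZKFailoverController (zkfc) on each NameNode host:
# cd /opt/hadoop-2.9.2; ./sbin/start-dfs.sh; ./sbin/start-yarn.sh
# jps | grep DFSZKFailoverController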

Step 7: Test
Kill the active NameNode and confirm the standby takes over.
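A minimal failover test, assuming nn1 is currently active on jacksun01.com:
# On jacksun01.com, find and kill the active NameNode:
# jps | grep NameNode
# kill -9 <NameNode pid>
# From any node, confirm that nn2 has become active:
# hdfs haadmin -getAllServiceState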