bigdata-02-hadoop2.8.4-resourceHA installation

1, Preparing the machines

1), Disable SELinux

vim /etc/selinux/config

SELINUX=disabled
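
The change in /etc/selinux/config only takes effect after a reboot. As a minimal sketch, the same edit can be scripted with GNU sed; the helper name `disable_selinux_in` is hypothetical and not part of the original post, and `setenforce 0` (shown in a comment) switches the running system to permissive mode until that reboot.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): rewrite the SELINUX= line
# of the given config file so that SELinux is disabled at the next boot.
# Uses GNU sed's in-place option, which is what CentOS ships.
disable_selinux_in() {
    conf_file="$1"
    # Only the SELINUX= line is touched; SELINUXTYPE= does not match the pattern.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf_file"
}

# Typical use as root:
#   disable_selinux_in /etc/selinux/config
#   setenforce 0    # permissive mode for the current boot
```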

2), Time synchronization

yum -y install chrony  

Edit the time server's configuration, then restart chronyd

vim  /etc/chrony.conf

[root@dock hadoop]# cat /etc/chrony.conf | grep -v ^$ | grep -v ^#
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 192.168.199.0/16
local stratum 10
logdir /var/log/chrony

Edit the configuration on each server that syncs from it, then restart chronyd

vim /etc/chrony.conf

[root@node1 ~]# cat /etc/chrony.conf | grep -v ^$ | grep -v ^#
server 192.168.199.131 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony

Run the time synchronization

systemctl restart chronyd

[root@node2 ~]# chronyc sources -v
210 Number of sources = 1

  .-- Source mode  ^ = server, = = peer, # = local clock.
 / .- Source state * = current synced, + = combined , - = not combined,
| /   ? = unreachable, x = time may be in error, ~ = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |          |
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* dock                          3   6   177     4  -1590ns[  +62us] +/-   13ms

Check the synchronization status:

[root@node3 ~]# timedatectl 
      Local time: Wed 2018-03-21 08:16:02 EDT
  Universal time: Wed 2018-03-21 12:16:02 UTC
        RTC time: Wed 2018-03-21 12:16:02
       Time zone: America/New_York (EDT, -0400)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2018-03-11 01:59:59 EST
                  Sun 2018-03-11 03:00:00 EDT
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2018-11-04 01:59:59 EDT
                  Sun 2018-11-04 01:00:00 EST
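
To avoid reading this output by eye on every node, the check can be reduced to a single line of it. The sketch below assumes the `NTP synchronized:` wording shown in the output above; `ntp_synced` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the timedatectl output on stdin
# contains the line "NTP synchronized: yes".
ntp_synced() {
    grep -q '^ *NTP synchronized: yes'
}

# Typical use, with the node names from this guide:
#   for host in node1 node2 node3; do
#       ssh "$host" timedatectl | ntp_synced || echo "$host is not synchronized"
#   done
```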

3), Set the hostname. Most cluster software needs this step. Run the matching command on each machine:

hostname node1

hostname node2

hostname node3

4), JDK version

java -version

This guide uses 1.8.0_161.

5), Set up passwordless SSH login

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Send the public key to the NameNode and register it:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
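
Copying the key by hand gets tedious with three nodes. One possible sketch uses `ssh-copy-id`, which appends the key to the remote authorized_keys file; the function name `distribute_key` and the `RUN` dry-run variable are illustrative assumptions, while the root user and node names come from this guide.

```shell
#!/bin/sh
# Copy the local public key to every listed node so that ssh stops asking
# for a password. Set RUN=echo to print the commands instead of running them.
distribute_key() {
    pubkey="$1"
    shift
    for host in "$@"; do
        ${RUN:-} ssh-copy-id -i "$pubkey" "root@$host"
    done
}

# Typical use:
#   distribute_key ~/.ssh/id_dsa.pub node1 node2 node3
```

Each call still asks for the remote password once; later logins with that key should not.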

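2, ZooKeeper installation

See other blog posts for this step.

3, Hadoop installation

ZKFC (ZKFailoverController) handles HA failover: it manages the active/standby state, monitors the NameNode process, and records that state in ZooKeeper.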

JournalNode keeps the shared edits log that the standby NameNode replays (the fsimage itself stays on the NameNodes).

1), Set the environment variables

export HADOOP_HOME=/usr/local/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

2), Edit hadoop-env.sh

cd ${HADOOP_HOME}/etc/hadoop
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_161

3), Configure core-site.xml

<configuration>
    <!-- Name of the HDFS nameservice -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hdfscluster</value>
    </property>
    <!-- Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.8.4/tmp</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node1:2181,node2:2181,node3:2181</value>
    </property>
</configuration>
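
A typo in a property name is easy to miss, so it helps to read values back before starting any daemon. On a machine with Hadoop on the PATH, `hdfs getconf -confKey fs.defaultFS` does this. The sketch below is a rough stand-in for machines without Hadoop; `get_prop` is a hypothetical helper that assumes each `<value>` follows its `<name>` with at most one property per line, and it is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml file.
#   $1 = path to the XML file, $2 = property name
get_prop() {
    awk -v key="$2" '
        # Literal match on the full name tag, so dots in the key are not regex wildcards.
        index($0, "<name>" key "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Typical use:
#   get_prop "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
```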

4), Edit hdfs-site.xml

<configuration>
    <!-- HDFS nameservice is hdfscluster; must match the value in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>hdfscluster</value>
    </property>
    <!-- hdfscluster has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.hdfscluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.hdfscluster.nn1</name>
        <value>192.168.199.182:8020</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.hdfscluster.nn1</name>
        <value>192.168.199.182:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.hdfscluster.nn2</name>
        <value>192.168.199.247:8020</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.hdfscluster.nn2</name>
        <value>192.168.199.247:50070</value>
    </property>
    <!-- Where the NameNode metadata (edits) is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node1:8485;node2:8485;node3:8485/hdfscluster</value>
    </property>
    <!-- Local disk directory where each JournalNode stores its data -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop-2.8.4/journaldata</value>
    </property>
    <!-- Enable automatic failover when a NameNode fails -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Client-side failover proxy provider -->
    <property>
        <name>dfs.client.failover.proxy.provider.hdfscluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods; separate multiple methods with newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence requires passwordless SSH login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_dsa</value>
    </property>
    <!-- Connection timeout for sshfence, in milliseconds -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

Note: if the cluster starts successfully but creating a directory prints "ipc.Client: Retrying connect to server", change this to

5), Add the slaves file

vim slaves

node1
node2
node3
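
Every node needs the same configuration directory. A minimal sketch of pushing it out with scp is shown here, reading the target hosts from the slaves file; `push_conf`, the `RUN` dry-run variable and the optional third argument (the host to skip, defaulting to the local hostname) are illustrative assumptions.

```shell
#!/bin/sh
# Copy a local Hadoop configuration directory to every host listed in a
# slaves file, skipping the local machine.
#   $1 = config directory, $2 = slaves file, $3 = host to skip (optional)
# Set RUN=echo to print the scp commands instead of running them.
push_conf() {
    conf_dir="$1"
    slaves_file="$2"
    self="${3:-$(hostname)}"
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case "$host" in ''|'#'*) continue ;; esac
        [ "$host" = "$self" ] && continue
        # stdin comes from /dev/null so scp cannot swallow the rest of the slaves file.
        ${RUN:-} scp -r "$conf_dir" "root@$host:$(dirname "$conf_dir")/" < /dev/null
    done < "$slaves_file"
}

# Typical use on node1:
#   push_conf "$HADOOP_HOME/etc/hadoop" "$HADOOP_HOME/etc/hadoop/slaves"
```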

4, Configure YARN

1), Rename mapred-site.xml.template to mapred-site.xml

<configuration>  
    <!-- Run the MapReduce framework on YARN -->
    <property>  
        <name>mapreduce.framework.name</name>  
        <value>yarn</value>  
    </property>  
</configuration> 

2), Configure yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- Enable ResourceManager HA -->
    <property>
       <name>yarn.resourcemanager.ha.enabled</name>  
       <value>true</value>  
    </property>  
    <!-- Cluster id of the RM -->
    <property>  
       <name>yarn.resourcemanager.cluster-id</name>
       <value>yarncluster</value>
    </property>  
    <!-- Logical ids of the RMs -->
    <property>  
       <name>yarn.resourcemanager.ha.rm-ids</name>  
       <value>rm1,rm2</value>
    </property>
    <!-- Host of each RM -->
    <property>  
       <name>yarn.resourcemanager.hostname.rm1</name>
       <value>node1</value>
    </property>
    <property>
       <name>yarn.resourcemanager.hostname.rm2</name>
       <value>node2</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
       <name>yarn.resourcemanager.zk-address</name>
       <value>node1:2181,node2:2181,node3:2181</value>
    </property>
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>
</configuration>

5, Format the NameNode

1), Start the JournalNode on all 3 machines

hadoop-daemon.sh start journalnode

2), Format the NameNode, then start it

hdfs namenode -format
hadoop-daemon.sh start namenode

3), On the other NameNode, copy the metadata over (or copy it manually)

hdfs namenode -bootstrapStandby

4), Start the second NameNode

hadoop-daemon.sh start namenode

5), On the active NameNode, initialize the HA state in ZooKeeper

hdfs zkfc -formatZK

6), Start HDFS

start-dfs.sh

The Hadoop web UI is now reachable at node1:50070
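
Besides the web UI, `hdfs haadmin -getServiceState` reports whether a NameNode is active or standby. The loop below is a small sketch around it; `show_nn_states` and the `RUN` dry-run variable are illustrative names, and nn1/nn2 are the ids from hdfs-site.xml.

```shell
#!/bin/sh
# Print the HA state of each NameNode id passed as an argument.
# Set RUN=echo to print the commands instead of running them.
show_nn_states() {
    for nn in "$@"; do
        printf '%s: ' "$nn"
        ${RUN:-} hdfs haadmin -getServiceState "$nn"
    done
}

# Typical use:
#   show_nn_states nn1 nn2
```

With a healthy cluster, one id should report active and the other standby.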

6, Start YARN

1), Run on the NameNode

start-yarn.sh

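2), Start the second ResourceManager on its own host (start-yarn.sh only starts the local one)

YARN HA does not need to record state, so this step is very simple.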

yarn-daemon.sh start resourcemanager

The YARN web UI is now reachable at node1:8088

On later startups, start ZooKeeper on all 3 machines first; after that, start-dfs.sh is all that is needed.
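
The startup order can be captured in one small script. This is only a sketch under stated assumptions: `start_cluster` and the `RUN` dry-run variable are illustrative, zkServer.sh and the Hadoop sbin scripts are assumed to be on the PATH of non-interactive ssh sessions, and the YARN steps are included for completeness. node2 is the rm2 host from yarn-site.xml.

```shell
#!/bin/sh
# Start the cluster in dependency order: ZooKeeper, then HDFS, then YARN.
# Meant to be run on node1. Set RUN=echo to print the commands instead of
# running them.
start_cluster() {
    # ZooKeeper must be up before the ZKFC and ResourceManager daemons start.
    for host in node1 node2 node3; do
        ${RUN:-} ssh "$host" zkServer.sh start
    done
    ${RUN:-} start-dfs.sh
    ${RUN:-} start-yarn.sh
    # start-yarn.sh only starts the local ResourceManager; start rm2 by hand.
    ${RUN:-} ssh node2 yarn-daemon.sh start resourcemanager
}

# Dry run:
#   RUN=echo; start_cluster
```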

7, Testing

1), Create the input and output directories

hadoop fs -mkdir -p /data/wordcount
hadoop fs -mkdir -p /output

2), Upload a file

hadoop fs -put README.txt /data/wordcount

3), Run the example

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount /data/wordcount /output/wordcount

4), View the output part file

hadoop fs -text /output/wordcount/part-r-00000

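Things to note when programming against the HA cluster:

1, When code accesses HDFS with

FileSystem.get(new URI("hdfs://hdfscluster/"), conf, "root");

the configuration files

hdfs-site.xml, core-site.xml, yarn-site.xml, mapred-site.xml

must be placed under resources.

new Configuration() then automatically loads the configuration files found in resources.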
