
Setting Up a Multi-Node Hadoop 2.7.3 Environment (YARN + HA)

Part 1: Environment

  1. Parallels Desktop
  2. CentOS-6.5-x86_64-bin-DVD1.iso
  3. jdk-7u79-linux-x64.tar.gz
  4. hadoop-2.7.3.tar.gz
  5. A four-node cluster with hostnames hadoopA, hadoopB, hadoopC, and hadoopD. hadoopA runs the active NameNode; hadoopB runs the standby NameNode, a DataNode, and a JournalNode; hadoopC runs a DataNode and a JournalNode; hadoopD runs a DataNode and a JournalNode.

Part 2: Operating system configuration

  1. Grant the hadoop user sudo privileges (a quick check follows the snippet):
[root@hadoopa hadoop]# visudo

## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL
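To confirm the grant took effect, log in as the hadoop user and list its sudo rights:

[hadoop@hadoopa ~]$ sudo -l    # should report that hadoop may run (ALL) ALL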
  2. Configure hostname resolution. Add all four nodes to /etc/hosts on every machine (see the hostname sketch after this snippet):
[hadoop@hadoopa hadoop-2.7.3]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.201 hadoopA
192.168.1.202 hadoopB
192.168.1.203 hadoopC
192.168.1.204 hadoopD
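The /etc/hosts entries only map names to addresses; each machine's own hostname also has to be set. A minimal sketch for CentOS 6 (shown for hadoopA; repeat with the matching name on each node):

[root@hadoopa ~]# hostname hadoopA              # takes effect immediately
[root@hadoopa ~]# vi /etc/sysconfig/network     # persists across reboots
HOSTNAME=hadoopA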

Part 3: Install and configure the JDK

Install the JDK on each of the four nodes: hadoopA, hadoopB, hadoopC, and hadoopD.

[hadoop@hadoopb ~]$ tar -zxvf jdk-7u79-linux-x64.tar.gz

Rename the extracted JDK directory:

[hadoop@hadoopb ~]$ mv jdk1.7.0_79/ jdk1.7
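It is convenient (though not strictly required, since hadoop-env.sh sets JAVA_HOME below) to also export the JDK into the hadoop user's environment; a sketch for ~/.bash_profile:

export JAVA_HOME=/home/hadoop/jdk1.7
export PATH=$JAVA_HOME/bin:$PATH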

Part 4: Install and configure Hadoop

  1. Extract Hadoop on each of the four nodes (hadoopA, hadoopB, hadoopC, hadoopD):
[hadoop@hadoopb ~]$ tar -zxvf hadoop-2.7.3.tar.gz
  2. Configure hadoop-env.sh on hadoopA:
# The java implementation to use.
export JAVA_HOME=/home/hadoop/jdk1.7
  3. Configure core-site.xml on hadoopA (a note on client failover follows the file):
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoopA:8020</value>
        </property>
</configuration>
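Note that with fs.defaultFS pointed at hadoopA directly, clients will not follow a failover to hadoopB. For transparent client failover, HA deployments typically point fs.defaultFS at the nameservice defined in hdfs-site.xml below and configure a failover proxy provider. A sketch (not used in this walkthrough, which performs manual failover):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-test</value>
</property>
<!-- and in hdfs-site.xml: -->
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-test</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>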
  4. Configure hdfs-site.xml on hadoopA (a note on the local paths follows the file):
<configuration>

<property>
  <name>dfs.nameservices</name>
  <value>hadoop-test</value>
  <description>
    Comma-separated list of nameservices.
  </description>
</property>

<property>
  <name>dfs.ha.namenodes.hadoop-test</name>
  <value>nn1,nn2</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn1</name>
  <value>hadoopA:8020</value>
  <description>
    RPC address for namenode1 of hadoop-test
  </description>
</property>

<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn2</name>
  <value>hadoopB:8020</value>
  <description>
    RPC address for namenode2 of hadoop-test
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-test.nn1</name>
  <value>hadoopA:50070</value>
  <description>
    The address and the base port where the dfs namenode1 web ui will listen on.
  </description>
</property>

<property>
  <name>dfs.namenode.http-address.hadoop-test.nn2</name>
  <value>hadoopB:50070</value>
  <description>
    The address and the base port where the dfs namenode2 web ui will listen on.
  </description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoopB:8485;hadoopC:8485;hadoopD:8485/hadoop-test</value>
  <description>A directory on shared storage between the multiple namenodes
  in an HA cluster. This directory will be written by the active and read
  by the standby in order to keep the namespaces synchronized. This directory
  does not need to be listed in dfs.namenode.edits.dir above. It should be
  left empty in a non-HA cluster.
  </description>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/hadoop/hdfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
  <description>
    Whether automatic failover is enabled. See the HDFS High
    Availability documentation for details on automatic HA
    configuration.
  </description>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hdfs/journal/</value>
</property>

</configuration>
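The local paths referenced above must be writable by the hadoop user. namenode -format and the daemons create most of them on first use, but it does no harm to create them up front on the relevant nodes; a sketch:

[hadoop@hadoopa ~]$ mkdir -p /home/hadoop/hdfs/name                            # hadoopA and hadoopB
[hadoop@hadoopb ~]$ mkdir -p /home/hadoop/hdfs/data /home/hadoop/hdfs/journal  # hadoopB, hadoopC, hadoopD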
  5. Configure mapred-site.xml on hadoopA (a note on the history server follows the file):
<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoopB:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoopB:19888</value>
  </property>
</configuration>
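The JobHistory server configured here is not started by start-yarn.sh; once the cluster is up it has to be launched separately on hadoopB, e.g.:

[hadoop@hadoopb hadoop-2.7.3]$ sbin/mr-jobhistory-daemon.sh start historyserver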
  6. Configure yarn-site.xml on hadoopA (a note on the local dirs follows the file):
<configuration>

  <!-- Resource Manager Configs -->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoopA</value>
  </property>

  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>

  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>

  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>

  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>

  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>

  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/home/hadoop/hadoop-2.7.3/etc/hadoop/fairscheduler.xml</value>
  </property>

  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
   </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/yarn/local</value>
  </property>

  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>

  <property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8720</value>
  </property>

  <property>
    <description>Number of CPU cores that can be allocated
    for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>

  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

</configuration>
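As with the HDFS paths, yarn.nodemanager.local-dirs should exist and be writable by the hadoop user on every NodeManager host; a sketch:

[hadoop@hadoopb ~]$ mkdir -p /home/hadoop/yarn/local    # repeat on hadoopC and hadoopD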
  7. Configure fairscheduler.xml on hadoopA (a queue-submission example follows the file):
<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores </minResources>
    <maxResources>153600 mb, 100 vcores </maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>

   <queue name="tool">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

   <queue name="sentiment">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>

</allocations>
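Once the cluster is running, a job can be steered into one of these queues by setting mapreduce.job.queuename; a sketch using the examples jar bundled with the 2.7.3 tarball:

[hadoop@hadoopa hadoop-2.7.3]$ bin/hadoop jar \
    share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
    pi -Dmapreduce.job.queuename=tool 2 10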
  8. Configure the slaves file on hadoopA:

[root@hadoopa hadoop]# cat slaves
hadoopB
hadoopC
hadoopD
  9. Copy the Hadoop configuration from hadoopA to the other nodes:

[hadoop@hadoopa hadoop-2.7.3]$ scp etc/hadoop/* hadoopB:/home/hadoop/hadoop-2.7.3/etc/hadoop/
[hadoop@hadoopa hadoop-2.7.3]$ scp etc/hadoop/* hadoopC:/home/hadoop/hadoop-2.7.3/etc/hadoop/
[hadoop@hadoopa hadoop-2.7.3]$ scp etc/hadoop/* hadoopD:/home/hadoop/hadoop-2.7.3/etc/hadoop/

Part 5: Start Hadoop

  1. On each JournalNode host, start the journalnode service (a port check follows the commands):
[hadoop@hadoopb hadoop-2.7.3]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@hadoopc hadoop-2.7.3]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@hadoopd hadoop-2.7.3]$ sbin/hadoop-daemon.sh start journalnode
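Before formatting, it is worth confirming each JournalNode is listening on its RPC port (8485, per dfs.namenode.shared.edits.dir above); one way to check:

[hadoop@hadoopb hadoop-2.7.3]$ netstat -tln 2>/dev/null | grep 8485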
  2. On [nn1], format the NameNode and start it (the JournalNodes must already be running, since the format writes to the shared edits directory):
[root@hadoopa hadoop-2.7.3]# bin/hdfs namenode -format
[root@hadoopa hadoop-2.7.3]# sbin/hadoop-daemon.sh start namenode

  3. On [nn2], sync nn1's metadata:
[hadoop@hadoopb hadoop-2.7.3]$ bin/hdfs namenode -bootstrapStandby
  4. On [nn2], start the NameNode:
[hadoop@hadoopb hadoop-2.7.3]$ sbin/hadoop-daemon.sh start namenode
(After these four steps, nn1 and nn2 are both in standby state.)
  5. On [nn1], transition the NameNode to active (a state check follows):

[root@hadoopa hadoop-2.7.3]# bin/hdfs haadmin -transitionToActive nn1
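The state of either NameNode can be confirmed at any point with haadmin:

[root@hadoopa hadoop-2.7.3]# bin/hdfs haadmin -getServiceState nn1    # expect: active
[root@hadoopa hadoop-2.7.3]# bin/hdfs haadmin -getServiceState nn2    # expect: standby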
  6. On [nn1], start all DataNodes:

[root@hadoopa hadoop-2.7.3]# sbin/hadoop-daemons.sh start datanode
  7. Start YARN; on [nn1], run the following (a NodeManager check follows):
[root@hadoopa hadoop-2.7.3]# sbin/start-yarn.sh
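Once the ResourceManager is up, the registered NodeManagers can be listed to confirm all three came up:

[root@hadoopa hadoop-2.7.3]# bin/yarn node -list    # should show hadoopB, hadoopC, hadoopD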

Part 6: Verify Hadoop

  1. On hadoopA, run:

[root@hadoopa jdk1.7]# /home/hadoop/jdk1.7/bin/jps
10747 -- process information unavailable
15583 Jps
16576 -- process information unavailable
(The "process information unavailable" entries are the NameNode and ResourceManager: they run as the hadoop user, while jps here was run as root.)
  2. On hadoopB, run:
[hadoop@hadoopb hadoop-2.7.3]$ /home/hadoop/jdk1.7/bin/jps
15709 NodeManager
2405 JournalNode
11551 NameNode
12862 DataNode
15398 Jps
  3. On hadoopC, run:
[hadoop@hadoopc ~]$ /home/hadoop/jdk1.7/bin/jps
2388 JournalNode
13091 Jps
13553 DataNode
15214 NodeManager
  4. On hadoopD, run:
[hadoop@hadoopd hadoop-2.7.3]$ /home/hadoop/jdk1.7/bin/jps
13506 DataNode
12675 Jps
15334 NodeManager
2570 JournalNode

Open a browser and visit the following addresses:

http://192.168.1.201:50070/dfshealth.html#tab-overview
http://192.168.1.202:50070/dfshealth.html#tab-overview
http://192.168.1.201:8088/cluster/scheduler
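As a final check, a quick end-to-end smoke test is to write a file into HDFS and run one of the bundled example jobs (the paths and file names here are arbitrary choices for illustration):

[hadoop@hadoopa hadoop-2.7.3]$ bin/hdfs dfs -mkdir -p /user/hadoop
[hadoop@hadoopa hadoop-2.7.3]$ bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/
[hadoop@hadoopa hadoop-2.7.3]$ bin/hadoop jar \
    share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
    wordcount /user/hadoop/core-site.xml /user/hadoop/wc-out
[hadoop@hadoopa hadoop-2.7.3]$ bin/hdfs dfs -cat /user/hadoop/wc-out/part-r-00000 | head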

Part 7: Stop Hadoop

  1. To stop the Hadoop cluster, run the following on [nn1]:
[root@hadoopa hadoop-2.7.3]# sbin/stop-dfs.sh
[root@hadoopa hadoop-2.7.3]# sbin/stop-yarn.sh

Part 8: Notes

Note:
Step 2 formats the NameNode on [nn1] and starts it:
bin/hdfs namenode -format
Step 3 syncs nn1's metadata to [nn2]:
bin/hdfs namenode -bootstrapStandby

These two steps are only performed when the cluster is created for the first time;
they are not repeated when the nodes are restarted later.