
Hadoop 2.7.4 Fully Distributed Setup (4 Nodes)

1. Cluster Planning

Description: the Hadoop HA mechanism depends on ZooKeeper, so three of the hosts also serve as the ZooKeeper ensemble. Four hosts are prepared in total: hadoop01, hadoop02, hadoop03, and hadoop04. hadoop01 and hadoop02 handle the NameNode active/standby failover, while hadoop03 and hadoop04 handle the ResourceManager failover.

Roles on the four hosts:

                   hadoop01   hadoop02   hadoop03   hadoop04
  namenode             ✓          ✓
  datanode             ✓          ✓          ✓          ✓
  resourcemanager                            ✓          ✓
  nodemanager          ✓          ✓          ✓          ✓
  zookeeper            ✓          ✓          ✓
  journalnode          ✓          ✓          ✓
  zkfc                 ✓          ✓

2. Preparing the Cluster Servers

  1. Set the hostname and host IP
  2. Add hostname-to-IP mappings
  3. Add the regular user to sudoers
  4. Set the service run level
  5. Synchronize the Linux clock
  6. Disable the firewall
  7. Configure passwordless SSH login
  8. Install the JDK
  Steps 3 through 6 are sketched right after this list; steps 1, 2, and 7 are detailed below.
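
  Steps 3–6 only take a few one-liners on RedHat 6.5. A minimal sketch, assuming the regular user is hyotei and an external NTP server such as pool.ntp.org is reachable (both are assumptions, not from the original):

    # run as root on every node
    echo 'hyotei ALL=(ALL) ALL' >> /etc/sudoers            # 3. grant hyotei sudo rights
    sed -i 's/^id:.*$/id:3:initdefault:/' /etc/inittab     # 4. boot to runlevel 3 (text mode)
    ntpdate pool.ntp.org                                   # 5. one-shot clock sync
    service iptables stop                                  # 6. stop the firewall now...
    chkconfig iptables off                                 #    ...and keep it off across reboots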
  • Set the hostname and IP
    Change the Linux IP inside the VM. First check the VMnet8 network settings in VMware; in my case the gateway is 192.168.146.2.

    Then adjust the host-side VMnet8 network settings to match:

    Next edit the Linux network settings (RedHat 6.5):

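    The screenshot for this step is missing; a minimal sketch of what /etc/sysconfig/network-scripts/ifcfg-eth0 could look like on hadoop01, assuming the interface is eth0 and using the gateway noted above:

    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.146.100
    NETMASK=255.255.255.0
    GATEWAY=192.168.146.2
    DNS1=192.168.146.2

    Apply the change with: service network restart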
    After the IP is set, change the hostname:
    1. Edit the hosts file:
    [root@hadoop01 hyotei]# vi /etc/hosts
    
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.146.100 hadoop01
    192.168.146.101 hadoop02
    192.168.146.102 hadoop03
    192.168.146.103 hadoop04
    
    
    ## Since I use four hosts, all four hostnames and IPs are added ##

    2. Edit the network file:

    [root@hadoop01 hyotei]# vi /etc/sysconfig/network
    
    NETWORKING=yes
    HOSTNAME=hadoop01
    

    Then reboot for the changes to take effect.

  • Add the hostnames and mappings on Windows:
    I used Sublime Text to edit the Windows hosts file: right-click Sublime Text and run it as administrator, then open C:\Windows\System32\drivers\etc\hosts from inside Sublime.

    192.168.146.100	hadoop01
    192.168.146.101	hadoop02
    192.168.146.102	hadoop03
    192.168.146.103	hadoop04

    Add the entries above to the file, then restart the computer.

  • Set up passwordless SSH login:

    [hyotei@hadoop01 ~]$ ssh-keygen -t rsa       (generates the key pair)
    [hyotei@hadoop01 ~]$ ssh-copy-id hadoop02
    
    Now hadoop01 can log in to hadoop02 over ssh without a password. In this cluster, hadoop01 must reach all nodes (for start-dfs.sh), hadoop03 must reach all nodes (for start-yarn.sh), and hadoop01 and hadoop02 must reach each other for fencing; the simplest policy is to let every host reach every other, as sketched below.
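
    A compact way to distribute the key to all four hosts (a sketch, assuming the hostnames mapped above and the same account on every machine; each run asks for that host's password once):

    for host in hadoop01 hadoop02 hadoop03 hadoop04; do
        ssh-copy-id "$host"    # include hadoop01 itself so local ssh also works
    done

    Repeat the key generation and this loop on the other hosts as needed; hadoop02 in particular must be able to ssh to hadoop01, since the sshfence method configured later runs in both failover directions.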

3. Installing the Cluster

  1. Install ZooKeeper. Download: http://www-eu.apache.org/dist/zookeeper/
    1.1 First unpack the ZooKeeper archive:
    [hyotei@hadoop01 ~]$ tar -zxvf zookeeper-3.4.6.tar.gz -C /home/hyotei
    
    This unpacks it into the home directory, matching the paths used by the scp and ZK_HOME steps below.

    1.2 Configure zoo.cfg:

    [hyotei@hadoop01 ~]$ cd zookeeper-3.4.6/conf/
    [hyotei@hadoop01 conf]$ mv zoo_sample.cfg zoo.cfg
    [hyotei@hadoop01 conf]$ vi zoo.cfg
    
    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just
    # example sakes.
    dataDir=/home/hyotei/opt/zookeeper       (the directory where ZooKeeper keeps its data; the myid file goes here too)
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
    server.1=hadoop01:2888:3888   
    server.2=hadoop02:2888:3888
    server.3=hadoop03:2888:3888
    
    

    Copy the zookeeper directory to hadoop02 and hadoop03:

    [hyotei@hadoop01 ~]$ scp -r zookeeper-3.4.6 hadoop02:/home/hyotei/zookeeper-3.4.6
    [hyotei@hadoop01 ~]$ scp -r zookeeper-3.4.6 hadoop03:/home/hyotei/zookeeper-3.4.6

    1.3 Configure myid. Each server's myid file must live in the dataDir set in zoo.cfg and must match its server.N number. On hadoop01:

    [hyotei@hadoop01 ~]$ mkdir -p /home/hyotei/opt/zookeeper
    [hyotei@hadoop01 ~]$ echo 1 > /home/hyotei/opt/zookeeper/myid
    

    Do the same on hadoop02 and hadoop03, creating the directory there first:

    [hyotei@hadoop02 ~]$ echo 2 > /home/hyotei/opt/zookeeper/myid
    [hyotei@hadoop03 ~]$ echo 3 > /home/hyotei/opt/zookeeper/myid

    1.4 To make ZooKeeper easy to start, add it to the environment variables:

    [hyotei@hadoop01 ~]$ sudo vi /etc/profile
    
    Add:
    export ZK_HOME=/home/hyotei/zookeeper-3.4.6/
    export PATH=${JAVA_HOME}/bin:${ZK_HOME}/bin:$PATH
    
    [hyotei@hadoop01 ~]$ source /etc/profile
    [hyotei@hadoop01 ~]$ zkServer.sh start      (start zk)
    [hyotei@hadoop01 ~]$ zkServer.sh status     (check zk status)
    [hyotei@hadoop01 ~]$ zkServer.sh stop       (stop zk)
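
    Once ZooKeeper is running on all three nodes, one should report "leader" and the other two "follower". A quick check from hadoop01 (a sketch, assuming passwordless ssh and the /etc/profile settings above on every node):

    for host in hadoop01 hadoop02 hadoop03; do
        echo "== $host =="
        ssh "$host" 'source /etc/profile; zkServer.sh status'
    done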
  2. Install Hadoop.
    2.1 Unpack the archive:

    [hyotei@hadoop01 ~]$ tar -zxvf hadoop-2.7.4.tar.gz -C /home/hyotei
    

    2.2 Edit the configuration files, which live under hadoop-2.7.4/etc/hadoop.
    2.2.1 hadoop-env.sh:

    [hyotei@hadoop01 hadoop]$ vi hadoop-env.sh
    
    export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_171/      (set your own JAVA_HOME here explicitly)
    
    You can look up the location with: echo $JAVA_HOME
    

    2.2.2 core-site.xml:

    <configuration>
      <!-- Set the HDFS nameservice to myhadoop -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://myhadoop/</value>
      </property>
      <!-- Hadoop working directory -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hyotei/hadoop-2.7.4/data/hadoopdata/</value>
      </property>
      <!-- ZooKeeper ensemble address -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
      </property>
    </configuration>

    2.2.3 hdfs-site.xml:

    <configuration>
      <!-- Replication factor -->
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <!-- HDFS nameservice, myhadoop; must match core-site.xml -->
      <property>
        <name>dfs.nameservices</name>
        <value>myhadoop</value>
      </property>
      <!-- myhadoop has two NameNodes: nn1 and nn2 -->
      <property>
        <name>dfs.ha.namenodes.myhadoop</name>
        <value>nn1,nn2</value>
      </property>
      <!-- RPC address of nn1 -->
      <property>
        <name>dfs.namenode.rpc-address.myhadoop.nn1</name>
        <value>hadoop01:9000</value>
      </property>
      <!-- HTTP address of nn1 -->
      <property>
        <name>dfs.namenode.http-address.myhadoop.nn1</name>
        <value>hadoop01:50070</value>
      </property>
      <!-- RPC address of nn2 -->
      <property>
        <name>dfs.namenode.rpc-address.myhadoop.nn2</name>
        <value>hadoop02:9000</value>
      </property>
      <!-- HTTP address of nn2 -->
      <property>
        <name>dfs.namenode.http-address.myhadoop.nn2</name>
        <value>hadoop02:50070</value>
      </property>
      <!-- Where the NameNode edits are stored on the JournalNodes -->
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/myhadoop</value>
      </property>
      <!-- Where each JournalNode keeps its data on local disk -->
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hyotei/data/journaldata</value>
      </property>
      <!-- Enable automatic NameNode failover -->
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <!-- Failover proxy provider -->
      <!-- When pasting this, make sure the value stays on a single line -->
      <property>
        <name>dfs.client.failover.proxy.provider.myhadoop</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <!-- Fencing methods; separate multiple methods with newlines, one per line -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
          sshfence
          shell(/bin/true)
        </value>
      </property>
      <!-- sshfence requires passwordless ssh -->
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hyotei/.ssh/id_rsa</value>
      </property>
      <!-- Timeout for the sshfence method -->
      <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
      </property>
    </configuration>

    2.2.4 mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml):

    <configuration>
      <!-- Run MapReduce on YARN -->
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <!-- Address and port of the MapReduce history server -->
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
      </property>
      <!-- Web address of the MapReduce history server -->
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
      </property>
    </configuration>

    2.2.5 yarn-site.xml:

    <configuration>
      <!-- Enable ResourceManager HA -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <!-- Cluster id of the RMs -->
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
      </property>
      <!-- Logical names of the RMs -->
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <!-- Host of each RM -->
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop03</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop04</value>
      </property>
      <!-- ZooKeeper ensemble address -->
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
      </property>
      <!-- Auxiliary service required to run MapReduce jobs -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <!-- Enable log aggregation for the YARN cluster -->
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <!-- Maximum retention time for aggregated logs, in seconds -->
      <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
      </property>
      <!-- Enable automatic recovery -->
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <!-- Keep ResourceManager state in the ZooKeeper ensemble -->
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
      <!-- Memory (MB) and vcores each NodeManager offers -->
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1536</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
      </property>
    </configuration>

    2.2.6 Edit slaves:

    [hyotei@hadoop01 hadoop]$ vi slaves
    Add the datanode hosts:
    hadoop01
    hadoop02
    hadoop03
    hadoop04
    

    2.2.7 Distribute the installation to the other machines:

    scp -r hadoop-2.7.4 hadoop02:/home/hyotei/hadoop-2.7.4
    scp -r hadoop-2.7.4 hadoop03:/home/hyotei/hadoop-2.7.4
    scp -r hadoop-2.7.4 hadoop04:/home/hyotei/hadoop-2.7.4

    2.2.8 Configure the environment variables on every machine:

    [hyotei@hadoop01 ~]$ sudo vi /etc/profile
    Add:
    export HADOOP_HOME=/home/hyotei/hadoop-2.7.4
    export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:$PATH
    
    Save and exit, then:
    [hyotei@hadoop01 ~]$ source /etc/profile
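
    A quick sanity check on each node (a sketch; assumes the profile above has been sourced):

    hadoop version        # should report Hadoop 2.7.4
    which zkServer.sh     # should resolve to a path under ZK_HOME/bin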
  3. Cluster initialization (follow this order strictly):
    3.1 Start ZooKeeper first, on all of hadoop01, hadoop02, and hadoop03:

    [hyotei@hadoop01 ~]$ zkServer.sh start
    [hyotei@hadoop01 ~]$ zkServer.sh status    (verify it started)
    

    3.2 Start a JournalNode on each of the three nodes:

    [hyotei@hadoop01 ~]$ hadoop-daemon.sh start journalnode
    [hyotei@hadoop02 ~]$ hadoop-daemon.sh start journalnode
    [hyotei@hadoop03 ~]$ hadoop-daemon.sh start journalnode

    Run jps to confirm a JournalNode process exists on each node.
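    Checking all three at once from hadoop01 (a sketch, assuming passwordless ssh and the profile settings on every node):

    for host in hadoop01 hadoop02 hadoop03; do
        echo "== $host =="
        ssh "$host" 'source /etc/profile; jps' | grep JournalNode
    done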
    3.3 Initialize (format) the first NameNode:

    [hyotei@hadoop01 ~]$ hadoop namenode -format

    This generates the cluster metadata under the working directory configured in core-site.xml; copy it to the same path on the second NameNode:

    <name>hadoop.tmp.dir</name>
    <value>/home/hyotei/hadoop-2.7.4/data/hadoopdata/</value>
    [hyotei@hadoop01 hadoop-2.7.4]$ scp -r ./data/hadoopdata/ hadoop02:/home/hyotei/hadoop-2.7.4/data
    Or run this on the second NameNode instead:
    [hyotei@hadoop02 ~]$ hadoop namenode -bootstrapStandby

    3.4 Format ZKFC (on the first machine only):

    [hyotei@hadoop01 hadoop-2.7.4]$ hdfs zkfc -formatZK
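
    Optionally verify the result (a sketch; the znode name follows the nameservice configured earlier):

    zkCli.sh -server hadoop01:2181 ls /hadoop-ha      # should list [myhadoop]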

    3.5 Start HDFS:

    [hyotei@hadoop01 hadoop-2.7.4]$ start-dfs.sh

    Run jps on each of the four machines in turn to verify the expected processes started.
    Web UIs:  first NameNode     http://hadoop01:50070
              second NameNode    http://hadoop02:50070
    3.6 Start YARN on hadoop03:

    [hyotei@hadoop03 hadoop-2.7.4]$ start-yarn.sh

    If the standby ResourceManager does not come up, start it manually:

    [hyotei@hadoop04 hadoop-2.7.4]$ yarn-daemon.sh start resourcemanager

    ResourceManager web UI: http://hadoop03:8088
    hadoop04 is the standby ResourceManager; its web UI automatically redirects to hadoop03.

  4. Check the state of each NameNode and ResourceManager:

    [hyotei@hadoop01 hadoop-2.7.4]$ hdfs haadmin -getServiceState nn1
    active
    [hyotei@hadoop01 hadoop-2.7.4]$ hdfs haadmin -getServiceState nn2
    standby
    [hyotei@hadoop01 hadoop-2.7.4]$ yarn rmadmin -getServiceState rm1
    active
    [hyotei@hadoop01 hadoop-2.7.4]$ yarn rmadmin -getServiceState rm2
    standby
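
    To exercise the automatic failover, one option (not part of the original steps; a sketch to run on hadoop01 while nn1 is active) is to kill the active NameNode and watch the standby take over:

    jps | awk '/ NameNode$/{print $1}' | xargs kill -9    # kill the active NameNode
    sleep 10
    hdfs haadmin -getServiceState nn2                     # should now report: active
    hadoop-daemon.sh start namenode                       # bring nn1 back as the new standby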
  5. Start the MapReduce job history server:

    [hyotei@hadoop01 hadoop-2.7.4]$ mr-jobhistory-daemon.sh start historyserver
  6. After initialization, later cluster starts follow this order (scripted below):
    zkServer.sh start (on the three ZooKeeper nodes)  =>  start-dfs.sh (on the first NameNode)  =>  start-yarn.sh (on hadoop03)  =>  yarn-daemon.sh start resourcemanager (the standby ResourceManager usually does not start on its own and must be started manually, on hadoop04)
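
    The whole sequence can be driven from hadoop01 (a sketch, assuming passwordless ssh and the /etc/profile settings on every node):

    #!/bin/bash
    # start ZooKeeper on the three ensemble nodes
    for host in hadoop01 hadoop02 hadoop03; do
        ssh "$host" 'source /etc/profile; zkServer.sh start'
    done
    # start HDFS from the first NameNode (this machine)
    source /etc/profile && start-dfs.sh
    # start YARN on hadoop03, then the standby ResourceManager on hadoop04
    ssh hadoop03 'source /etc/profile; start-yarn.sh'
    ssh hadoop04 'source /etc/profile; yarn-daemon.sh start resourcemanager'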

  7. If a NameNode or DataNode did not start on some machine, try starting it by hand there:

    [hyotei@hadoop01 hadoop-2.7.4]$ hadoop-daemon.sh start namenode
    [hyotei@hadoop01 hadoop-2.7.4]$ hadoop-daemon.sh start datanode
  8. If a manually started daemon keeps dying, or initialization or startup fails, check the Hadoop logs on the affected machine.
    The log files live under hadoop-2.7.4/logs.

    For example, if the NameNode on hadoop01 failed to start, run:

    [hyotei@hadoop01 logs]$ less hadoop-hyotei-namenode-hadoop01.log
    

    and look for the error.
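
    Rather than paging through the whole file, a quick filter (a sketch; the log file name follows the pattern hadoop-<user>-namenode-<hostname>.log):

    grep -iE 'error|fatal|exception' hadoop-hyotei-namenode-hadoop01.log | tail -20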