
[hadoop] Hadoop fully distributed cluster installation


Preface

I'm planning to post HDFS operations (shell command version) and then HBase and Hive operations later, so here is the Hadoop cluster installation first.

Prerequisites

1. hadoop-2.6.5.tar.gz

2. Three servers (virtual machines are fine)

3. CentOS 7

Core

  1. Server planning

    From here on I'll refer to the machines by hostname rather than IP.

master (192.168.31.60)    slave1 (192.168.31.61)    slave2 (192.168.31.62)
NameNode                  ResourceManager           SecondaryNameNode
DataNode                  DataNode                  DataNode
NodeManager               NodeManager               NodeManager
HistoryServer
  2. Download the Hadoop package and the JDK

    Official Hadoop download:

    https://archive.apache.org/dist/hadoop/common/

    Official JDK download:

    https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

  3. Upload to the master server

    Choose the path according to your own layout; I use /app/install:

    # cd /app/install
    # ls
    hadoop-2.6.5.tar.gz
    jdk-8u171-linux-x64.tar.gz 
    
  4. Create a hadoop user

    # useradd hadoop
    # passwd hadoop
    
  5. Configure hostname resolution

    # vi /etc/hosts
    
    192.168.31.60       master
    192.168.31.61       slave1
    192.168.31.62       slave2
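
    To sanity-check name resolution before going further, you can ping each host by name (assuming all three machines are already up):

    # ping -c 1 master
    # ping -c 1 slave1
    # ping -c 1 slave2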
    
  6. Configure passwordless SSH login

    # ssh-keygen -t rsa
    # ssh-copy-id -i 192.168.31.60
    # scp -r /root/.ssh/ [email protected]:/root/
    # scp -r /root/.ssh/ [email protected]:/root/
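
    Before moving on, it's worth confirming the keys work; each of these should print the remote hostname without asking for a password:

    # ssh master hostname
    # ssh slave1 hostname
    # ssh slave2 hostname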
    
  7. Install the JDK

    $ cd /app/install
    $ mkdir -p /usr/local/java
    $ tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/local/java
    

    Configure the Java environment variables

    $ vi /etc/profile
    
    #set java environment
    JAVA_HOME=/usr/local/java/jdk1.8.0_171
    JRE_HOME=$JAVA_HOME/jre
    PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
    export JAVA_HOME JRE_HOME PATH CLASSPATH  
    
  8. Install Hadoop

    $ cd /app/install
    $ tar -zxvf hadoop-2.6.5.tar.gz -C /usr/local/
    

    Configure the Hadoop environment variables

    $ vi /etc/profile
    
    #set hadoop environment
    export HADOOP_HOME=/usr/local/hadoop-2.6.5
    export PATH=$PATH:$HADOOP_HOME/bin
    
  9. Make the configuration take effect

    $ source /etc/profile
    # (/etc/hosts is not a shell script and needs no sourcing; changes there take effect immediately)
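
    A quick check that both environments took effect (versions assume the JDK 8u171 and Hadoop 2.6.5 installed above):

    $ java -version       # should report 1.8.0_171
    $ hadoop version      # should report Hadoop 2.6.5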
    
  10. Edit the Hadoop configuration files

    $ cd /usr/local/hadoop-2.6.5/etc/hadoop
    
  11. Edit hadoop-env.sh, mapred-env.sh and yarn-env.sh to add the JDK path

    export JAVA_HOME=/usr/local/java/jdk1.8.0_171
    
  12. Configure core-site.xml

    $ vi core-site.xml
    
    <configuration>
      <!-- NameNode address and port -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
      </property>
      <!-- Hadoop temp directory; by default both NameNode and DataNode keep their data under it -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.6.5/data/tmp</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/data</value>
      </property>
    </configuration>
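
    Once the environment variables are loaded, you can read the values back with hdfs getconf to confirm Hadoop actually picks them up:

    $ hdfs getconf -confKey fs.defaultFS      # expect hdfs://master:8020
    $ hdfs getconf -confKey hadoop.tmp.dir    # expect /usr/local/hadoop-2.6.5/data/tmp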
    
  13. Configure hdfs-site.xml

    $ vi hdfs-site.xml
    
    <configuration>
      <!-- SecondaryNameNode address and port -->
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:50090</value>
      </property>
    </configuration>
    
  14. Configure slaves

    This file lists the nodes that run a DataNode and NodeManager; per the plan in step 1, master is one of them:

    master
    slave1
    slave2
    
  15. Configure yarn-site.xml

    $ vi yarn-site.xml
    
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <!-- ResourceManager address -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>slave1</value>
        </property>
        <!-- enable log aggregation -->
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <!-- log retention time in seconds (106800 s is roughly 30 hours) -->
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>106800</value>
        </property>
    </configuration>
    
  16. Configure mapred-site.xml

    $ cp mapred-site.xml.template mapred-site.xml
    $ vi mapred-site.xml
    
    <configuration>
        <!-- run MapReduce jobs on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <!-- node hosting the MapReduce history server -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>master:10020</value>
        </property>
        <!-- history server web address -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>master:19888</value>
        </property>
    </configuration>
    
  17. Delete the docs (optional; it just trims what gets copied to the slaves)

    $ cd /usr/local/hadoop-2.6.5/share
    $ rm -rf doc
    
  18. Configure the other two servers (slave1, slave2)

    # copy hadoop to slave1 and slave2
    $ cd /usr/local
    $ scp -r hadoop-2.6.5/ [email protected]:/usr/local/
    $ scp -r hadoop-2.6.5/ [email protected]:/usr/local/
    # copy the jdk to slave1 and slave2
    $ scp -r java/ [email protected]:/usr/local/
    $ scp -r java/ [email protected]:/usr/local/

    # copy the environment variables to slave1 and slave2
    $ scp /etc/profile [email protected]:/etc/
    $ scp /etc/profile [email protected]:/etc/

    # copy the hosts file to slave1 and slave2
    $ scp /etc/hosts [email protected]:/etc/
    $ scp /etc/hosts [email protected]:/etc/
    # remember to source /etc/profile on each slave
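
    With only two slaves the commands above are fine as they are; just as a sketch, the same copies can be written as a loop (assuming the root SSH access set up in step 6):

    for h in slave1 slave2; do
      scp -r /usr/local/hadoop-2.6.5 /usr/local/java root@$h:/usr/local/
      scp /etc/profile /etc/hosts root@$h:/etc/
    done

    /etc/profile still has to be sourced in each slave's own shell afterwards.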
    
  19. Format the NameNode

    $ cd /usr/local/hadoop-2.6.5/bin
    $ ./hdfs namenode -format
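
    If the format succeeded, the output ends with a "successfully formatted" line and the namespace directory configured in core-site.xml now exists:

    $ ls /usr/local/hadoop-2.6.5/data/tmp/dfs/name/current/
    # expect fsimage files, seen_txid and VERSION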
    
  20. Start the cluster (HDFS)

    $ cd /usr/local/hadoop-2.6.5/sbin
    $ ./start-dfs.sh
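
    Running jps on each node should now match the plan from step 1 (HDFS daemons only at this point; YARN comes next):

    $ jps
    # master: NameNode, DataNode
    # slave1: DataNode
    # slave2: SecondaryNameNode, DataNode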
    
  21. Start YARN

    $ ./start-yarn.sh
    
  22. Start the ResourceManager on slave1

    start-yarn.sh was run on master, but yarn.resourcemanager.hostname points at slave1, so the ResourceManager has to be started there by hand:

    $ ssh slave1
    $ cd /usr/local/hadoop-2.6.5/sbin
    $ ./yarn-daemon.sh start resourcemanager
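
    Once the ResourceManager is up, it should report all three NodeManagers:

    $ yarn node -list    # expect 3 RUNNING nodes: master, slave1, slave2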
    
  23. Start the HistoryServer on master

    $ cd /usr/local/hadoop-2.6.5/sbin
    $ ./mr-jobhistory-daemon.sh start historyserver
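
    To give the HistoryServer something to show, you can run the pi example shipped with Hadoop 2.6.5 and then look for the finished job on the web UIs in the next step:

    $ hadoop jar /usr/local/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar pi 2 10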
    
  24. Access the web UIs

    To open these by hostname from your own machine, add the same /etc/hosts entries there too.

    HDFS NameNode UI: http://master:50070/

    YARN ResourceManager UI: http://slave1:8088/cluster
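
    If the pages don't open from your desktop, check from the cluster first (curl should print HTTP 200 for both):

    $ curl -s -o /dev/null -w "%{http_code}\n" http://master:50070/
    $ curl -s -o /dev/null -w "%{http_code}\n" http://slave1:8088/cluster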

  25. Screenshots

    (Screenshots: YARN "Nodes" page and cluster overview)

Summary

  1. Building the cluster isn't hard; the point is to do it with your own hands.
  2. Once we get to Hive, I'll add Hive; once HBase comes up, I'll add HBase.
  3. When the series reaches ZooKeeper, I'll gradually turn this into a high-availability setup.
  4. If you repost, please credit the author. Thanks!