Big Data (1): Building a Fully Distributed Hadoop Cluster on CentOS 7

This article walks through every step of building a fully distributed Hadoop cluster with one NameNode and two DataNodes. The environment is as follows:

I. Environment Preparation

  1. Three CentOS 7 virtual machines, or three physical CentOS 7 machines on one LAN, each with JDK 1.8 already installed. Installing CentOS 7 or JDK 1.8 is not covered here; tutorials are easy to find online.
  2. Stop and disable the firewall, mainly so the Hadoop nodes can reach each other freely and the web management UIs are reachable via IP and port:
    [root@master ~]# systemctl stop firewalld
    [root@master ~]# systemctl disable firewalld
    [root@master ~]# firewall-cmd --state
    not running
  3. Set the hostname and edit the hosts file. On the master (NameNode):
    hostnamectl set-hostname master.hadoop  # the two slaves become slave1.hadoop and slave2.hadoop
    vim /etc/hosts

    # replace these IP addresses with your own
172.16.16.15 master.hadoop
172.16.16.12 slave1.hadoop
172.16.16.13 slave2.hadoop

| IP address   | Hostname      | Role              |
|--------------|---------------|-------------------|
| 172.16.16.15 | master.hadoop | NameNode, master  |
| 172.16.16.12 | slave1.hadoop | DataNode, slave 1 |
| 172.16.16.13 | slave2.hadoop | DataNode, slave 2 |

After configuring, reboot, then run hostnamectl to check that the hostname has taken effect:
```
[root@slave1 ~]# hostnamectl
   Static hostname: slave1.hadoop
         Icon name: computer-desktop
           Chassis: desktop
        Machine ID: 76547338655241a2b56abe659fe05dc1
           Boot ID: 2d0f564b16f24bd7959ff4608d790223
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-862.11.6.el7.x86_64
      Architecture: x86-64
```
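The same three hosts entries must be present on all three machines. A quick way to confirm name resolution works everywhere is the sketch below (run it on each node):

```
# every hostname should resolve and answer a single ping
for h in master.hadoop slave1.hadoop slave2.hadoop; do
    ping -c 1 "$h" > /dev/null && echo "$h reachable"
done
```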

II. Passwordless SSH Login

1. On the master, run ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa to create a passphrase-free key pair: -t is the key type (dsa), -P is the passphrase ('' means none), and -f is where the generated key is saved.
2. On the master, run cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys to append the public key id_dsa.pub to authorized_keys; this enables passwordless ssh logins.
3. On the master, run ssh master.hadoop to test the passwordless login.
4. On slave1.hadoop, run mkdir ~/.ssh
5. On slave2.hadoop, run mkdir ~/.ssh
6. On the master, run scp ~/.ssh/authorized_keys root@slave1.hadoop:~/.ssh/authorized_keys to copy the master's public key to slave1.hadoop; you will be asked for slave1.hadoop's login password once.
7. On the master, run scp ~/.ssh/authorized_keys root@slave2.hadoop:~/.ssh/authorized_keys to copy the master's public key to slave2.hadoop; you will be asked for slave2.hadoop's login password once.
8. On all three machines, run chmod 600 ~/.ssh/authorized_keys to set the key file permissions (steps 4-8 can also be collapsed into one command per slave; see the sketch after this list).
9. On the master, run ssh slave1.hadoop and ssh slave2.hadoop to verify that the ssh setup succeeded.
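ssh-copy-id creates ~/.ssh on the target, appends the key, and fixes permissions in one step. A sketch, assuming the root account is used throughout as in the prompts above (note that newer OpenSSH builds disable DSA keys by default, in which case generate an rsa key instead):

```
ssh-copy-id -i ~/.ssh/id_dsa.pub root@slave1.hadoop
ssh-copy-id -i ~/.ssh/id_dsa.pub root@slave2.hadoop
```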

III. Download and Extract Hadoop

On the master, create the Hadoop root directory, then download and extract the Hadoop binary distribution. Note that hadoop-2.9.0-src.tar.gz is the source code; the binary tarball hadoop-2.9.0.tar.gz is the one to install:

mkdir /hadoop
cd /hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz
tar -zxvf hadoop-2.9.0.tar.gz
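A quick sanity check that the extraction produced a usable install (a minimal sketch):

```
ls /hadoop/hadoop-2.9.0                  # should contain bin/, sbin/ and etc/hadoop/
/hadoop/hadoop-2.9.0/bin/hadoop version  # prints the Hadoop version without any configuration
```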

IV. Configure Hadoop on the Master Node

Configure master.hadoop first; the finished configuration is then copied to the two slave nodes via scp.

  • core-site.xml: add the following inside <configuration>
    vim /hadoop/hadoop-2.9.0/etc/hadoop/core-site.xml
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master.hadoop:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    
  • hdfs-site.xml
    vim /hadoop/hadoop-2.9.0/etc/hadoop/hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///home/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///home/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>master.hadoop:50090</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>
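The local directories referenced in core-site.xml and hdfs-site.xml do not exist yet. Hadoop creates most of them on demand, but creating them up front surfaces permission problems early (a sketch; run on every node):

```
# hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir from the configs above
mkdir -p /home/hadoop/tmp /home/hadoop/dfs/name /home/hadoop/dfs/data
```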
    
  • mapred-site.xml
    Copy the template to create your own editable mapred-site.xml:
    cp /hadoop/hadoop-2.9.0/etc/hadoop/mapred-site.xml.template /hadoop/hadoop-2.9.0/etc/hadoop/mapred-site.xml

vim /hadoop/hadoop-2.9.0/etc/hadoop/mapred-site.xml

```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master.hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master.hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master.hadoop:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://master.hadoop:9001</value>
    </property>
</configuration>
```
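Note that mapreduce.jobtracker.http.address and mapred.job.tracker are legacy MRv1 JobTracker settings; with mapreduce.framework.name set to yarn they are ignored, so they are harmless here but can be dropped.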
  • yarn-site.xml
    vim /hadoop/hadoop-2.9.0/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master.hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master.hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master.hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master.hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master.hadoop:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
</configuration>
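A malformed config file makes the daemons fail at startup with a parse error, so it is worth checking that all four files are well-formed XML (a sketch, assuming xmllint from libxml2 is available):

```
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    xmllint --noout /hadoop/hadoop-2.9.0/etc/hadoop/$f && echo "$f OK"
done
```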

V. Configure JAVA_HOME & slaves

  • Configure JAVA_HOME
    Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /hadoop/hadoop-2.9.0/etc/hadoop (see the sketch after this list):
export JAVA_HOME=/usr/local/jdk1.8.0_181  # replace with your own JDK directory
  • Configure slaves
    Edit slaves under /hadoop/hadoop-2.9.0/etc/hadoop: delete the default localhost and add the two slave nodes
slave1.hadoop
slave2.hadoop
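Because the JAVA_HOME lines shipped in both env scripts may be commented out or set to a placeholder, simply appending the definition also works: the last assignment wins when the script is sourced (a minimal sketch; adjust the JDK path):

```
for f in hadoop-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/usr/local/jdk1.8.0_181' >> /hadoop/hadoop-2.9.0/etc/hadoop/$f
done
```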

VI. Copy Hadoop to the Slave Nodes

Copy the configured Hadoop directory from the master to the same location on each slave via scp:

scp -r /hadoop  172.16.16.12:/
scp -r /hadoop  172.16.16.13:/
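A quick check that the copy landed where expected, using the passwordless logins from Section II (a sketch):

```
ssh root@slave1.hadoop 'ls /hadoop/hadoop-2.9.0/etc/hadoop | head -3'
ssh root@slave2.hadoop 'ls /hadoop/hadoop-2.9.0/etc/hadoop | head -3'
```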

VII. Start Hadoop

Hadoop is started and stopped from the master; the slave daemons are launched automatically. First format the NameNode (do this once only; reformatting later generates a new clusterID and orphans the existing DataNodes), then start everything from /hadoop/hadoop-2.9.0/sbin/:

/hadoop/hadoop-2.9.0/bin/hdfs namenode -format


[root@master sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master.hadoop]
master.hadoop: starting namenode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-namenode-master.hadoop.out
slave1.hadoop: starting datanode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-datanode-slave1.hadoop.out
slave2.hadoop: starting datanode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-datanode-slave2.hadoop.out
Starting secondary namenodes [master.hadoop]
master.hadoop: starting secondarynamenode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-secondarynamenode-master.hadoop.out
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.9.0/logs/yarn-root-resourcemanager-master.hadoop.out
slave1.hadoop: starting nodemanager, logging to /hadoop/hadoop-2.9.0/logs/yarn-root-nodemanager-slave1.hadoop.out
slave2.hadoop: starting nodemanager, logging to /hadoop/hadoop-2.9.0/logs/yarn-root-nodemanager-slave2.hadoop.out

[root@master sbin]# ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master.hadoop]
master.hadoop: stopping namenode
slave1.hadoop: stopping datanode
slave2.hadoop: stopping datanode
Stopping secondary namenodes [master.hadoop]
master.hadoop: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1.hadoop: stopping nodemanager
slave2.hadoop: stopping nodemanager
slave1.hadoop: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave2.hadoop: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop

VIII. Check the Daemons with jps

Run jps on the master and on each slave to confirm that everything started:

[root@master sbin]# jps
24340 SecondaryNameNode
24837 Jps
24502 ResourceManager
24124 NameNode
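On each slave the expected daemons are DataNode and NodeManager; they can be checked from the master over the passwordless logins (a sketch, assuming jps is on root's PATH on the slaves):

```
ssh root@slave1.hadoop jps   # should list DataNode and NodeManager
ssh root@slave2.hadoop jps
```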

IX. View Cluster Info in the Web UI

With the cluster running, the HDFS NameNode web UI is available at http://master.hadoop:50070 (50070 is the Hadoop 2.x default HTTP port), and the YARN ResourceManager UI at http://master.hadoop:8088, as set by yarn.resourcemanager.webapp.address above. Since the firewall was disabled in Section I, both are reachable from any machine on the LAN by IP as well.