Big Data (1): Building a Fully Distributed Hadoop Cluster on CentOS 7
阿新 · Published: 2018-11-17
This article covers all the steps for building a fully distributed Hadoop cluster with one NameNode and two DataNodes. The environment is as follows:
1. Environment preparation
- Three CentOS 7 virtual machines, or three physical CentOS 7 machines on the same LAN, each with JDK 1.8 already installed. Installing CentOS 7 or JDK 1.8 is not covered here; tutorials are easy to find online.
- Stop and disable the firewall on all three machines, mainly so the cluster nodes can reach each other without obstruction and the web management UIs are reachable via ip:port (the service unit on CentOS 7 is `firewalld`):
```
[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld
```
- Set the hostname and edit the hosts file. On the NameNode (master):
```
# replace the IP addresses with your own
hostnamectl set-hostname master.hadoop   # the two slaves are slave1.hadoop and slave2.hadoop
vim /etc/hosts
```
Add the following to /etc/hosts on all three machines:
```
172.16.16.15 master.hadoop
172.16.16.12 slave1.hadoop
172.16.16.13 slave2.hadoop
```
IP address | Hostname | Description |
---|---|---|
172.16.16.15 | master.hadoop | NameNode, master node |
172.16.16.12 | slave1.hadoop | DataNode, slave node 1 |
172.16.16.13 | slave2.hadoop | DataNode, slave node 2 |
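The host mapping above can be scripted so that re-running it never duplicates entries. A minimal sketch; `HOSTS_FILE` defaults to a local `hosts.cluster` file for safe experimenting, so point it at `/etc/hosts` when running on the real nodes:

```shell
#!/bin/sh
# Append the cluster's host entries only if they are not already present.
# HOSTS_FILE defaults to a local file for experimenting; set it to /etc/hosts
# on the real machines.
HOSTS_FILE="${HOSTS_FILE:-./hosts.cluster}"
touch "$HOSTS_FILE"

add_host() {
    ip="$1"; name="$2"
    # skip the entry if the hostname is already mapped in the file
    if ! grep -q "[[:space:]]$name\$" "$HOSTS_FILE"; then
        echo "$ip $name" >> "$HOSTS_FILE"
    fi
}

add_host 172.16.16.15 master.hadoop
add_host 172.16.16.12 slave1.hadoop
add_host 172.16.16.13 slave2.hadoop
```

Because each entry is only appended when absent, running the script twice leaves the file unchanged.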
After configuring, reboot the system, then check with hostnamectl that the new hostname has taken effect:
```
[root@slave1 ~]# hostnamectl
Static hostname: slave1.hadoop
Icon name: computer-desktop
Chassis: desktop
Machine ID: 76547338655241a2b56abe659fe05dc1
Boot ID: 2d0f564b16f24bd7959ff4608d790223
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-862.11.6.el7.x86_64
Architecture: x86-64
```
2. Passwordless SSH login
1. On master, run `ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa` to create a key pair with no passphrase: `-t` selects the key type (dsa here), `-P` sets the passphrase (`''` means none), and `-f` gives the path where the key is saved.
2. On master, run `cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys` to append the public key id_dsa.pub to authorized_keys, which enables passwordless ssh login.
3. On master, run `ssh master.hadoop` to test passwordless login.
4. On slave1.hadoop, run `mkdir ~/.ssh`
5. On slave2.hadoop, run `mkdir ~/.ssh`
6. On master, run `scp ~/.ssh/authorized_keys root@slave1.hadoop:~/.ssh/authorized_keys` to push the master's public key to slave1.hadoop; you will be asked for slave1.hadoop's login password.
7. On master, run `scp ~/.ssh/authorized_keys root@slave2.hadoop:~/.ssh/authorized_keys` to push the master's public key to slave2.hadoop; you will be asked for slave2.hadoop's login password.
8. On all three machines, run `chmod 600 ~/.ssh/authorized_keys` to set the permissions on the key file.
9. On master, run `ssh slave1.hadoop` and `ssh slave2.hadoop` to verify that ssh is configured correctly.
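Steps 1–9 can be condensed into one script run on master. This is a sketch under the assumption that root is used on every node (as in the prompts in this article); `RUN=echo` (the default here) makes it a dry run that only prints the commands, so clear `RUN` on the real master to execute them:

```shell
#!/bin/sh
# Steps 1-9 above as a single script, run on the master node.
# RUN=echo (the default here) makes this a dry run that only prints the
# commands; set RUN= (empty) on the real master to execute them.
RUN="${RUN:-echo}"
SLAVES="slave1.hadoop slave2.hadoop"

# steps 1-2: generate a passphrase-less key pair once, authorize it locally
[ -f ~/.ssh/id_dsa ] || $RUN ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$RUN sh -c 'cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys'

for s in $SLAVES; do
    # steps 4-7: create ~/.ssh on the slave and push authorized_keys to it
    $RUN ssh "root@$s" 'mkdir -p ~/.ssh'
    $RUN scp ~/.ssh/authorized_keys "root@$s:~/.ssh/authorized_keys"
    # step 8: restrict permissions on the key file
    $RUN ssh "root@$s" 'chmod 600 ~/.ssh/authorized_keys'
done
```

Each `scp` prompts for the slave's password once; after that, `ssh slave1.hadoop` should log in without one.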
3. Download and unpack Hadoop
Create the Hadoop root directory on master, then download and unpack the Hadoop binary tarball (note: the file with the `-src` suffix contains only source code; the binary distribution is what is needed here):
```
mkdir /hadoop
cd /hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz
tar -zxvf hadoop-2.9.0.tar.gz
```
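The download step can be wrapped as a function so the mirror and version are easy to change and the result is sanity-checked. The defaults below are the values used in this article; the final check assumes the binary tarball layout (`hadoop-<version>/sbin/...`):

```shell
#!/bin/sh
# Fetch and unpack the Hadoop binary tarball, then verify the layout.
# MIRROR/VERSION defaults match the values used above; call fetch_hadoop
# on the master node.
MIRROR="${MIRROR:-https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common}"
VERSION="${VERSION:-2.9.0}"

fetch_hadoop() {
    PREFIX="${PREFIX:-/hadoop}"
    mkdir -p "$PREFIX" && cd "$PREFIX" || return 1
    # -N only re-downloads when the mirror copy is newer than the local one
    wget -N "$MIRROR/hadoop-$VERSION/hadoop-$VERSION.tar.gz" || return 1
    tar -zxf "hadoop-$VERSION.tar.gz" || return 1
    # the binary tarball should unpack to hadoop-<version>/ with sbin/ inside
    [ -x "hadoop-$VERSION/sbin/start-dfs.sh" ]
}
```

If the function returns non-zero, either the download failed or the archive was not the binary distribution.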
4. Configure the Hadoop master node
Configure the master.hadoop machine first, then copy the result to the two slave nodes with scp.
- core-site.xml: add the following inside `<configuration>` (fs.default.name is the legacy spelling of fs.defaultFS; both work in Hadoop 2.x)
vim /hadoop/hadoop-2.9.0/etc/hadoop/core-site.xml
```
<property>
  <name>fs.default.name</name>
  <value>hdfs://master.hadoop:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131702</value>
</property>
```
- hdfs-site.xml
vim /hadoop/hadoop-2.9.0/etc/hadoop/hdfs-site.xml
```
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master.hadoop:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
- mapred-site.xml
Copy the template file to create the mapred-site.xml you will edit:
```
cp /hadoop/hadoop-2.9.0/etc/hadoop/mapred-site.xml.template /hadoop/hadoop-2.9.0/etc/hadoop/mapred-site.xml
vim /hadoop/hadoop-2.9.0/etc/hadoop/mapred-site.xml
```
```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master.hadoop:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master.hadoop:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master.hadoop:19888</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://master.hadoop:9001</value>
</property>
</configuration>
```
- yarn-site.xml
vim /hadoop/hadoop-2.9.0/etc/hadoop/yarn-site.xml
```
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master.hadoop:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master.hadoop:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master.hadoop:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master.hadoop:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master.hadoop:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.hadoop</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>
```
5. Configure JAVA_HOME & slaves
- Configure JAVA_HOME
Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /hadoop/hadoop-2.9.0/etc/hadoop:
```
export JAVA_HOME=/usr/local/jdk1.8.0_181   # replace with your own JDK directory
```
- Configure slaves
Edit the slaves file under /hadoop/hadoop-2.9.0/etc/hadoop: delete the default localhost and add the two slave nodes:
```
slave1.hadoop
slave2.hadoop
```
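Before the configured tree is copied to the slaves, it is worth checking that the four `*-site.xml` files are well-formed XML and that slaves is non-empty. A small sketch that leans on python3's stdlib parser (python3 is assumed to be available on the node; `CONF_DIR` matches the install path used above):

```shell
#!/bin/sh
# Sanity-check the edited configuration: each *-site.xml must be
# well-formed XML, and the slaves file must not be empty.
check_conf() {
    CONF_DIR="${CONF_DIR:-/hadoop/hadoop-2.9.0/etc/hadoop}"
    for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
        # the stdlib parser exits non-zero on malformed XML
        python3 -c "import sys, xml.dom.minidom as m; m.parse(sys.argv[1])" \
            "$CONF_DIR/$f" || { echo "malformed: $CONF_DIR/$f"; return 1; }
    done
    # slaves must list at least one DataNode host
    [ -s "$CONF_DIR/slaves" ] || { echo "slaves file is empty"; return 1; }
    echo "configuration looks well-formed"
}
```

A stray angle bracket in any of these files makes the daemons fail at startup with an XML parse error, so this catches the mistake before it is replicated to every node.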
6. Copy Hadoop to the slaves
Copy the configured Hadoop tree from the master to the same location on each node with scp:
```
scp -r /hadoop 172.16.16.12:/
scp -r /hadoop 172.16.16.13:/
```
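The same copy can be written as a loop over the slave IPs (the `root@` user is an assumption matching the prompts in this article). `RUN=echo` (the default here) makes it a dry run that only prints the commands; clear `RUN` on the real master to actually copy:

```shell
#!/bin/sh
# Step six as a loop over the slave IPs. RUN=echo (the default here) makes
# this a dry run that only prints the commands; set RUN= (empty) on the
# real master node to actually copy.
RUN="${RUN:-echo}"

for ip in 172.16.16.12 172.16.16.13; do
    # -r copies the whole /hadoop tree; the target path mirrors the master
    $RUN scp -r /hadoop "root@$ip:/"
done
```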
7. Start Hadoop
Start the Hadoop services on the master node; the daemons on the slave nodes are started automatically. Format the NameNode first (needed only once, before the first start), then run the scripts in /hadoop/hadoop-2.9.0/sbin/. Hadoop is always started and stopped from master.
```
/hadoop/hadoop-2.9.0/bin/hdfs namenode -format
```
```
[root@master sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master.hadoop]
master.hadoop: starting namenode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-namenode-master.hadoop.out
slave1.hadoop: starting datanode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-datanode-slave1.hadoop.out
slave2.hadoop: starting datanode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-datanode-slave2.hadoop.out
Starting secondary namenodes [master.hadoop]
master.hadoop: starting secondarynamenode, logging to /hadoop/hadoop-2.9.0/logs/hadoop-root-secondarynamenode-master.hadoop.out
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.9.0/logs/yarn-root-resourcemanager-master.hadoop.out
slave1.hadoop: starting nodemanager, logging to /hadoop/hadoop-2.9.0/logs/yarn-root-nodemanager-slave1.hadoop.out
slave2.hadoop: starting nodemanager, logging to /hadoop/hadoop-2.9.0/logs/yarn-root-nodemanager-slave2.hadoop.out
[root@master sbin]# ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master.hadoop]
master.hadoop: stopping namenode
slave1.hadoop: stopping datanode
slave2.hadoop: stopping datanode
Stopping secondary namenodes [master.hadoop]
master.hadoop: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave1.hadoop: stopping nodemanager
slave2.hadoop: stopping nodemanager
slave1.hadoop: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
slave2.hadoop: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
```
8. Check startup with jps
On master and on the slave nodes, check whether the daemons started successfully:
```
[root@master sbin]# jps
24340 SecondaryNameNode
24837 Jps
24502 ResourceManager
24124 NameNode
```
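The jps inspection can be turned into a pass/fail test: grep the output for the daemons each node is supposed to run (NameNode, SecondaryNameNode and ResourceManager on master; DataNode and NodeManager on the slaves). The helper below is a sketch; feed it `$(jps)` on the node being checked:

```shell
#!/bin/sh
# Turn jps output into a pass/fail check. check_daemons takes the jps
# output as its first argument followed by the daemon names expected on
# that node, and matches each name at end-of-line so that, e.g., NameNode
# does not falsely match SecondaryNameNode.
check_daemons() {
    out="$1"; shift
    for d in "$@"; do
        echo "$out" | grep -q "[[:space:]]$d\$" \
            || { echo "missing: $d"; return 1; }
    done
    echo "all expected daemons are running"
}

# on the real master node:
#   check_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
# on a slave node:
#   check_daemons "$(jps)" DataNode NodeManager
```

If a daemon is missing, its log file under /hadoop/hadoop-2.9.0/logs/ is the place to look first.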