
Setting Up a Hadoop Cluster on Ubuntu 16.04


1. System Environment
Oracle VM VirtualBox
Ubuntu 16.04
Hadoop 2.7.4
Java 1.8.0_111

master:192.168.19.128
slave1:192.168.19.129
slave2:192.168.19.130

2. Deployment Steps
Create three Ubuntu 16.04 virtual machines in VirtualBox, then apply the following base configuration to each of them.
2.1 Base Configuration
1. Install SSH and rsync
sudo apt-get install ssh
sudo apt-get install rsync
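
On Ubuntu the ssh metapackage pulls in both the OpenSSH client and server; a quick sketch to confirm the daemon is actually running after installation:

sudo systemctl status ssh
# or check that something is listening on port 22
sudo ss -tlnp | grep :22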

2. Add a hadoop user and add it to sudoers
sudo adduser hadoop
sudo vim /etc/sudoers
Add the following:
# User privilege specification
root ALL=(ALL:ALL) ALL
hadoop ALL=(ALL:ALL) ALL
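
Editing /etc/sudoers by hand is easy to get wrong (a syntax error can lock out sudo entirely). A safer equivalent on Ubuntu, sketched here, is to use visudo or simply add the user to the sudo group:

sudo adduser hadoop sudo
# or, equivalently:
sudo usermod -aG sudo hadoop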

3. Switch to the hadoop user:
su hadoop

4. Edit /etc/hostname
sudo vim /etc/hostname
Set the content to master, slave1, or slave2 on the corresponding machine.
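
On Ubuntu 16.04 the hostname can also be changed without a reboot via systemd's hostnamectl; a sketch, run on each VM with its own name:

sudo hostnamectl set-hostname master    # on the master VM
sudo hostnamectl set-hostname slave1    # on slave1
sudo hostnamectl set-hostname slave2    # on slave2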

5. Edit /etc/hosts
127.0.0.1 localhost
127.0.1.1 localhost.localdomain localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# hadoop nodes
192.168.19.128 master
192.168.19.129 slave1
192.168.19.130 slave2
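
Once /etc/hosts is in place on all three machines, a quick sanity check from the master confirms that the names resolve:

ping -c 2 slave1
ping -c 2 slave2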

6. Install and configure the Java environment

Download JDK 1.8 and extract it to /usr/local (so that all users can use it), then edit /etc/profile and reload it.
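
A minimal sketch of the extract step, assuming the downloaded archive is named jdk-8u111-linux-x64.tar.gz and sits in the current directory (the actual file name may differ):

sudo tar -xzf jdk-8u111-linux-x64.tar.gz -C /usr/local
ls /usr/local/jdk1.8.0_111

Then append the following to /etc/profile: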
# set jdk classpath
export JAVA_HOME=/usr/local/jdk1.8.0_111
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Reload the file:
source /etc/profile

Verify that the JDK is installed and configured correctly:

hadoop@master:~$ java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

2.2 Configure Passwordless SSH from the Master Node to slave1 and slave2

1. Generate a key pair
hadoop@master:~$ ssh-keygen -t rsa

2. Install the public key
hadoop@master:~$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
Copy the generated authorized_keys file to the .ssh directory on slave1 and slave2:
scp .ssh/authorized_keys hadoop@slave1:~/.ssh
scp .ssh/authorized_keys hadoop@slave2:~/.ssh
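
If the .ssh directory does not yet exist on a slave, the scp above will fail. ssh-copy-id (part of the OpenSSH client tools) creates the directory and fixes permissions in one step; a sketch of that alternative:

ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2

Note that ssh-copy-id appends the key rather than overwriting authorized_keys. Either way, sshd requires ~/.ssh to be mode 700 and authorized_keys to be mode 600 on the slaves.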

3. Verify that the master node can access slave1 and slave2 without a password
hadoop@master:~$ ssh slave1
hadoop@master:~$ ssh slave2

Output:
hadoop@master:~$ ssh slave1
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-31-generic x86_64)

* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Last login: Mon Nov 28 03:30:36 2016 from 192.168.19.1
hadoop@slave1:~$

2.3 Hadoop 2.7 Cluster Deployment
1. On the master machine, extract the downloaded hadoop-2.7.4.tar.gz into the software directory under the hadoop user's home directory.
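A sketch of the extract step, assuming the tarball has already been downloaded into ~/software:

cd ~/software
tar -xzf hadoop-2.7.4.tar.gz

The resulting layout: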
hadoop@master:~/software$ ll
total 205436
drwxrwxr-x 4 hadoop hadoop 4096 Nov 28 02:52 ./
drwxr-xr-x 6 hadoop hadoop 4096 Nov 28 03:58 ../
drwxr-xr-x 11 hadoop hadoop 4096 Nov 28 04:14 hadoop-2.7.4/
-rw-rw-r-- 1 hadoop hadoop 210343364 Apr 21 2015 hadoop-2.7.4.tar.gz

2. Configure the Hadoop environment variables
sudo vim /etc/profile

Add the following:
# set hadoop classpath
export HADOOP_HOME=/home/hadoop/software/hadoop-2.7.4
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/bin

Reload the configuration:
source /etc/profile
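
A quick check that the variables took effect. Note that the profile above adds $HADOOP_HOME/bin to CLASSPATH rather than PATH, which is why the binaries are invoked by full or relative path throughout this tutorial:

echo $HADOOP_HOME
$HADOOP_HOME/bin/hadoop version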

3. Edit the Hadoop configuration files; the key ones are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml
1> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- master is the hostname configured in /etc/hosts -->
    <value>hdfs://master:9000/</value>
  </property>
</configuration>

2> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/software/hadoop-2.7.4/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/software/hadoop-2.7.4/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
</configuration>
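
The two directories referenced above can be created up front so that they exist with the right owner before the first start; a convenience sketch (the format step later will also create the namenode directory on its own):

mkdir -p ~/software/hadoop-2.7.4/dfs/namenode ~/software/hadoop-2.7.4/dfs/datanode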

3> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/mapred-site.xml
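In the stock Hadoop 2.7 distribution this file does not exist yet; create it from the shipped template first:

cd ~/software/hadoop-2.7.4/etc/hadoop
cp mapred-site.xml.template mapred-site.xml

Then set its contents: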
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

4> Configure /home/hadoop/software/hadoop-2.7.4/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
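
With four XML files edited by hand, a well-formedness check catches stray tags early; a sketch using xmllint (from the libxml2-utils package, not installed by default):

sudo apt-get install -y libxml2-utils
cd ~/software/hadoop-2.7.4/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml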

4. Edit the environment scripts: add JAVA_HOME to the hadoop-env.sh, mapred-env.sh, and yarn-env.sh files under /home/hadoop/software/hadoop-2.7.4/etc/hadoop/
# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.8.0_111/
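
In the stock env scripts the JAVA_HOME lines in mapred-env.sh and yarn-env.sh are commented out, so appending an explicit export is the least intrusive edit. A sketch that applies the same line to all three files:

cd ~/software/hadoop-2.7.4/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/usr/local/jdk1.8.0_111/' >> "$f"
done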

5. Configure the slaves file (etc/hadoop/slaves) with the worker hostnames:
slave1
slave2

6. Copy the entire hadoop-2.7.4 directory to the same location on the slave1 and slave2 nodes:
hadoop@master:~/software$ scp -r hadoop-2.7.4/ slave1:~/software
hadoop@master:~/software$ scp -r hadoop-2.7.4/ slave2:~/software
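
Since rsync was installed in step 2.1, it is a workable alternative to scp for repeat copies: it skips unchanged files and can exclude logs. A sketch, assuming ~/software already exists on the slaves:

rsync -a --exclude=logs hadoop-2.7.4/ slave1:~/software/hadoop-2.7.4/
rsync -a --exclude=logs hadoop-2.7.4/ slave2:~/software/hadoop-2.7.4/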

At this point all configuration is complete and the Hadoop services are ready to start.

2.4 Start the Hadoop Cluster Services from the Master Machine
1. Format the filesystem for the first time: bin/hdfs namenode -format
hadoop@master:~/software/hadoop-2.7.4$ ./bin/hdfs namenode -format
The output should report that the NameNode on master/192.168.19.128 has been successfully formatted:
......
16/11/28 05:10:56 INFO common.Storage: Storage directory /home/hadoop/software/hadoop-2.7.4/dfs/namenode has been successfully formatted.
16/11/28 05:10:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/11/28 05:10:56 INFO util.ExitUtil: Exiting with status 0
16/11/28 05:10:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.19.128
************************************************************/

2. Start the Hadoop cluster with start-all.sh
hadoop@master:~/software/hadoop-2.7.4$ ./sbin/start-all.sh
Output:
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-nodemanager-slave1.out

3. Use jps to list the running Java processes on the master:
hadoop@master:~$ jps
Output:
26546 ResourceManager
26372 SecondaryNameNode
27324 Jps
26062 NameNode
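
The listing above is for the master. Running the same check on a slave should instead show the worker daemons, i.e. DataNode and NodeManager (plus Jps itself; PIDs will vary):

hadoop@slave1:~$ jps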

4. View HDFS in a browser: http://192.168.19.128:50070
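
The same cluster summary is also available from the command line; a quick sketch:

hadoop@master:~/software/hadoop-2.7.4$ ./bin/hdfs dfsadmin -report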

5. View MapReduce (YARN) in a browser: http://192.168.19.128:8088

Note: if HDFS or MapReduce fails to start properly on the master or a slave node after hdfs namenode -format or start-all.sh, delete the dfs, logs, and tmp directories on both the master and the slave nodes, run hdfs namenode -format again, and then rerun start-all.sh.
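
A sketch of that reset sequence using this tutorial's paths (the removal must run on the master and on every slave; /tmp/hadoop-* is where Hadoop keeps its default temporary data when hadoop.tmp.dir is left unset):

# WARNING: this erases all HDFS data
rm -rf ~/software/hadoop-2.7.4/dfs ~/software/hadoop-2.7.4/logs /tmp/hadoop-*
# then, from the master only:
cd ~/software/hadoop-2.7.4
./bin/hdfs namenode -format
./sbin/start-all.sh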

2.5 Stop the Hadoop Cluster Services
hadoop@master:~/software/hadoop-2.7.4$ ./sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slave2: stopping datanode
slave1: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slave2: stopping nodemanager
slave1: stopping nodemanager
no proxyserver to stop

