
Building a Fully Distributed Hadoop Environment

Test environment: 1 NameNode server, 2 DataNode servers

Installation steps:
① Configure the /etc/hosts file: this gives the cluster internal name resolution without querying a DNS server. When a remote host is accessed, the hosts file is checked first; if the hostname is listed there, the host is reached directly at the configured IP. (Larger production Hadoop clusters usually run a dedicated DNS server for unified management.)

To change a Linux host's hostname, edit the HOSTNAME field in /etc/sysconfig/network (this is permanent, but only takes effect after a reboot).
hostname newName: takes effect immediately, but is lost after a reboot.

hosts file (note: every node should ideally share the same copy of this file):

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.174.142 NameNode
192.168.174.143 DataNode_01
192.168.174.145 DataNode_02
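The lookup behavior described above can be sketched locally as a quick sanity check. This demo writes a sample copy of the table to /tmp (a path chosen only for the demo) and extracts the IP mapped to a hostname, mimicking what the resolver does when it consults /etc/hosts:

```shell
# Demo only: write a sample hosts file and look up DataNode_01's IP.
cat > /tmp/hosts.sample <<'EOF'
192.168.174.142 NameNode
192.168.174.143 DataNode_01
192.168.174.145 DataNode_02
EOF
# Print the IP whose second column matches the hostname.
awk '$2 == "DataNode_01" { print $1 }' /tmp/hosts.sample
```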

Testing the hosts file:

[squirrel@DataNode_02 ~]$ ping DataNode_01
PING DataNode_01 (192.168.174.143) 56(84) bytes of data.
64 bytes from DataNode_01 (192.168.174.143): icmp_seq=1 ttl=64 time=2.24 ms
--- DataNode_01 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6589ms
rtt min/avg/max/mdev = 0.275/0.733/2.241/0.624 ms

[squirrel@DataNode_02 ~]$ ping DataNode_02
PING DataNode_02 (192.168.174.145) 56(84) bytes of data.
64 bytes from DataNode_02 (192.168.174.145): icmp_seq=1 ttl=64 time=0.029 ms
--- DataNode_02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2381ms
rtt min/avg/max/mdev = 0.029/0.050/0.062/0.016 ms

Conclusion: the output shows the hosts can be pinged by hostname, so the hosts file is configured correctly.

② Configure Hadoop's core configuration files: hadoop-env.sh, core-site.xml, hdfs-site.xml, and mapred-site.xml.
hadoop-env.sh (set the JDK installation directory):

export JAVA_HOME=/usr/local/java/jdk1.8.0_112

core-site.xml
Note: the name node location must be the actual hostname or IP address of your NameNode.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://NameNode:9000</value>
</property>
</configuration>

hdfs-site.xml:
Note: if the data-block directory does not exist on a DataNode, that node's DataNode daemon will not start.
Since the cluster has two DataNodes, each data block is replicated twice.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
 <name>dfs.data.dir</name>
 <value>/home/squirrel/Programme/hadoop-0.20.2/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
</configuration>
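Since the DataNode daemon will not start when the dfs.data.dir path is missing, it is worth pre-creating that directory before starting the cluster. A minimal local sketch (the /tmp path below is illustrative; on a real DataNode you would use the /home/squirrel/Programme/hadoop-0.20.2/data path configured above, running the commands on each DataNode):

```shell
# Sketch: create the data-block directory ahead of time,
# readable by all but writable only by the owner.
DATA_DIR=/tmp/hadoop-dfs-data-demo   # demo path; use your dfs.data.dir value
mkdir -p "$DATA_DIR"
chmod 755 "$DATA_DIR"
ls -ld "$DATA_DIR"
```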

mapred-site.xml:
Note: the JobTracker location must be changed to the actual hostname or IP address.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
 <name>mapred.job.tracker</name>
 <value>NameNode:9001</value>
</property>
</configuration>

③ Configure the masters and slaves files
Note: each line of these files simply names one server by hostname or IP address.
masters file: the master node (NameNode / SecondaryNameNode / JobTracker)

NameNode

slaves file: the slave nodes (DataNode / TaskTracker)

DataNode_01
DataNode_02

④ Copy the fully configured Hadoop tree to every node in the cluster:

scp -r /home/squirrel/Programme/hadoop-0.20.2 DataNode_01:/home/squirrel/Programme/hadoop-0.20.2
scp -r /home/squirrel/Programme/hadoop-0.20.2 DataNode_02:/home/squirrel/Programme/hadoop-0.20.2

⑤ Format the HDFS filesystem: on the NameNode, run ./hadoop namenode -format from the bin directory of the Hadoop installation.

16/12/28 23:23:13 INFO namenode.NameNode: STARTUP_MSG:
/**************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = NameNode/192.168.174.142
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
**************************************************/

16/12/28 23:23:15 INFO namenode.FSNamesystem: fsOwner=squirrel,squirrel
16/12/28 23:23:15 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/28 23:23:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/28 23:23:15 INFO common.Storage: Image file of size 98 saved in 0 seconds.
16/12/28 23:23:15 INFO common.Storage: Storage directory /tmp/hadoop-squirrel/dfs/name has been successfully formatted.
16/12/28 23:23:15 INFO namenode.NameNode: SHUTDOWN_MSG:
/**************************************************
SHUTDOWN_MSG: Shutting down NameNode at NameNode/192.168.174.142
**************************************************/

Analysis:

The log line "/tmp/hadoop-squirrel/dfs/name has been successfully formatted." shows that the HDFS filesystem was formatted successfully.
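One detail worth noticing in that log: the image was written under /tmp/hadoop-squirrel/dfs/name, because dfs.name.dir defaults to a location under hadoop.tmp.dir. Since /tmp is typically cleared on reboot, losing it would destroy the namespace image. A common precaution is to point dfs.name.dir at a persistent directory in hdfs-site.xml; the path below mirrors this guide's layout and is only a suggestion:

```xml
<property>
  <name>dfs.name.dir</name>
  <value>/home/squirrel/Programme/hadoop-0.20.2/name</value>
</property>
```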

⑥ Start Hadoop: run ./start-all.sh from the bin directory of the Hadoop installation.

starting namenode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-namenode-NameNode.out

DataNode_01: starting datanode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-datanode-DataNode_01.out

DataNode_02: starting datanode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-datanode-DataNode_02.out

NameNode: starting secondarynamenode, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-secondarynamenode-NameNode.out

starting jobtracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-jobtracker-NameNode.out

DataNode_02: starting tasktracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-tasktracker-DataNode_02.out

DataNode_01: starting tasktracker, logging to /home/squirrel/Programme/hadoop-0.20.2/bin/../logs/hadoop-squirrel-tasktracker-DataNode_01.out

Analysis:

The log shows all five Hadoop daemons starting: namenode, secondarynamenode, and jobtracker on the master, plus datanode and tasktracker on each of the slave nodes listed in the slaves file, all of which came up successfully.

⑦ Check the running Hadoop daemons
Run jps on the NameNode:

15825 JobTracker
15622 NameNode
15752 SecondaryNameNode
15935 Jps

Run jps on a DataNode:

15237 DataNode
15350 Jps
15310 TaskTracker
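The jps checks above can also be scripted. The sketch below parses the sample master output captured above and confirms each expected daemon is present; on a live node you would replace the here-string with the real `jps` output:

```shell
# Sketch: verify that the expected master daemons appear in jps output.
# Sample text taken from the jps listing above.
jps_out="15825 JobTracker
15622 NameNode
15752 SecondaryNameNode
15935 Jps"
for daemon in NameNode SecondaryNameNode JobTracker; do
  if echo "$jps_out" | grep -qw "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: MISSING"
  fi
done
```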

Conclusion: the Hadoop cluster started up completely and successfully.

Note: Hadoop's start/stop scripts use SSH to launch the daemons on the other nodes, so file-access permissions between cluster nodes inevitably come into play. Put the Hadoop directory somewhere the login user has full permissions on every node; otherwise you will run into problems such as daemons being unable to write to their log files.
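On that SSH point: start-all.sh needs passwordless SSH from the master to every node (including itself). A non-destructive sketch of the key setup follows; it writes to a demo path under /tmp so nothing in ~/.ssh is touched, and the ssh-copy-id lines in the comments show the real-cluster usage with this guide's hostnames:

```shell
# Sketch: generate a passphrase-less RSA key pair (demo path only).
rm -f /tmp/demo_id_rsa /tmp/demo_id_rsa.pub
ssh-keygen -q -t rsa -N '' -f /tmp/demo_id_rsa
ls /tmp/demo_id_rsa /tmp/demo_id_rsa.pub
# On a real cluster, generate into ~/.ssh/id_rsa instead, then push the
# public key to every node (run from the master):
#   ssh-copy-id squirrel@DataNode_01
#   ssh-copy-id squirrel@DataNode_02
```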