
Hadoop 2.7.3 Single-Node Pseudo-Distributed Environment Setup

Author: 家輝 (Jiahui). Date: 2018-07-10. CSDN blog: http://blog.csdn.net/gobitan
Note: I set up Hadoop test environments often, so I am recording this one as a reusable template. Base environment: the CentOS 7 template described at https://blog.csdn.net/gobitan/article/details/80993354 (building a CentOS 7 64-bit OS template).
Before installing, it is worth taking a look at the Hadoop architecture diagram.
Step 1: Download the Hadoop package
From my Baidu Netdisk link:
https://pan.baidu.com/s/1I351UowJLfkClf6v0iRytA
Password: 5c9d
Step 2: Unpack the Hadoop package
[root@centos7 ~]# cd /opt
[root@centos7 opt]# tar zxf /root/hadoop-2.7.3.tar.gz
[root@centos7 opt]# cd hadoop-2.7.3/
Step 3: Configure Hadoop
[1] Configure hadoop-env.sh
Edit etc/hadoop/hadoop-env.sh and set JAVA_HOME as follows:
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64/jre
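Before starting any daemon, it is worth confirming that this path is actually valid on your machine (the JDK path above comes from this template and may differ on yours). A quick sanity check is to run the java binary under it directly:
[root@centos7 hadoop-2.7.3]# /usr/java/jdk1.8.0_171-amd64/jre/bin/java -version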
[2] Configure core-site.xml
Edit etc/hadoop/core-site.xml as follows:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.159.154:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.7.3/hadoop-tmp</value>
    </property>
</configuration>
Note: hadoop.tmp.dir defaults to "/tmp/hadoop-${user.name}". That directory is cleared when the Linux OS reboots, which can cause data loss, so it needs to be changed.
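As a sketch of what fs.defaultFS does: it is the URI that bare HDFS paths are resolved against. Once HDFS is running (step 5), the two commands below should list the same thing, assuming the IP above matches your machine:
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -ls /
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -ls hdfs://192.168.159.154:9000/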
[3] Configure hdfs-site.xml
Edit etc/hadoop/hdfs-site.xml as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
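You can verify the effective value without opening the file; the hdfs getconf tool (available in Hadoop 2.x) reads the loaded configuration:
[root@centos7 hadoop-2.7.3]# bin/hdfs getconf -confKey dfs.replication
1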
[4] Configure passwordless SSH to the local machine
[root@centos7 hadoop-2.7.3]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@centos7 hadoop-2.7.3]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@centos7 hadoop-2.7.3]# chmod 0600 ~/.ssh/authorized_keys
Verify the SSH configuration with the following command:
[root@centos7 hadoop-2.7.3]# ssh localhost
Last login: Sun May 13 23:06:10 2018 from localhost
This no longer prompts for a password, whereas it did before.
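A stricter check (my addition, not part of the original steps) is to force non-interactive mode, so the command fails outright instead of silently falling back to a password prompt:
[root@centos7 hadoop-2.7.3]# ssh -o BatchMode=yes localhost 'echo SSH OK'
SSH OK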
Step 4: Format the file system
[root@centos7 hadoop-2.7.3]# bin/hdfs namenode -format
If this succeeds, the end of the log shows a success message like:
18/05/13 23:24:02 INFO common.Storage: Storage directory /opt/hadoop-2.7.3/hadoop-tmp/dfs/name has been successfully formatted.
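You can also confirm the result on disk: formatting creates the NameNode metadata directory under the hadoop.tmp.dir configured earlier (a quick check, assuming the paths above):
[root@centos7 hadoop-2.7.3]# ls /opt/hadoop-2.7.3/hadoop-tmp/dfs/name/current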
Step 5: Start HDFS
[root@centos7 hadoop-2.7.3]# sbin/start-dfs.sh
Starting namenodes on [centos7]
centos7: starting namenode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-namenode-centos7.out
localhost: starting datanode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-datanode-centos7.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-centos7.out
[root@centos7 hadoop-2.7.3]# jps
11301 SecondaryNameNode
11175 DataNode
11419 Jps
11087 NameNode
The start command above launched the HDFS management daemon (NameNode), the data daemon (DataNode), and the SecondaryNameNode. Note that despite its name, the SecondaryNameNode is not a standby NameNode; it is a checkpointing helper that periodically merges the edit log into the fsimage.
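To confirm the DataNode has actually registered with the NameNode, dfsadmin prints a capacity report; in this pseudo-distributed setup it should show one live datanode:
[root@centos7 hadoop-2.7.3]# bin/hdfs dfsadmin -report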
Step 6: View the web UI of the HDFS NameNode at http://192.168.159.154:50070
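If the machine has no browser, a headless check works too (assuming curl is installed); an HTTP status code of 200 means the UI is up:
[root@centos7 hadoop-2.7.3]# curl -s -o /dev/null -w '%{http_code}\n' http://192.168.159.154:50070/
200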

Step 7: Create the HDFS directories needed to run MapReduce jobs
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -mkdir /user
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -mkdir /user/root
Note: relative HDFS paths resolve to /user/<username>; since these commands run as root, the home directory must be /user/root.
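The two mkdir calls can also be collapsed into one with -p, which creates any missing parent directories (supported by hdfs dfs in 2.x):
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -mkdir -p /user/root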
Step 8: Copy input files into the distributed file system
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -put etc/hadoop input
This example copies the files under etc/hadoop into HDFS.
Check the result of the copy:
[root@centos7 hadoop-2.7.3]# bin/hadoop fs -ls /user/root/input
Step 9: Run one of Hadoop's bundled examples
[root@centos7 hadoop-2.7.3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
Notes:
[1] This example counts, across all files in a directory, the occurrences of strings matching a regular expression, here 'dfs[a-z.]+'.
[2] A warning like the following appeared mid-run and can be ignored for now: "18/05/14 00:03:54 WARN io.ReadaheadPool: Failed readahead on ifile EBADF: Bad file descriptor"
Step 10: Copy the results from the distributed file system to the local machine
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -get output output
[root@centos7 hadoop-2.7.3]# cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.replication
1       dfs.file
Or view the results directly in HDFS:
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -cat output/*
This shows, for each matched 'dfs...' keyword, how many times it occurs across all files under etc/hadoop.
Step 11: Verify the results
Use a Linux command to count the occurrences of "dfs.class"; the result is 4, matching the MapReduce count.
[root@centos7 hadoop-2.7.3]# grep -r 'dfs.class' etc/hadoop/
etc/hadoop/hadoop-metrics.properties:dfs.class=org.apache.hadoop.metrics.spi.NullContext
etc/hadoop/hadoop-metrics.properties:#dfs.class=org.apache.hadoop.metrics.file.FileContext
etc/hadoop/hadoop-metrics.properties:# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
etc/hadoop/hadoop-metrics.properties:# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
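Strictly speaking, the MapReduce job counted occurrences while grep -r lists matching lines, so the numbers only agree when each line contains one match. A closer equivalent (my variant, assuming GNU grep as shipped with CentOS 7) prints each match on its own line and counts them:
[root@centos7 hadoop-2.7.3]# grep -rhoE 'dfs\.class' etc/hadoop/ | wc -l
4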
Step 12: Shut down the daemons
Finally, when you are done, you can stop Hadoop.
[root@centos7 hadoop-2.7.3]# sbin/stop-dfs.sh
In addition, jobs can be submitted through YARN. The steps are as follows:
Step 13: Configure mapred-site.xml
[root@centos7 hadoop-2.7.3]# mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Edit etc/hadoop/mapred-site.xml as follows:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Step 14: Configure yarn-site.xml
Edit etc/hadoop/yarn-site.xml as follows:
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Step 15: Start the ResourceManager and NodeManager
Note: before running the command below, make sure "sbin/start-dfs.sh" has already been executed.
[root@centos7 hadoop-2.7.3]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.3/logs/yarn-root-resourcemanager-centos7.out
localhost: starting nodemanager, logging to /opt/hadoop-2.7.3/logs/yarn-root-nodemanager-centos7.out
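To confirm the NodeManager has registered with the ResourceManager, the yarn CLI can list the cluster's nodes; one RUNNING node is expected in this single-machine setup:
[root@centos7 hadoop-2.7.3]# bin/yarn node -list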
Step 16: Start the history server
[root@centos7 hadoop-2.7.3]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/hadoop-2.7.3/logs/mapred-root-historyserver-centos7.out
Confirm the processes are running:
[root@centos7 hadoop-2.7.3]# jps
1670 ResourceManager
1272 NameNode
1769 NodeManager
1370 DataNode
2234 Jps
1501 SecondaryNameNode
1838 JobHistoryServer
Step 17: View the ResourceManager web UI at http://192.168.159.154:8088
Step 18: View the Job History Server web page at http://192.168.159.154:19888/
Step 19: Run a MapReduce job
Same command as before, but with the result output directory changed to output-yarn:
[root@centos7 hadoop-2.7.3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output-yarn 'dfs[a-z.]+'
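Because the job now runs on YARN rather than as a local process, it appears in the ResourceManager's application list; alongside the 8088 web UI, you can watch it from the CLI:
[root@centos7 hadoop-2.7.3]# bin/yarn application -list -appStates ALL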
Check the results:
[root@centos7 hadoop-2.7.3]# bin/hdfs dfs -cat output-yarn/*
The results match the earlier run, so they are not listed here.
Step 20: Stop YARN
[root@centos7 hadoop-2.7.3]# sbin/stop-yarn.sh
References:
[1] http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html
[2] https://blog.csdn.net/gobitan/article/details/13020211 (Hadoop 2.2.0 single-node installation and testing)
[3] https://www.cnblogs.com/ee900222/p/hadoop_1.html