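Summary of problems encountered while setting up a Hadoop 2.7.3 cluster
0 Preface:
1) The cluster was set up following the previous post, "Hadoop Cluster Installation and Configuration Tutorial".
2) The cluster has three nodes: Master, Slave1 and Slave2. Master acts only as the NameNode; the two slave nodes act as DataNodes.
1 Hadoop commands used frequently during setup:
1) Start Hadoop: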
start-all.sh
mr-jobhistory-daemon.sh start historyserver
[Screenshot: log output of a successful start]
2) Check whether the DataNodes started correctly:
hdfs dfsadmin -report
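3) Create the user's input directory on HDFS. The output directory is created by the job itself and must not exist beforehand:
hdfs dfs -mkdir -p /user/hadoop/input
4) Copy files into the distributed file system as job input:
The Hadoop configuration files serve as the input data here.
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/input
Note: after copying, check on the web UI whether the DataNodes' "Block pool used" value has changed.
5) Run a MapReduce job:
Using wordcount as the example:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output
Note: the output directory must not already exist, or the job fails with a FileAlreadyExistsException. Each run needs a fresh output path, so delete the old one before re-running:
hdfs dfs -rm -r /user/hadoop/output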
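Since a MapReduce job refuses to start when its output directory already exists, it can help to clear that directory right before each run. The following is a minimal sketch, assuming the hdfs command is on the PATH; the function name clean_output_dir is hypothetical and not part of Hadoop.

```shell
# Remove an HDFS output directory if it exists, so that the next
# MapReduce job can create it itself.
# clean_output_dir is a hypothetical helper name, not a Hadoop command.
clean_output_dir() {
    dir="$1"
    # "hdfs dfs -test -d" exits with status 0 when the path is a directory.
    if hdfs dfs -test -d "$dir"; then
        hdfs dfs -rm -r "$dir"
    fi
}
```

Calling clean_output_dir /user/hadoop/output before the hadoop jar command makes the job safe to re-run.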
6) Stop Hadoop:
mr-jobhistory-daemon.sh stop historyserver
stop-all.sh
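2 Problems encountered while configuring the cluster and running MapReduce, with solutions:
1) Passwordless SSH login to the slave nodes failed:
Cause: the master and slave nodes used different encoding formats.
Fix: use the same encoding format on all nodes. For how to set up passwordless SSH login, see the previous post.
2) Sharing the Hadoop installation package:
Fix: files can be transferred between Windows and Linux with SecureCRT. Install "lrzsz" on the Linux side; the rz command then uploads from Windows to Linux, and sz does the reverse. Details are easy to find online.
3) Does the configuration have to be done by hand three times?
Answer: no!
How: install and configure Hadoop on the master node first, pack it into an xxx.tar.gz archive, then copy it to each slave node:
scp xxx.tar.gz username@Slave1:~/
The archive then appears in that user's home directory on the slave node. Unpack it and configure the Hadoop environment variables.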
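With more than one slave, the copy-and-unpack step can be looped. This is only a sketch under assumptions: passwordless SSH already works, the same user name exists on every node, and the function name distribute_hadoop and its argument order are made up for illustration.

```shell
# Copy a packaged Hadoop tarball to each slave and unpack it in the
# remote user's home directory.
# distribute_hadoop is a hypothetical helper; usage:
#   distribute_hadoop <tarball> <user> <host>...
distribute_hadoop() {
    tarball="$1"; user="$2"; shift 2
    for host in "$@"; do
        # Same command as in the text, once per slave.
        scp "$tarball" "$user@$host:~/" || return 1
        # Unpack on the remote side; the archive itself stays in ~/.
        ssh "$user@$host" "tar -xzf ~/$(basename "$tarball") -C ~/" || return 1
    done
}
```

For the cluster in this post the call would look like distribute_hadoop xxx.tar.gz username Slave1 Slave2. The environment variables still have to be configured on each slave afterwards.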
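4) The most painful problem: when running MapReduce, Hadoop hangs at "Running job" (hadoop stuck at running job)
Aside: this problem cost me almost two days. I googled extensively and read posts by experts in China and abroad, but in the end I had to read Hadoop's own logs to find the actual cause and fix it.
Symptom:
The output below is borrowed from the web, since I did not record my own; the symptom is the same.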
$ $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /myprg outputfile1
14/04/30 13:20:40 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/30 13:20:51 INFO input.FileInputFormat: Total input paths to process : 1
14/04/30 13:20:53 INFO mapreduce.JobSubmitter: number of splits:1
14/04/30 13:21:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1398885280814_0004
14/04/30 13:21:07 INFO impl.YarnClientImpl: Submitted application application_1398885280814_0004
14/04/30 13:21:09 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1398885280814_0004/
14/04/30 13:21:09 INFO mapreduce.Job: Running job: job_1398885280814_0004
Opening http://master:8088 and looking at Applications showed that the newly submitted job stayed in the ACCEPTED state and never moved to RUNNING, no matter how long I waited.
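The same state can also be inspected from the command line instead of the web UI. A minimal sketch, assuming the yarn command is on the PATH; show_pending_apps is a hypothetical wrapper name:

```shell
# Show what the ResourceManager currently knows about waiting jobs.
# show_pending_apps is a hypothetical wrapper, not a Hadoop command.
show_pending_apps() {
    # Applications accepted by the ResourceManager but not yet running.
    yarn application -list -appStates ACCEPTED
    # NodeManagers currently registered with the ResourceManager.
    yarn node -list
}
```

If the application list keeps showing the job as ACCEPTED while the node list looks healthy, the ResourceManager log is the next place to look.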
I. Memory allocation settings
1) yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
</configuration>
2) mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
</property>
</configuration>
I made these changes, but the problem remained.
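For reference, the memory values in the two files above are related by simple arithmetic: each -Xmx heap is 75% of its container size (3072 of 4096 MB for map, 6144 of 8192 MB for reduce). The 75% ratio is read off the values in this post, not a rule enforced by Hadoop, and the helper names below are hypothetical.

```shell
# Heap size as 75% of the container size, in MB (integer arithmetic).
heap_mb() {
    echo $(( $1 * 3 / 4 ))
}

# How many containers of a given size fit into one NodeManager's memory.
containers_per_node() {
    echo $(( $1 / $2 ))
}

heap_mb 4096                      # map container    -> prints 3072
heap_mb 8192                      # reduce container -> prints 6144
containers_per_node 40960 4096    # map containers per node -> prints 10
```

The last line uses yarn.nodemanager.resource.memory-mb (40960) and mapreduce.map.memory.mb (4096); the node must really have that much memory for the setting to make sense.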
II. Misconfigured /etc/hostname on the nodes
So I turned to Hadoop's logs, specifically yarn-<username>-resourcemanager-<hostname>.log, which contains a great deal of output.
Then I found the problem:
2016-09-20 15:31:59,339 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1474355964293_0002_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: <slave node hostname>
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:258)
at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:455)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:988)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:988)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:981)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:806)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:803)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:784)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: <slave node hostname>
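Because the ResourceManager log is long, filtering it is quicker than reading it. A minimal sketch using grep; find_rm_errors is a hypothetical helper name, and the log location in the usage note assumes the default log directory under $HADOOP_HOME/logs.

```shell
# Print ERROR lines and UnknownHostException entries from one or more
# log files, with line numbers.
# find_rm_errors is a hypothetical helper name.
find_rm_errors() {
    grep -n -E ' ERROR |UnknownHostException' "$@"
}
```

A typical call would be find_rm_errors "$HADOOP_HOME"/logs/yarn-*-resourcemanager-*.log, after which the full stack trace can be read at the reported line numbers.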
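The stack trace makes the cause clear: the ResourceManager cannot resolve the slave node's hostname. The fix is simple. On every node, set the hostname in /etc/hostname (on Ubuntu), so that Master, Slave1 and Slave2 each appear on the first line of that file on the corresponding host. The names must also resolve from every node, for example through /etc/hosts. On Slave1 the file reads:
Slave1
On CentOS/RHEL the equivalent setting lives in /etc/sysconfig/network:
NETWORKING=yes
HOSTNAME=Slave1
Then reboot all three hosts.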
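After the reboot, it is worth confirming on each node that all three names resolve before resubmitting the job. A minimal sketch, assuming getent is available (it is on typical Linux systems); check_hosts is a hypothetical helper name.

```shell
# Report which of the given host names resolve on this machine.
# Returns non-zero if at least one name cannot be resolved.
# check_hosts is a hypothetical helper name.
check_hosts() {
    status=0
    for host in "$@"; do
        # getent consults the same sources as the system resolver,
        # including /etc/hosts.
        if getent hosts "$host" >/dev/null; then
            echo "ok $host"
        else
            echo "MISSING $host"
            status=1
        fi
    done
    return $status
}
```

Running check_hosts Master Slave1 Slave2 on every node should print only "ok" lines; any "MISSING" line points at the node whose name resolution still needs fixing.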
Now for the exciting moment. Run the job again:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output
Run log and result:
$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output
16/09/21 12:54:57 INFO client.RMProxy: Connecting to ResourceManager at Master/10.100.3.88:8032
16/09/21 12:54:58 INFO input.FileInputFormat: Total input paths to process : 9
16/09/21 12:54:58 INFO mapreduce.JobSubmitter: number of splits:9
16/09/21 12:54:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474433568841_0001
16/09/21 12:54:59 INFO impl.YarnClientImpl: Submitted application application_1474433568841_0001
16/09/21 12:54:59 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1474433568841_0001/
16/09/21 12:54:59 INFO mapreduce.Job: Running job: job_1474433568841_0001
16/09/21 12:55:11 INFO mapreduce.Job: Job job_1474433568841_0001 running in uber mode : false
16/09/21 12:55:11 INFO mapreduce.Job: map 0% reduce 0%
16/09/21 12:55:56 INFO mapreduce.Job: map 33% reduce 0%
16/09/21 12:56:02 INFO mapreduce.Job: map 89% reduce 0%
16/09/21 12:56:03 INFO mapreduce.Job: map 100% reduce 0%
16/09/21 12:56:12 INFO mapreduce.Job: map 100% reduce 100%
16/09/21 12:56:13 INFO mapreduce.Job: Job job_1474433568841_0001 completed successfully
16/09/21 12:56:14 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=22561
FILE: Number of bytes written=1233835
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=29774
HDFS: Number of bytes written=11201
HDFS: Number of read operations=30
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=9
Launched reduce tasks=1
Data-local map tasks=9
Total time spent by all maps in occupied slots (ms)=885942
Total time spent by all reduces in occupied slots (ms)=24432
Total time spent by all map tasks (ms)=442971
Total time spent by all reduce tasks (ms)=6108
Total vcore-milliseconds taken by all map tasks=442971
Total vcore-milliseconds taken by all reduce tasks=6108
Total megabyte-milliseconds taken by all map tasks=1814409216
Total megabyte-milliseconds taken by all reduce tasks=50036736
Map-Reduce Framework
Map input records=825
Map output records=2920
Map output bytes=37672
Map output materialized bytes=22609
Input split bytes=987
Combine input records=2920
Combine output records=1281
Reduce input groups=622
Reduce shuffle bytes=22609
Reduce input records=1281
Reduce output records=622
Spilled Records=2562
Shuffled Maps =9
Failed Shuffles=0
Merged Map outputs=9
GC time elapsed (ms)=6441
CPU time spent (ms)=14050
Physical memory (bytes) snapshot=1877909504
Virtual memory (bytes) snapshot=53782507520
Total committed heap usage (bytes)=2054160384
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=28787
File Output Format Counters
Bytes Written=11201
Finally, check the web UI again.
I hope this helps a little.