hadoop 2.2.0 執行MapReduce程式
環境:
2臺虛擬機器搭建Hadoop環境
系統Fedora 10
Hadoop 2.2.0
準備工作:
1、Hadoop 2.2.0 環境配置執行
2、建立Hdfs的輸入資料夾和輸入檔案:
hadoop fs -copyFromLocal test1.txt /input/
hadoop fs -copyFromLocal test2.txt /input/
確認hdfs中沒有/output目錄,如有hadoop fs -rm -r /output刪除
執行:
cd ./hadoop-2.2.0/share/hadoop/mapreduce
shell上執行hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
或者是自己建立WordCount.java檔案
import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> { private IntWritable result = new IntWritable(); public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); if (otherArgs.length != 2) { System.err.println("Usage: wordcount <in> <out>"); System.exit(2); } Job job = new Job(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(otherArgs[0])); FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } }
編譯後建立wordcount.jar檔案(參考我的另一篇博文:http://blog.csdn.net/wendll/article/details/19568097);
執行hadoop jar wordcount.jar /input /output
錯誤排查:
執行時有如下錯誤
14/02/22 04:30:31 INFO client.RMProxy: Connecting to ResourceManager at cloud001/192.168.1.105:8032 14/02/22 04:30:33 INFO input.FileInputFormat: Total input paths to process : 3 14/02/22 04:30:33 INFO mapreduce.JobSubmitter: number of splits:3 14/02/22 04:30:33 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class 14/02/22 04:30:33 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class 14/02/22 04:30:33 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 14/02/22 04:30:33 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class 14/02/22 04:30:33 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir 14/02/22 04:30:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393055835807_0006 14/02/22 04:30:33 INFO impl.YarnClientImpl: Submitted application application_1393055835807_0006 to ResourceManager at cloud001/192.168.1.105:8032 14/02/22 04:30:33 INFO mapreduce.Job: The url to track the job: http://cloud001:8088/proxy/application_1393055835807_0006/ 14/02/22 04:30:33 INFO mapreduce.Job: Running job: job_1393055835807_0006 14/02/22 04:30:54 INFO mapreduce.Job: Job job_1393055835807_0006 running in uber mode : false 14/02/22 04:30:55 INFO mapreduce.Job: map 0% reduce 0% 14/02/22 04:30:55 INFO mapreduce.Job: Job job_1393055835807_0006 failed with state FAILED due to: Application application_1393055835807_0006 failed 2 times due to Error launching appattempt_1393055835807_0006_000002. Got exception: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:58279 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1351) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy23.startContainers(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642) at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399) at org.apache.hadoop.ipc.Client.call(Client.java:1318) ... 9 more . Failing the application. 14/02/22 04:30:55 INFO mapreduce.Job: Counters: 0
排查時,
修改/etc/hosts檔案
127.0.0.1 localhost localhost.localdomain
192.168.1.105 cloud001 192.168.1.106 cloud002
為
#127.0.0.1 localhost localhost.localdomain
192.168.1.105 cloud001 localhost localhost.localdomain
192.168.1.106 cloud002
但是在執行過程中一直報連線不上的錯誤,但最後output中有正確結果。
後網上多方查詢,發現是機子的hostname設定錯誤,hostname發現機子都是的hostname都為localhost.localdomain;
vim /etc/sysconfig/network中修改hostname為各自的主機名(如我配置的hadoop中namenode機器為cloud001,datanode為cloud002),
重啟系統後,啟動hadoop,執行hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output,程式執行正常:
14/02/24 03:40:02 INFO client.RMProxy: Connecting to ResourceManager at cloud001/192.168.1.105:8080
14/02/24 03:40:04 INFO input.FileInputFormat: Total input paths to process : 3
14/02/24 03:40:04 INFO mapreduce.JobSubmitter: number of splits:3
14/02/24 03:40:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/24 03:40:04 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/24 03:40:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393241524473_0001
14/02/24 03:40:05 INFO impl.YarnClientImpl: Submitted application application_1393241524473_0001 to ResourceManager at cloud001/192.168.1.105:8080
14/02/24 03:40:05 INFO mapreduce.Job: The url to track the job: http://cloud001:8088/proxy/application_1393241524473_0001/
14/02/24 03:40:05 INFO mapreduce.Job: Running job: job_1393241524473_0001
14/02/24 03:40:22 INFO mapreduce.Job: Job job_1393241524473_0001 running in uber mode : false
14/02/24 03:40:22 INFO mapreduce.Job: map 0% reduce 0%
14/02/24 03:44:28 INFO mapreduce.Job: map 56% reduce 0%
14/02/24 03:44:44 INFO mapreduce.Job: map 100% reduce 0%
14/02/24 03:45:08 INFO mapreduce.Job: map 100% reduce 100%
14/02/24 03:45:12 INFO mapreduce.Job: Job job_1393241524473_0001 completed successfully
14/02/24 03:45:14 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=4706
FILE: Number of bytes written=326601
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5112
HDFS: Number of bytes written=1909
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=776280
Total time spent by all reduces in occupied slots (ms)=23900
Map-Reduce Framework
Map input records=136
Map output records=358
Map output bytes=5421
Map output materialized bytes=4718
Input split bytes=309
Combine input records=358
Combine output records=227
Reduce input groups=112
Reduce shuffle bytes=4718
Reduce input records=227
Reduce output records=112
Spilled Records=454
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=11839
CPU time spent (ms)=7570
Physical memory (bytes) snapshot=336568320
Virtual memory (bytes) snapshot=1555247104
Total committed heap usage (bytes)=346697728
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=4803
File Output Format Counters
Bytes Written=1909