1. 程式人生 > >hadoop 2.2.0 執行MapReduce程式

hadoop 2.2.0 執行MapReduce程式

 

環境:

2臺虛擬機器搭建Hadoop環境

系統Fedora 10

Hadoop 2.2.0

 

準備工作:

1、Hadoop 2.2.0 環境配置執行

2、建立Hdfs的輸入資料夾和輸入檔案:

hadoop fs -copyFromLocal test1.txt /input/

hadoop fs -copyFromLocal test2.txt /input/

確認hdfs中沒有/output目錄,如有hadoop fs -rm -r /output刪除

執行:

cd ./hadoop-2.2.0/share/hadoop/mapreduce

shell上執行hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output

 

或者是自己建立WordCount.java檔案

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

編譯後建立wordcount.jar檔案(參考我的另一篇博文:http://blog.csdn.net/wendll/article/details/19568097);

執行hadoop jar wordcount.jar /input /output

 

錯誤排查:

執行時有如下錯誤

14/02/22 04:30:31 INFO client.RMProxy: Connecting to ResourceManager at cloud001/192.168.1.105:8032
14/02/22 04:30:33 INFO input.FileInputFormat: Total input paths to process : 3
14/02/22 04:30:33 INFO mapreduce.JobSubmitter: number of splits:3
14/02/22 04:30:33 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/22 04:30:33 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/02/22 04:30:33 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/22 04:30:33 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/22 04:30:33 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/22 04:30:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393055835807_0006
14/02/22 04:30:33 INFO impl.YarnClientImpl: Submitted application application_1393055835807_0006 to ResourceManager at cloud001/192.168.1.105:8032
14/02/22 04:30:33 INFO mapreduce.Job: The url to track the job: http://cloud001:8088/proxy/application_1393055835807_0006/
14/02/22 04:30:33 INFO mapreduce.Job: Running job: job_1393055835807_0006
14/02/22 04:30:54 INFO mapreduce.Job: Job job_1393055835807_0006 running in uber mode : false
14/02/22 04:30:55 INFO mapreduce.Job:  map 0% reduce 0%
14/02/22 04:30:55 INFO mapreduce.Job: Job job_1393055835807_0006 failed with state FAILED due to: Application application_1393055835807_0006 failed 2 times due to Error launching appattempt_1393055835807_0006_000002. Got exception: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:58279 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy23.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
        at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
        at org.apache.hadoop.ipc.Client.call(Client.java:1318)
        ... 9 more
. Failing the application.
14/02/22 04:30:55 INFO mapreduce.Job: Counters: 0


排查時,

修改/etc/hosts檔案
127.0.0.1 localhost localhost.localdomain
192.168.1.105 cloud001 192.168.1.106 cloud002


#127.0.0.1 localhost localhost.localdomain
192.168.1.105 cloud001 localhost localhost.localdomain
192.168.1.106 cloud002
但是在執行過程中一直報連線不上的錯誤,但最後output中有正確結果。

 

後網上多方查詢,發現是機子的hostname設定錯誤,hostname發現機子都是的hostname都為localhost.localdomain;

vim /etc/sysconfig/network中修改hostname為各自的主機名(如我配置的hadoop中namenode機器為cloud001,datanode為cloud002),

重啟系統後,啟動hadoop,執行hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output,程式執行正常:

14/02/24 03:40:02 INFO client.RMProxy: Connecting to ResourceManager at cloud001/192.168.1.105:8080
14/02/24 03:40:04 INFO input.FileInputFormat: Total input paths to process : 3
14/02/24 03:40:04 INFO mapreduce.JobSubmitter: number of splits:3
14/02/24 03:40:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/24 03:40:04 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/24 03:40:04 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/24 03:40:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393241524473_0001
14/02/24 03:40:05 INFO impl.YarnClientImpl: Submitted application application_1393241524473_0001 to ResourceManager at cloud001/192.168.1.105:8080
14/02/24 03:40:05 INFO mapreduce.Job: The url to track the job: http://cloud001:8088/proxy/application_1393241524473_0001/
14/02/24 03:40:05 INFO mapreduce.Job: Running job: job_1393241524473_0001
14/02/24 03:40:22 INFO mapreduce.Job: Job job_1393241524473_0001 running in uber mode : false
14/02/24 03:40:22 INFO mapreduce.Job:  map 0% reduce 0%
14/02/24 03:44:28 INFO mapreduce.Job:  map 56% reduce 0%
14/02/24 03:44:44 INFO mapreduce.Job:  map 100% reduce 0%
14/02/24 03:45:08 INFO mapreduce.Job:  map 100% reduce 100%
14/02/24 03:45:12 INFO mapreduce.Job: Job job_1393241524473_0001 completed successfully
14/02/24 03:45:14 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=4706
                FILE: Number of bytes written=326601
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=5112
                HDFS: Number of bytes written=1909
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=4
                Total time spent by all maps in occupied slots (ms)=776280
                Total time spent by all reduces in occupied slots (ms)=23900
        Map-Reduce Framework
                Map input records=136
                Map output records=358
                Map output bytes=5421
                Map output materialized bytes=4718
                Input split bytes=309
                Combine input records=358
                Combine output records=227
                Reduce input groups=112
                Reduce shuffle bytes=4718
                Reduce input records=227
                Reduce output records=112
                Spilled Records=454
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=11839
                CPU time spent (ms)=7570
                Physical memory (bytes) snapshot=336568320
                Virtual memory (bytes) snapshot=1555247104
                Total committed heap usage (bytes)=346697728
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=4803
        File Output Format Counters 
                Bytes Written=1909