
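Study Notes: Learning Big Data from Scratch - 4. Configuring a Hadoop Development Environment in Eclipse

Configuring a Hadoop development environment in Eclipse

1. Download hadoop-eclipse-plugin-2.6.0.jar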

https://github.com/winghc/hadoop2x-eclipse-plugin/tree/v2.6.0

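2. Copy the downloaded hadoop-eclipse-plugin-2.6.0.jar file into Eclipse's plugins directory

3. Restart Eclipse

Choose New -> Project; Map/Reduce Project should now appear among the project types.

4. Create a Map/Reduce Project to test the setup

Create a new project named wordcount, add a class named WordCount, and paste in the source of the WordCount example that ships with Hadoop: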

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) for every token
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also registered as the combiner): sums all counts received for one word
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Strip generic Hadoop options (e.g. -D key=value) and keep the job's own arguments
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Every argument except the last is an input path; the last one is the output directory
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
    // Submit the job, wait for it to finish, and exit with 0 on success, 1 on failure
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
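To see what the job computes without a cluster, here is a minimal plain-Java sketch (JDK 9+ only, no Hadoop dependency) that mimics the same pipeline: a map step that tokenizes lines with StringTokenizer and emits (word, 1), followed by a group-and-sum step standing in for the shuffle and IntSumReducer. The class name LocalWordCount is hypothetical and this is only a teaching aid, not how Hadoop actually executes the job. A TreeMap is used because Hadoop sorts keys before the reduce phase, so with the default single reducer and plain ASCII words the printed order should match the job's output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical teaching class: simulates WordCount's map and reduce steps locally (JDK only, no Hadoop)
public class LocalWordCount {

    // Map step: tokenize each line on whitespace, as TokenizerMapper does, and emit (word, 1)
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                pairs.add(Map.entry(itr.nextToken(), 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce step: group pairs by word in sorted key order and sum the 1s, as IntSumReducer does
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Full pipeline: map, then reduce
    public static Map<String, Integer> count(List<String> lines) {
        return reduce(map(lines));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "hello hadoop");
        // Print each result as "word<TAB>count", the same layout as the job's part files
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running main prints hadoop 1, hello 2 and world 1, one per line and tab-separated. Registering IntSumReducer as a combiner in the real job only pre-sums counts inside each map task, so it changes the amount of data shuffled, not the final counts.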

5. Export the jar file

Choose File -> Export, select Java -> JAR file, and export the project as WordCount.jar

6. Run the test

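hadoop fs -put hello.txt  /user/root        # upload the test file whose words will be counted

hadoop jar WordCount.jar  WordCount  /user/root/hello.txt   /user/root/wcount    # run the word count job

hadoop fs -ls /user/root/wcount    # list the output directory

hadoop fs -text /user/root/wcount/part*      # view the word counts

The output directory must not exist when the job is submitted, otherwise submission fails with a FileAlreadyExistsException; to re-run the job, first delete it with hadoop fs -rm -r /user/root/wcount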
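The part files listed above are written by Hadoop's default TextOutputFormat, which puts the key, a tab character, and the value on each line. If you want to post-process the result in Java, a minimal sketch of a parser for that layout might look like the following. WordCountOutputParser is a hypothetical helper name, and it assumes the input came from the WordCount job above (one word and one integer count per line) and Java 9+.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses WordCount output in TextOutputFormat's default "key<TAB>value" layout
public class WordCountOutputParser {

    // Turn the text of the part files into a word -> count map, keeping the order of the file
    public static Map<String, Integer> parse(String output) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {   // \R matches \n, \r\n and other line breaks
            if (line.trim().isEmpty()) {
                continue;                             // ignore blank lines
            }
            int tab = line.lastIndexOf('\t');        // the count follows the last tab
            if (tab < 0) {
                throw new IllegalArgumentException("Expected word<TAB>count, got: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);                  // each key is written by exactly one reducer
        }
        return counts;
    }

    // Read the job output from standard input and print a short summary
    public static void main(String[] args) throws IOException {
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        Map<String, Integer> counts = parse(text);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("distinct words: " + counts.size() + ", total words: " + total);
    }
}
```

After compiling it with javac, it could be fed directly from HDFS with hadoop fs -text /user/root/wcount/part* | java WordCountOutputParser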

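You can also check the job's scheduling and execution status in the YARN ResourceManager web UI at http://centos7:8088/cluster/apps (centos7 is the hostname of the machine running the ResourceManager; substitute your own)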
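The ResourceManager also exposes the application list as JSON through its REST API under /ws/v1/cluster/apps, which can be handy for scripting. A minimal sketch using the JDK's java.net.http.HttpClient (Java 11+) follows; it assumes the ResourceManager listens on the default web port 8088 on host centos7, as in the URL above, and RmAppsClient is a hypothetical class name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: fetches the YARN application list as JSON instead of viewing it in the web UI
public class RmAppsClient {

    // Build the REST endpoint URI for the cluster applications list
    public static URI appsUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/ws/v1/cluster/apps");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: defaults to the hostname used in these notes unless one is passed on the command line
        String host = args.length > 0 ? args[0] : "centos7";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(appsUri(host, 8088))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        // Raw JSON describing each application (fields such as id, name, state, finalStatus)
        System.out.println(response.body());
    }
}
```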

From here, you can use WordCount as a template for designing your own counting jobs