
WordCount Implementation in Hadoop MapReduce


1. Create a WCMapper class that extends Mapper

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Receive the input data V1
        String line = value.toString();
        // Split the data into words
        String[] wordsStrings = line.split(" ");
        // Loop: each occurrence counts as one, so emit (word, 1)
        for (String w : wordsStrings) {
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
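To make the map stage concrete, here is a sketch of what one map() call produces, assuming words.txt contains the hypothetical line "hello world hello" (sample data for illustration only, not from the original job). The input key k1 is the line's byte offset in the file, which WordCount ignores:

    input  (k1, v1): (0, "hello world hello")
    output (k2, v2): ("hello", 1), ("world", 1), ("hello", 1)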
2. Create a WCReducer class that extends Reducer

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> v2s, Context context)
            throws IOException, InterruptedException {
        // Receive the grouped data: define a counter
        long counter = 0;
        // Loop over v2s and sum the counts
        for (LongWritable i : v2s) {
            counter += i.get();
        }
        // Emit (word, total count)
        context.write(key, new LongWritable(counter));
    }
}
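Before reduce() runs, the framework sorts the map output and groups the values by key, so, continuing the hypothetical sample above, each distinct word triggers one reduce() call:

    input  (k2, v2s): ("hello", [1, 1]) and ("world", [1])
    output (k3, v3):  ("hello", 2) and ("world", 1)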
3. Implement the main method in a WordCount class

/*
 * 1. Analyze the concrete business logic and determine the input/output data formats.
 * 2. Define a class that extends org.apache.hadoop.mapreduce.Mapper and overrides
 *    map() to implement the business logic and emit the new key/value pairs.
 * 3. Define a class that extends org.apache.hadoop.mapreduce.Reducer and overrides
 *    reduce() to implement the business logic.
 * 4. Wire the custom mapper and reducer together through a Job object.
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // Build the Job object
        Job job = Job.getInstance(new Configuration());
        // Note: pass the class that contains the main method
        job.setJarByClass(WordCount.class);
        // Configure the Mapper
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/words.txt"));
        // Configure the Reducer
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/wcount619"));
        // Submit the job and wait for it to complete
        job.waitForCompletion(true);
    }
}

4. Package it as wc.jar, upload it to Linux, and run it under Hadoop:

hadoop jar /root/wc.jar
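A few usage notes, as a sketch (standard Hadoop CLI commands; the expected output continues the illustrative sample above rather than actual results):

    # Clear the output directory before a re-run; FileOutputFormat fails if it already exists
    hadoop fs -rm -r /wcount619

    # If the jar manifest declares no Main-Class, name the driver class explicitly
    hadoop jar /root/wc.jar WordCount

    # With a single reducer, the counts land in one part file
    hadoop fs -cat /wcount619/part-r-00000
    hello   2
    world   1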
