
Hadoop 2.7.0 in Practice: WordCount


Environment Requirements
Note: this document describes how to write and run the WordCount MapReduce job.


Operating system: Ubuntu 14, 64-bit
Hadoop: Hadoop 2.7.0
Hadoop official site: http://hadoop.apache.org/releases.html
The MapReduce steps follow the official tutorial:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Source_Code

This chapter builds on the previous article, 《hadoop2.7.0實踐-環境搭建》 (environment setup).

1. Install Eclipse
1) Download Eclipse
Official site: http://www.eclipse.org/
2) Extract the Eclipse archive

$ tar -xvf eclipse-jee-mars-R-linux-gtk-x86_64.tar.gz

3) Start Eclipse
4) Write a test program to confirm the installation works

public class TestMore {

    public static void main(String[] args) {
        System.out.println("hello world!");
        System.out.println("I'm so glad to see that");
    }
}

2. Write WordCount
1) Add the jar dependencies
Add the required jars to the project's build path in Eclipse. Every folder under share/hadoop in the Hadoop distribution carries its own jars; the two needed here are:
hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0.jar
hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar

2) Write the WordCount program
The source code, as in the official tutorial:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: tokenizes each input line and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

3) Export the jar
In Eclipse, export the project as a jar named wc.jar, directly into the Hadoop folder.
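Alternatively, the official tutorial builds the jar entirely from the command line, without Eclipse; a minimal sketch, run from the hadoop-2.7.0 folder with JAVA_HOME set:

$ export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
$ bin/hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class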


3. Run WordCount
1) Start the DFS service
See the previous article 《hadoop2.7.0實踐-環境搭建》 for the setup details.
cd into the Hadoop folder, then run:

$ sbin/start-dfs.sh

Check the NameNode web UI at http://localhost:50070/.
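As an extra sanity check, the JDK's jps tool should now show the HDFS daemons (assuming the pseudo-distributed setup from the environment article):

//expect NameNode, DataNode, and SecondaryNameNode among the listed processes

$ jps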
2) Prepare the input files
Put the file to be counted, file01, into the hadoop-2.7.0/wctest/input folder.
Its content: hello world bye world
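If the folder does not exist yet, it can be created from the shell; for example, run from the hadoop-2.7.0 folder:

$ mkdir -p wctest/input
$ echo "hello world bye world" > wctest/input/file01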

//Create the HDFS directories. The commands mirror their local-filesystem counterparts

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/a

//Copy the local files into HDFS

$ bin/hdfs dfs -put wctest/input /user/a/input
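To verify that the copy landed, list the directory:

$ bin/hdfs dfs -ls /user/a/input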

//Note: the matching delete command, should you need to re-run, is as follows

$ bin/hadoop fs -rm -f -r /user/a/input

The uploaded files can be browsed in the web UI at http://localhost:50070/.
3) Start the YARN service

$ sbin/start-yarn.sh

4) Run the WordCount program

$ bin/hadoop jar wc.jar WordCount /user/a/input /user/a/output
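While the job runs, it can be followed in the ResourceManager web UI, which by default listens at http://localhost:8088/.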

5) View the results

$ bin/hadoop fs -cat /user/a/output/part-r-00000
bye     1
hello   1
world   2

As expected: each map call emitted (word, 1) for every token, and IntSumReducer summed them, so world, which appears twice in file01, counts 2.

Common Errors and Notes
1) Running a MapReduce job without starting YARN
Cause: YARN is configured but has not been started.
Fix: start YARN:

$ sbin/start-yarn.sh
