Testing and running Hadoop's WordCount MapReduce from Eclipse on Windows 7.
阿新 • Published: 2018-11-29
The previous post covered testing and running the Hadoop WordCount example on Linux; the next step is writing the MapReduce functions in Eclipse and connecting to the Hadoop cluster to run the computation.
For testing and running the Hadoop WordCount example on Linux, see: https://mp.csdn.net/mdeditor/84143774#
For deploying a Hadoop cluster on Linux, see: https://mp.csdn.net/mdeditor/84073712#
1. Download the Hadoop plugin for Eclipse, hadoop2x-eclipse-plugin-2.6.0:
https://download.csdn.net/download/qq_22830285/10792412
After downloading, unzip it and copy hadoop-eclipse-plugin-2.6.0.jar from the release directory into Eclipse's plugins directory.
2. Restart Eclipse and open the menu Window -> Show View -> Other to bring up the Map/Reduce view.
3. Choose New Hadoop Location to configure the connection between Eclipse and Hadoop. After filling in the fields, click Finish.
4. After saving the configuration, a new DFS Locations entry appears in the Project Explorer.
5. If a permission error occurs, either add the environment variable HADOOP_USER_NAME=root to the system environment, or change the Windows user name to root, or disable permission checking by adding the following to Hadoop's hdfs-site.xml file. Any of these resolves the error.
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
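As an alternative to the OS-level environment variable, the user name can also be set as a JVM system property inside the program, as long as it runs before the first Hadoop FileSystem or Job object is created. A minimal sketch (pure Java, no cluster needed; the effect on Hadoop's login user is an assumption you should verify against your Hadoop version):

```java
public class SetHadoopUser {
    public static void main(String[] args) {
        // Setting HADOOP_USER_NAME as a system property is intended to have
        // the same effect as the environment variable, provided it happens
        // before any Hadoop class performs user lookup.
        System.setProperty("HADOOP_USER_NAME", "root");
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
    }
}
```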
6. Create a new Map/Reduce project.
Create the map function, the WordCountMap class, as follows:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer token = new StringTokenizer(line);
        while (token.hasMoreTokens()) {
            word.set(token.nextToken());
            context.write(word, one);
        }
    }
}
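The map logic itself can be sanity-checked without a cluster: StringTokenizer splits each line on whitespace and the mapper emits a (word, 1) pair per token. A plain-Java sketch of that step (the sample line is made up):

```java
import java.util.StringTokenizer;

public class TokenizeDemo {
    public static void main(String[] args) {
        // One input line, as the mapper would receive it from a text split
        String line = "hello hadoop hello eclipse";
        StringTokenizer token = new StringTokenizer(line);
        while (token.hasMoreTokens()) {
            // Mirrors context.write(word, one): key, tab, count
            System.out.println(token.nextToken() + "\t1");
        }
    }
}
```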
Create the reduce function, WordCountReduce:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
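After the shuffle phase, the reducer receives each word together with all the 1s the mappers emitted for it, and sums them. A local simulation of that summation for a single key (the key and values are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceDemo {
    public static void main(String[] args) {
        // What the reducer would see for the key "hello"
        // if the word appeared twice across the input files
        List<Integer> values = Arrays.asList(1, 1);
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        // Mirrors context.write(key, new IntWritable(sum))
        System.out.println("hello\t" + sum);
    }
}
```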
Create the WordCount main class, WordCountTest:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCountTest {

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        // Configuration conf = new Configuration();
        Job job = new Job();
        job.setJarByClass(WordCountTest.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Input path: my a.txt and b.txt files live under /user/root/input on HDFS
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.80.130:9000/user/root/input"));
        // Output path for the results; note that this path must not already
        // exist, otherwise the job fails with an error
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.80.130:9000/user/root/out3"));

        job.waitForCompletion(true);
    }
}
Then run the WordCountTest class:
Run As --> Run on Hadoop
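Before submitting to the cluster, the combined map and reduce logic can also be checked entirely in memory. This sketch counts words from two hypothetical input lines the same way the job would (TreeMap is used only so the output is sorted, like the result of the shuffle phase):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    public static void main(String[] args) {
        // Stand-ins for the contents of a.txt / b.txt on HDFS
        String[] lines = { "hello hadoop", "hello eclipse" };
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Map step: split each line into tokens
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                // Reduce step, folded in: sum the 1s per word
                counts.merge(token.nextToken(), 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```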
7. If no logs appear in the console, copy Hadoop's log4j.properties file into the project's src directory.
8. After the job finishes, the output appears under the corresponding DFS directory.