Remote Debugging Hadoop with IntelliJ IDEA
Development environment: IntelliJ IDEA 2017.1.3
JDK version: JDK 1.8
Hadoop version: Hadoop 1.0.0
Virtual machines: fully distributed cluster
node1  172.16.20.101  master
node2  172.16.20.102  slave1
node3  172.16.20.103  slave2
Since books on Hadoop 2.x are still scarce in China, I started out with Hadoop 1.x. Recommended reading: *Hadoop in Action* and *Hadoop: The Definitive Guide*.
Eclipse has plenty of DFS plugins available, which makes development fairly easy, but IDEA has far fewer. This article covers how to remotely debug Hadoop from IDEA.
1. Building the Hadoop Development Environment with Maven
```xml
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.0</version>
</dependency>
```
2. Adding the Configuration Files
Copy them directly from $HADOOP_HOME/conf on the master.
core-site.xml
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.20.101:9000</value>
    </property>
</configuration>
```
mapred-site.xml
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://172.16.20.101:9001</value>
    </property>
</configuration>
```
This works because Hadoop automatically loads the configuration files found on the classpath when the following code runs:
```java
Configuration conf = new Configuration();
```
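Under the hood, `Configuration` locates `core-site.xml` (and its siblings) on the classpath and parses each `<property>` element into a name/value map. The class below is a minimal illustrative sketch of that parsing step using only the JDK's XML APIs; it is not Hadoop's actual implementation, and `MiniConfiguration` is a made-up name.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MiniConfiguration {

    // Parse <property><name>..</name><value>..</value></property> entries,
    // the same layout used by core-site.xml and mapred-site.xml.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent();
            String value = p.getElementsByTagName("value").item(0).getTextContent();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration><property>"
                + "<name>fs.default.name</name>"
                + "<value>hdfs://172.16.20.101:9000</value>"
                + "</property></configuration>";
        System.out.println(parse(xml).get("fs.default.name")); // → hdfs://172.16.20.101:9000
    }
}
```

This is why the files copied from the master must end up on the project's classpath (e.g. under `src/main/resources` in a Maven layout): otherwise `new Configuration()` falls back to the defaults and the job will not reach the remote cluster.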
3. Running the WordCount Example
Copy the code straight out of the Hadoop examples jar and adapt it:
```java
package com.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by nanzhou on 2017/9/13.
 */
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Split each input line into tokens and emit (word, 1) pairs
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] ioArgs = new String[]{"/user/hadoop/input", "/user/hadoop/output"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        JobConf jobConf = new JobConf();
        // Point at the locally built jar so the cluster can load the Mapper/Reducer classes
        jobConf.setJar("/Applications/file/work/JavaProject/hadoopbasic/target/hadoop-basic-1.0-SNAPSHOT.jar");
        Job job = new Job(jobConf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
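The map/reduce logic above can be sanity-checked locally without a cluster. The class below is an illustrative, stdlib-only sketch (not part of the Hadoop API) of the same tokenize-and-sum algorithm that `TokenizerMapper` and `IntSumReducer` implement:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Mirrors TokenizerMapper + IntSumReducer: tokenize each line on
    // whitespace, then sum a count of 1 per occurrence of each word.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line); // same tokenizer as the mapper
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum); // the combiner/reducer step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // e.g. {world=1, hello=2, hadoop=1} (map ordering is not guaranteed)
        System.out.println(count("hello hadoop", "hello world"));
    }
}
```

Running a quick local check like this before submitting to the cluster makes it easier to tell debugging problems in the job logic apart from problems in the remote setup.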
NOTICE:
(1) Running Hadoop jobs against the cluster from a local machine often hits permission problems. There are two fixes:
<1> Change your local username to match the username on the Hadoop cluster, e.g. hadoop.
<2> Edit hdfs-site.xml to disable HDFS permission checking:
```xml
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
```
(2) When running MapReduce you may hit "class not found" errors for the Map and Reduce classes.
Add a JobConf to the code and point it at the local jar; this reproduces what the Eclipse plugin's "Run on Hadoop" feature does.
Before each run, re-install the project with Maven so the jar is up to date.
Source code for this article: https://github.com/stupidcupid/hadoop-1.x