
Remote Debugging Hadoop with IntelliJ IDEA

Development environment: IntelliJ IDEA 2017.1.3

JDK version: JDK 1.8

Hadoop version: Hadoop 1.0.0

VMs, fully distributed cluster:

node1  172.16.20.101  master

node2  172.16.20.102  slave1

node3  172.16.20.103  slave2


Since books covering Hadoop 2.x are still scarce in China, I started with Hadoop 1.x. Recommended reading: Hadoop in Action and Hadoop: The Definitive Guide.


There are plenty of DFS plugins for Eclipse, which makes development fairly easy, but few are available for IDEA. This article focuses on how to debug Hadoop remotely from IDEA.

1. Building the Hadoop development environment with Maven

<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.0</version>
</dependency>
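To confirm that Maven resolved the dependency and that the client-side library matches the cluster version, a quick sanity check can print the version (the class name here is mine; VersionInfo ships with hadoop-core):

import org.apache.hadoop.util.VersionInfo;

// Prints the version of the Hadoop client library on the classpath;
// it should match the 1.0.0 cluster to avoid RPC version mismatch errors.
public class HadoopVersionCheck {
    public static void main(String[] args) {
        System.out.println("Hadoop client version: " + VersionInfo.getVersion());
    }
}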

2. Adding the configuration files

Copy them directly from $HADOOP_HOME/conf on the master.


  core-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.20.101:9000</value>
    </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://172.16.20.101:9001</value>
    </property>
</configuration>

Hadoop automatically loads the configuration files found on the classpath when the following line executes:

Configuration conf = new Configuration();
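To confirm the classpath configuration is actually picked up, a minimal sketch along these lines (class name is mine) prints the configured NameNode address and lists the HDFS root:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative smoke test: reads core-site.xml from the classpath,
// then talks to the remote HDFS it points at.
public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Should print hdfs://172.16.20.101:9000 if core-site.xml was loaded.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}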

3. Running the WordCount example

The code below is copied straight out of the Hadoop examples jar and lightly adapted:

package com.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by nanzhou on 2017/9/13.
 */
public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] ioArgs = new String[]{"/user/hadoop/input", "/user/hadoop/output"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        // Point Hadoop at the locally built jar so the Map/Reduce classes
        // can be shipped to the cluster (see NOTICE (2) below).
        JobConf jobConf = new JobConf();
        jobConf.setJar("/Applications/file/work/JavaProject/hadoopbasic/target/hadoop-basic-1.0-SNAPSHOT.jar");

        Job job = new Job(jobConf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
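One practical addition while iterating: Hadoop aborts the job if the output directory already exists, so a second debug run against the same path fails immediately. A sketch of a helper you could add to WordCount and call before job.waitForCompletion(true); the method name is mine, and it needs an extra import of org.apache.hadoop.fs.FileSystem:

// Illustrative helper: delete the job output directory so a re-run
// does not fail with "output directory already exists".
public static void clearOutputDir(Configuration conf, String dir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(dir);
    if (fs.exists(out)) {
        fs.delete(out, true); // true = recursive delete
    }
}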

NOTICE: 

(1) Permission errors are common when running Hadoop from a local machine. There are two workarounds (a hedged code sketch follows them below):

         <1> Change your local username to match the one used on the cluster, e.g. hadoop.

         <2> Edit hdfs-site.xml to disable HDFS permission checking (development only; note the value must be false, which turns the checks off):

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
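A programmatic variant of <1>: the Hadoop 1.x FileSystem API also accepts an explicit user name, which sidesteps the mismatch for HDFS access without renaming your OS account. A sketch under that assumption (the user "hadoop" matches the cluster setup above; the class name is mine):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch: open HDFS as the remote user "hadoop" instead of the local OS user.
// Assumes the FileSystem.get(URI, Configuration, String) overload of the Hadoop 1.x API.
public class AsRemoteUser {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://172.16.20.101:9000"), conf, "hadoop");
        System.out.println(fs.getWorkingDirectory());
    }
}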

 (2) When running MapReduce you may hit Map/Reduce "class not found" errors.

   The fix is the JobConf configuration shown in the code above: point setJar at the locally built jar. This reproduces the effect of the Eclipse plugin's "Run on Hadoop".

   Before each run, rebuild the project with mvn install so the jar at that path is up to date.

 Source code for this article: https://github.com/stupidcupid/hadoop-1.x