
Study Notes on Hadoop: The Definitive Guide (Part 3)

This post contains my study notes for Chapter 5 of Hadoop: The Definitive Guide, mainly working through the book's example programs; some of the implementations are modified.

1 Mapper Testing

This requires the mrunit jar. When adding the dependency to pom.xml you must include the classifier element, otherwise the jar cannot be downloaded; choose the classifier that matches your hadoop-core version.

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>

Write the test class. I keep things simple here; you can also follow the book exactly. Note that there are two MapDriver classes you can import, one under mapreduce and one under mapred; pick the one matching the API your Mapper class uses (mapred is the old API).
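For reference, these are the two candidate imports (both exist in MRUnit; only the one matching your Mapper's API will type-check):

import org.apache.hadoop.mrunit.mapreduce.MapDriver;  // new API (org.apache.hadoop.mapreduce.Mapper)
import org.apache.hadoop.mrunit.MapDriver;            // old API (org.apache.hadoop.mapred.Mapper)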

package com.tuan.hadoopLearn.io.com.tuan.hadoopLearn.mapreduce;

import com.tuan.hadoopLearn.mapreduce.MaxTemperatureMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.jupiter.api.Test;

import java.io.IOException;

public class MaxTemperatureTest {
    @Test
    public void mapperTest() {
        Text input = new Text("1993 38");
        try {
            new MapDriver<LongWritable, Text, Text, IntWritable>()
                    .withMapper(new MaxTemperatureMapper())
                    .withInput(new LongWritable(), input)
                    .withOutput(new Text("1993"), new IntWritable(38))
                    .runTest();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
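To run the test, the usual Maven invocation works from the project root (standard Surefire usage, assuming your build is set up to run JUnit 5 tests):

mvn test -Dtest=MaxTemperatureTest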

2 Reducer Testing

Add a Reducer test to the same class (the extra imports it needs are noted in the comment):

    // additional imports needed for this test: org.apache.hadoop.mrunit.mapreduce.ReduceDriver,
    // org.apache.hadoop.mrunit.types.Pair, java.util.Arrays,
    // and com.tuan.hadoopLearn.mapreduce.MaxTemperatureReducer
    @Test
    public void reducerTest() {
        try {
            new ReduceDriver<Text, IntWritable, Text, IntWritable>()
                    .withReducer(new MaxTemperatureReducer())
                    .withInput(new Pair<>(new Text("1993"), Arrays.asList(new IntWritable(10), new IntWritable(5))))
                    .withOutput(new Text("1993"), new IntWritable(10))
                    .runTest();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
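The reducer under test is the MaxTemperatureReducer from the earlier notes. In case you are reading this installment on its own, here is a minimal sketch of such a reducer (your version may differ):

package com.tuan.hadoopLearn.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            max = Math.max(max, value.get());  // keep the running maximum
        }
        context.write(key, new IntWritable(max));  // emit (year, max temperature)
    }
}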

3 Job Debugging

For example, in the max-temperature program we can insert a counter to detect abnormally large inputs. Add a few lines to the Mapper class. Note that the book closes a parenthesis in the wrong place on one of these lines, which had me wondering how an enum constant could possibly call increment.

package com.tuan.hadoopLearn.mapreduce;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final int MISSING = 9999;  // sentinel for missing readings (kept from the book's example, unused here)

    enum Temperature {
        OVER_100
    }

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] line = value.toString().split(" ");
        int temperature = Integer.parseInt(line[1]);
        if (temperature > 100) {
            context.setStatus("Detected possible corrupt input");
            context.getCounter(Temperature.OVER_100).increment(1);  // the book's parenthesis is misplaced on this line
        }
        context.write(new Text(line[0]), new IntWritable(temperature));
    }
}

Append an anomalous record "1992 520" to the end of input.txt and run the MapReduce job with the familiar command:

hadoop jar hadoopLearn-0.0.1-SNAPSHOT.jar com.tuan.hadoopLearn.mapreduce.MaxTemperature /mapreduce/input.txt /mapreduce/output

After the job finishes, the OVER_100 counter we defined reads 2, proving there were two anomalous inputs above 100.
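To read the counter programmatically instead of in the web UI, the driver can query it after completion. A minimal sketch, assuming a Job object like the one in the driver of section 4 (this is not part of the original program):

        // at the end of the driver's run(), replacing the plain waitForCompletion return
        boolean success = job.waitForCompletion(true);
        long over100 = job.getCounters()
                .findCounter(MaxTemperatureMapper.Temperature.OVER_100)
                .getValue();  // same value the web UI shows
        System.err.println("OVER_100 = " + over100);  // works from the same package, since Temperature is package-private
        return success ? 0 : 1;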

Check the historyserver in the web UI: click in at the red box shown in the figure below, reach the task page, find the mapper, and keep clicking through.

You finally arrive at a page where the Status has changed to the detected-corrupt-input message.

You can also view the Counters there.

4 Performance Tuning

Use HPROF, the profiling tool that ships with the JVM, to collect performance data while the job runs.

Write a new MaxTemperatureDriver, which differs from the earlier MaxTemperature only by a few HPROF configuration statements. At first my profile.out file contained nothing but the header; it turned out I had written "mapreduce.task.profile.params" as "mapreduce.task,profile.params". Sigh.

package com.tuan.hadoopLearn.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        if (strings.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        Configuration conf = getConf();
        conf.setBoolean("mapreduce.task.profile", true);  // enable task profiling
        conf.set("mapreduce.task.profile.params", "-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
                "force=n,thread=y,verbose=n,file=%s");  // HPROF options passed to the task JVMs
        conf.set("mapreduce.task.profile.maps", "0-2");  // range of map task IDs to profile
        conf.set("mapreduce.task.profile.reduces", "0-2");  // range of reduce task IDs to profile

        Job job = Job.getInstance(conf, "Max Temperature");  // non-deprecated replacement for new Job(conf, name)
        job.setJarByClass(getClass());

        FileInputFormat.addInputPath(job, new Path(strings[0]));
        FileOutputFormat.setOutputPath(job, new Path(strings[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MaxTemperatureDriver(), args));
    }
}
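Since the driver goes through ToolRunner, the same profiling settings could alternatively be passed as generic options on the command line, had the conf.set lines above been left out (those hard-coded values would otherwise override them). For example:

hadoop jar hadoopLearn-0.0.1-SNAPSHOT.jar com.tuan.hadoopLearn.mapreduce.MaxTemperatureDriver -Dmapreduce.task.profile=true -Dmapreduce.task.profile.maps=0-2 /mapreduce/input.txt /mapreduce/output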

Run it with the familiar command:

hadoop jar hadoopLearn-0.0.1-SNAPSHOT.jar com.tuan.hadoopLearn.mapreduce.MaxTemperatureDriver /mapreduce/input.txt /mapreduce/output

In the web UI, click at the spot shown below to open the profile.out file.

Then choose userlogs at the bottom, click into your own application, and drill down through the directories until you find profile.out. The file is long; the final section tallies, for each method, its share of the CPU samples.
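For orientation, that final section follows HPROF's CPU SAMPLES format, schematically like the lines below (all counts, percentages, and method names here are placeholders, not output from my run):

CPU SAMPLES BEGIN (total = ...) ...
rank   self  accum   count trace method
   1   ..%    ..%     ...  ..... some.package.SomeClass.someMethod
   2   ..%    ..%     ...  ..... ...
CPU SAMPLES END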