
Hadoop: Writing Unit Tests with MRUnit

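Introduction

Unit testing verifies the correctness of a single module, function, or class. In MapReduce development, thorough unit tests for the Mapper and Reducer catch problems early and speed up development. This article uses a concrete example to summarize how to unit test Hadoop Mappers and Reducers with MRUnit. The accompanying code is available on GitHub: https://github.com/liujinguang/hadoop-study.git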

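About MRUnit

In MapReduce, the map and reduce functions are easy to test in isolation, which follows from their functional style. MRUnit (http://incubator.apache.org/mrunit/) is a testing library that makes it easy to pass known inputs to a mapper or to check that a reducer's output matches expectations. MRUnit is used together with a standard test execution framework such as JUnit, so MapReduce job tests can run as part of the normal development environment.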

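The Mapper

The MaxTemperatureMapper class parses the year, the air temperature, and the quality code of the reading from a fixed-format string. Sample strings appear in the MRUnit tests later in this article.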

package com.jliu.mr.intro;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		String line = value.toString();
		String year = line.substring(15, 19);
		int airTemperature;

		// parseInt doesn't like leading plus signs
		if (line.charAt(87) == '+') {
			airTemperature = Integer.parseInt(line.substring(88, 92));
		} else {
			airTemperature = Integer.parseInt(line.substring(87, 92));
		}

		String quality = line.substring(92, 93);
		if (airTemperature != MISSING && quality.matches("[01459]")) {
			context.write(new Text(year), new IntWritable(airTemperature));
		}
	}

	private static final int MISSING = 9999;
}
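
To make the offsets in map() easier to follow, here is a minimal standalone sketch that applies the same substring calls to a plain String, without any Hadoop types. The class name NcdcFieldSketch and its method names are hypothetical and not part of the original project; the sample record is the one used in the MRUnit tests later in this article.

```java
/** Standalone sketch: same fixed-width offsets as MaxTemperatureMapper, no Hadoop types. */
final class NcdcFieldSketch {
    // Sentinel that marks a missing temperature, as in the mapper
    static final int MISSING = 9999;

    /** Year occupies offsets 15..18. */
    static String year(String line) {
        return line.substring(15, 19);
    }

    /** Signed temperature occupies offsets 87..91; a leading '+' is skipped as in the mapper. */
    static int airTemperature(String line) {
        int start = line.charAt(87) == '+' ? 88 : 87;
        return Integer.parseInt(line.substring(start, 92));
    }

    /** Single-character quality code at offset 92. */
    static String quality(String line) {
        return line.substring(92, 93);
    }

    /** Mirrors the mapper's filter: a record is emitted only if this returns true. */
    static boolean isEmitted(String line) {
        return airTemperature(line) != MISSING && quality(line).matches("[01459]");
    }

    public static void main(String[] args) {
        String line = "0043011990999991950051518004+68750+023550FM-12+0382"
                + "99999V0203201N00261220001CN9999999N9-00111+99999999999";
        // Prints: 1950 -11 1 true
        System.out.println(year(line) + " " + airTemperature(line) + " "
                + quality(line) + " " + isEmitted(line));
    }
}
```

Keeping the parsing in plain methods like this is one possible design choice: the offsets can then be checked with ordinary JUnit assertions, and MRUnit is needed only to verify what the mapper writes to its context.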

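To test with MRUnit, first create a MapDriver object, then set the Mapper class under test, the input, and the expected output. The example below passes one weather record to the mapper as input and checks that the output is the year and temperature that were read. If the actual output does not match the expected output, the MRUnit test fails.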

package com.jliu.mr.mrunit;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

import com.jliu.mr.intro.MaxTemperatureMapper;

public class MaxTemperatureMapperTest {
	@Test
	public void testParsesValidRecord() throws IOException {
		// Year is at offsets 15-18 ("1950"); temperature is at offsets 87-91 ("-0011")
		Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
				"99999V0203201N00261220001CN9999999N9-00111+99999999999");
		// We are testing a mapper, so use MRUnit's MapDriver
		new MapDriver<LongWritable, Text, Text, IntWritable>()
				// Configure the mapper under test
				.withMapper(new MaxTemperatureMapper())
				// Set the input key and value
				.withInput(new LongWritable(0), value)
				// Set the expected output key and value
				.withOutput(new Text("1950"), new IntWritable(-11))
				// Run the mapper and compare actual output with expected output
				.runTest();
	}

	@Test
	public void testParseMissingTemperature() throws IOException {
		// Depending on how many times withOutput() is called, MapDriver can check
		// for zero, one, or more output records. Records with a missing temperature
		// are filtered out, so this test asserts that this input produces no output.
		// Year is at offsets 15-18 ("1950"); temperature is at offsets 87-91 ("+9999")
		Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
				"99999V0203201N00261220001CN9999999N9+99991+99999999999");
		new MapDriver<LongWritable, Text, Text, IntWritable>()
				.withMapper(new MaxTemperatureMapper())
				.withInput(new LongWritable(0), value)
				.runTest();
	}
}
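
Both tests above build their input by hand, and the temperature columns are easy to miscount. As a rough sketch, a small helper could splice a temperature and quality code into the known-good sample record at the offsets the mapper reads (87 to 92). The class TestRecords and the method withTemperature are hypothetical names introduced here for illustration; they are not part of MRUnit or of the original project.

```java
/** Hypothetical test helper: builds weather records by editing the sample record. */
final class TestRecords {
    /** Sample record from the tests above (temperature -0011, quality code 1). */
    static final String BASE = "0043011990999991950051518004+68750+023550FM-12+0382"
            + "99999V0203201N00261220001CN9999999N9-00111+99999999999";

    /**
     * Returns BASE with offsets 87..91 replaced by signedTemp and offset 92
     * replaced by quality. signedTemp must be a sign followed by four digits,
     * for example "+0022" or "-0011".
     */
    static String withTemperature(String signedTemp, char quality) {
        if (!signedTemp.matches("[+-]\\d{4}")) {
            throw new IllegalArgumentException(
                    "expected a sign plus four digits, got: " + signedTemp);
        }
        return BASE.substring(0, 87) + signedTemp + quality + BASE.substring(93);
    }
}
```

With such a helper, `TestRecords.withTemperature("+9999", '1')` reproduces the missing-temperature record used above, and a record with a rejected quality code could be built the same way.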

The Reducer

Paired with the Mapper above, the reducer must find the maximum value for a given key.

package com.jliu.mr.intro;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {

		int maxValue = Integer.MIN_VALUE;
		for (IntWritable value : values) {
			maxValue = Math.max(maxValue, value.get());
		}

		context.write(key, new IntWritable(maxValue));
	}
}

Testing the Reducer works much like testing the Mapper; see the test case below:

package com.jliu.mr.mrunit;

import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.*;
import org.junit.Test;

import com.jliu.mr.intro.MaxTemperatureReducer;

public class MaxTemperatureReducerTest {
	@Test
	public void testReturnsMaximumIntegerValue() throws IOException {
		new ReduceDriver<Text, IntWritable, Text, IntWritable>()
		// Set the reducer under test
		.withReducer(new MaxTemperatureReducer())
		// Set the input key and its list of values
		.withInput(new Text("1950"),  Arrays.asList(new IntWritable(10), new IntWritable(5)))
		// Set the expected output
		.withOutput(new Text("1950"), new IntWritable(10))
		// Run the test
		.runTest();
	}
}
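
The test above covers a single case with two positive values. The loop inside reduce() can also be checked as plain Java, which makes a few edge cases easy to see. The sketch below copies that loop into a hypothetical class MaxSketch (not part of the original project) and uses Integer instead of IntWritable so that it runs without Hadoop.

```java
import java.util.Arrays;
import java.util.Collections;

/** Standalone sketch of the max loop used in MaxTemperatureReducer. */
final class MaxSketch {
    /** Same logic as reduce(): start from Integer.MIN_VALUE and keep the larger value. */
    static int maxOf(Iterable<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int value : values) {
            maxValue = Math.max(maxValue, value);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // Same input as the ReduceDriver test above
        System.out.println(maxOf(Arrays.asList(10, 5)));
        // All-negative input: seeding with MIN_VALUE rather than 0 keeps this correct
        System.out.println(maxOf(Arrays.asList(-11, -40)));
        // Empty input returns the seed value unchanged
        System.out.println(maxOf(Collections.<Integer>emptyList()));
    }
}
```

The all-negative case is likely the most useful one to add as a ReduceDriver test, since a maximum seeded with 0 would pass the positive test yet fail for years in which every reading is below zero.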

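Summary

Testing MapReduce code with the MRUnit framework is fairly simple. Together with JUnit, create a MapDriver or ReduceDriver object, set the class under test, set the input and the expected output, and call runTest() to execute the test case.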

References

1.  Hadoop: The Definitive Guide, 3rd Edition