MapReduce Beginner Case Study (3): Computing Average Scores with MapReduce

阿新 • Published: 2019-02-19

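When looking at this example, it is worth asking:
Can MapReduce handle the tasks we routinely meet in traditional development, such as sorting, averaging, or batch Word document conversion? How does it differ from traditional development?
Keep the following questions in mind while reading:
1. How does MapReduce compute an average?
2. What role does map play in computing the average?
3. What role does reduce play in computing the average?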

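I. Introduction
The "average score" example mainly revisits the classic "WordCount" example and is essentially a slight variation on it. The example computes the average score of each student.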

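II. Example Description

Compute each student's average score from the data in the input files. Each line of an input file contains a student's name and the corresponding score; if there are several subjects, each subject has its own file. Each output line must contain two fields separated by whitespace: the first is the student's name and the second is that student's average score.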

Sample input:

1) math:

張三 88
李四 99
王五 66
趙六 77

2) chinese:

張三 78
李四 89
王五 96
趙六 67

3) english:

張三 80
李四 82
王五 84
趙六 86

Sample output:

張三 82
李四 90
王五 82
趙六 76

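The sample output can be checked by hand. The following is a minimal sketch in plain Java (no Hadoop involved) that reproduces the arithmetic, assuming the average is an integer division that discards the fractional part. The class name `IntegerAverageSketch` and the method `average` are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IntegerAverageSketch {
    // Integer average: Java's int division discards the fractional part.
    static int average(int[] scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        // Scores in the order math, chinese, english, taken from the sample input.
        Map<String, int[]> scores = new LinkedHashMap<>();
        scores.put("張三", new int[] {88, 78, 80}); // 246 / 3 = 82
        scores.put("李四", new int[] {99, 89, 82}); // 270 / 3 = 90
        scores.put("王五", new int[] {66, 96, 84}); // 246 / 3 = 82
        scores.put("趙六", new int[] {77, 67, 86}); // 230 / 3 = 76 (76.67 truncated)
        for (Map.Entry<String, int[]> e : scores.entrySet()) {
            System.out.println(e.getKey() + " " + average(e.getValue()));
        }
    }
}
```

Note that 趙六's exact average is about 76.67; the value 76 in the sample output is consistent with truncation rather than rounding.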
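III. Design

Computing the average score is modeled on "WordCount" and serves as a review of how a MapReduce program is developed. The program has two parts, Map and Reduce, which implement the map and reduce functions respectively.

Map processes plain-text files in which each line holds a student's name and that student's score in one subject. The data a Mapper receives has already been divided by the InputFormat, whose job is to cut the data set into smaller InputSplits, each handled by one Mapper. The InputFormat also supplies a RecordReader implementation that parses an InputSplit into <key,value> pairs for the map function. The default InputFormat is TextInputFormat, which targets text files: it divides the file into InputSplits and uses LineRecordReader to parse each InputSplit line by line into <key,value> pairs, where key is the position (byte offset) of the line in the file and value is the content of the line.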
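To make the <key,value> shape concrete, here is a small plain-Java simulation of what the map function receives for a text file: the key is the byte offset at which the line starts and the value is the line. This is only an illustration under the assumption of UTF-8 input with "\n" line endings; it is not Hadoop's LineRecordReader, and the names `LineOffsetSketch` and `records` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineOffsetSketch {
    // Returns one "offset<TAB>line" string per line of the given content.
    // The offset is counted in UTF-8 bytes, plus one byte per '\n' terminator.
    static List<String> records(String fileContent) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            out.add(offset + "\t" + line);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // The "math" file from the sample input.
        String math = "張三 88\n李四 99\n王五 66\n趙六 77\n";
        for (String record : records(math)) {
            System.out.println(record);
        }
    }
}
```

The map function in this example ignores the offset key and only parses the value.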

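The map output is distributed to the Reducers by the partitioner; after a Reducer finishes the reduce operation, its result is written out in the format defined by the OutputFormat.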
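As a rough illustration of the partitioning step, the sketch below applies the formula used by Hadoop's default hash partitioning, (hashCode & Integer.MAX_VALUE) % numReduceTasks, to plain Java strings. It assumes String.hashCode() as a stand-in for the hash of a Hadoop Text key, so the actual partition numbers on a cluster may differ; `PartitionSketch` and `partitionFor` are hypothetical names.

```java
class PartitionSketch {
    // Maps a key to a reducer index in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] names = {"張三", "李四", "王五", "趙六"};
        for (String name : names) {
            System.out.println(name + " -> reducer " + partitionFor(name, 2));
        }
    }
}
```

Because the partition depends only on the key, every score for the same student reaches the same Reducer, which is what makes a per-student average possible.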

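The <key,value> pairs a Mapper finally emits are sent to the Reducers to be merged; pairs with the same key go to the same Reducer. Reducer is the base class of all user-defined Reducer classes; its input is a key, an iterator over all the values for that key, and the Reducer's context. The reduce result is written to a file through the write method of Reducer.Context.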
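The flow described above (map emits pairs, pairs with the same key are grouped, reduce sees a key with all its values) can be sketched without Hadoop. The following plain-Java simulation is an assumption-laden stand-in: a TreeMap plays the role of the shuffle, and `ShuffleSketch`, `mapAndGroup` and `reduce` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {
    // "Map" plus grouping: parse "name score" lines and collect scores per name.
    static Map<String, List<Integer>> mapAndGroup(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1]));
        }
        return grouped;
    }

    // "Reduce": one integer average per key, computed over all its values.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum / e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("張三 88"); // from math
        lines.add("張三 78"); // from chinese
        lines.add("張三 80"); // from english
        System.out.println(reduce(mapAndGroup(lines))); // average for 張三 is 82
    }
}
```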

IV. Program Code

The program code is as follows:

package com.hebut.mr;
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class Score {

    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        // Implement the map function
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Convert the input text value to a String
            String line = value.toString();
            // Split the input by line first; with TextInputFormat each value
            // is already a single line, so this loop normally runs once
            StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");
            // Process each line
            while (tokenizerArticle.hasMoreElements()) {
                // Split the line on whitespace
                StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());
                if (tokenizerLine.countTokens() < 2) {
                    continue; // skip lines that lack a name or a score
                }
                String strName = tokenizerLine.nextToken(); // student name
                String strScore = tokenizerLine.nextToken(); // score
                Text name = new Text(strName);
                int scoreInt = Integer.parseInt(strScore);
                // Emit the name and the score
                context.write(name, new IntWritable(scoreInt));
            }
        }
    }

    public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        // Implement the reduce function
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            Iterator<IntWritable> iterator = values.iterator();
            while (iterator.hasNext()) {
                sum += iterator.next().get(); // accumulate the total score
                count++; // count the number of subjects
            }
            // Integer division truncates the average, e.g. 230 / 3 = 76
            int average = sum / count;
            context.write(key, new IntWritable(average));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This line is essential: it points the client at the JobTracker
        // (the address is specific to the author's cluster)
        conf.set("mapred.job.tracker", "192.168.1.2:9001");
        // Input and output directories are hard-coded here instead of
        // being read from the command line
        String[] ioArgs = new String[] { "score_in", "score_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Score Average <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Score Average");
        job.setJarByClass(Score.class);
        // Set the Mapper and Reducer classes. Reduce is not reused as a
        // combiner: an average of partial averages is not, in general,
        // the overall average
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Divide the input data set into InputSplits and provide a
        // RecordReader implementation
        job.setInputFormatClass(TextInputFormat.class);
        // Provide a RecordWriter implementation responsible for the output
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
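One design point is worth spelling out for this job: a combiner runs the same kind of aggregation on partial map output, so the aggregation must give the same result when applied in stages. Summing does; averaging, in general, does not. The sketch below uses made-up numbers (not from the sample input) to show the difference, and then one common alternative, carrying a (sum, count) pair through the partial step. `CombinerSketch`, `average` and `merge` are hypothetical names for this illustration.

```java
class CombinerSketch {
    // Integer average of the given values.
    static int average(int... values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Merges two partial results, each stored as {sum, count}.
    static int[] merge(int[] a, int[] b) {
        return new int[] {a[0] + b[0], a[1] + b[1]};
    }

    public static void main(String[] args) {
        // Hypothetical case: one map task saw two scores for a student, another saw one.
        int partialA = average(90, 60);                       // 75
        int partialB = average(30);                           // 30
        int averageOfAverages = average(partialA, partialB);  // 52
        int trueAverage = average(90, 60, 30);                // 60
        System.out.println(averageOfAverages + " vs " + trueAverage);

        // Carrying {sum, count} instead keeps the partial step exact.
        int[] merged = merge(new int[] {150, 2}, new int[] {30, 1});
        System.out.println(merged[0] / merged[1]);            // 60
    }
}
```

With the sample data in this article each input file contributes exactly one score per student, so the distortion would not show up there, but it can as soon as the partial groups differ in size.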

V. Results

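1) Prepare the test data. Using "DFS Locations" in Eclipse, create the input folder "score_in" under the "/user/hadoop" directory (note: "score_out" does not need to be created). Figure 3.4-1 shows the folder after it has been created.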

Figure 3.4-1: Creating "score_in"
Figure 3.4-2: Uploading the three score files
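Then create three txt files locally and upload them through Eclipse to the "/user/hadoop/score_in" folder; their contents are the three files listed in the example description. Figure 3.4-2 shows the state after a successful upload. Note: the text files must be encoded as "UTF-8". The default is "ANSI"; choose UTF-8 in the Save As dialog, otherwise the Chinese text will be garbled. Viewing "Master.Hadoop" remotely through SecureCRT also confirms that the three files were uploaded.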
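If the input files are generated by a program rather than saved from an editor, the encoding can be fixed explicitly. The following is a minimal sketch using the standard java.nio.file API, assuming Java 7 or later; the class `Utf8ScoreFileSketch` and its methods are hypothetical helpers, and the temporary file stands in for a local file such as the math score file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class Utf8ScoreFileSketch {
    // Writes the lines as UTF-8, regardless of the platform default encoding.
    static void writeUtf8(Path file, List<String> lines) {
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back, decoding it as UTF-8.
    static List<String> readUtf8(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes to a temporary file, reads it back, then deletes the file.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("score", ".txt");
            try {
                writeUtf8(tmp, lines);
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> math = Arrays.asList("張三 88", "李四 99", "王五 66", "趙六 77");
        // The names survive the round trip when both sides use UTF-8.
        System.out.println(roundTrip(math).equals(math));
    }
}
```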


The contents of the three files are shown in Figure 3.4-3:

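Figure 3.4-3: Contents of the three score files

2) View the results. Right-click the "/user/hadoop" folder under "DFS Locations" in Eclipse and refresh it; a new "score_out" folder appears, containing 3 files. Double-click the "part-r-00000" file and Eclipse displays its contents in the editor area, as shown in Figure 3.4-4.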


Figure 3.4-4: Run result
