A Walkthrough of the Mapper and Reducer Methods in the MapReduce Source Code

阿新 • Published: 2018-11-10

Methods of the Mapper class in the source code

	/**
	   * The <code>Context</code> passed on to the {@link Mapper} implementations.
	   */
	  public abstract class Context
	    implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
	  }

Context is the context object. Once map has processed a record, the task uses it to write data to reduce or to the next stage.

/**
   * Called once at the beginning of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

Called once when the task starts.

/**
   * Called once for each key/value pair in the input split. Most applications
   * should override this, but the default is the identity function.
   * Here, key and value are the input pair.
   */
  @SuppressWarnings("unchecked")
  protected void map(KEYIN key, VALUEIN value, 
                     Context context) throws IOException, InterruptedException {
    // Write the output key/value pair; context is the context object that manages the task's I/O
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

This method carries the core business logic of the whole map phase.

 /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

Called once when the task ends.

/**
   * Expert users can override this method for more complete control over the
   * execution of the Mapper.
   * @param context
   * @throws IOException
   */
  public void run(Context context) throws IOException, InterruptedException {
    // Initialization (e.g. set up collections, load lookup tables)
    setup(context);
    try {
      while (context.nextKeyValue()) {
        // Core business logic
        map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
    } finally {
      // Final wrap-up: close streams, release resources
      cleanup(context);
    }
  }
}

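run fixes the exact order in which the Mapper's methods execute: setup once, then map once for every key/value pair, and finally cleanup once, even if map throws.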
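The call order described above can be sketched in plain Java. This is only a simplified stand-in written for illustration: `MiniMapper`, `trace`, and the `log` list are hypothetical names, and none of the classes below are Hadoop classes. The sketch keeps just the shape of `run` (setup, a loop over records, cleanup in `finally`) and records which hook was called when.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the Mapper lifecycle; these are NOT Hadoop classes.
class MapperLifecycleSketch {

    // Hypothetical miniature of Mapper: same hook names, no Hadoop types.
    static class MiniMapper {
        final List<String> log = new ArrayList<>();

        protected void setup() { log.add("setup"); }

        protected void map(String record) { log.add("map:" + record); }

        protected void cleanup() { log.add("cleanup"); }

        // Mirrors the shape of Mapper.run: setup, loop over input, cleanup in finally.
        public void run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
        }
    }

    // Runs the mini mapper over the given records and returns the recorded call order.
    static List<String> trace(List<String> records) {
        MiniMapper mapper = new MiniMapper();
        mapper.run(records.iterator());
        return mapper.log;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, map:b, cleanup]
        System.out.println(trace(List.of("a", "b")));
    }
}
```

Because cleanup sits in the `finally` block, it still runs when a `map` call throws, which is why resource release belongs there rather than at the end of `map`.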

The Reducer class

 /**
   * The <code>Context</code> passed on to the {@link Reducer} implementations.
   */
  public abstract class Context 
    implements ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  }

The context is responsible for writing out data.

 /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

Called once at the start of the task; use it for initialization.

 /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

This is where the Reducer's actual business logic goes.

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

Wrap-up work, such as closing streams.

  /**
   * Advanced application writers can use the
   * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
   * control how the reduce task works.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
        // If a back up store is used, reset it
        Iterator<VALUEIN> iter = context.getValues().iterator();
        if(iter instanceof ReduceContext.ValueIterator) {
          ((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();        
        }
      }
    } finally {
      cleanup(context);
    }
  }

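run ties all of the methods together: setup once, reduce once per key with that key's values, then cleanup once.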
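To make the difference between the default identity `reduce` and an overridden one concrete, here is a minimal plain-Java sketch. It is an assumption-laden model, not Hadoop code: `MiniReducer`, `SumReducer`, `runWith`, and the pre-grouped `TreeMap` input are all hypothetical, and the backup-store reset from the real `run` is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for the Reducer lifecycle; these are NOT Hadoop classes.
class ReducerLifecycleSketch {

    // Hypothetical miniature of Reducer with the same hook names.
    static class MiniReducer {
        final List<String> output = new ArrayList<>();

        protected void setup() { }

        // Default behaviour, like the source: identity, one output per input value.
        protected void reduce(String key, Iterable<Integer> values) {
            for (Integer value : values) {
                output.add(key + "=" + value);
            }
        }

        protected void cleanup() { }

        // Mirrors the shape of Reducer.run: setup, one reduce call per key, cleanup in finally.
        public void run(Map<String, List<Integer>> grouped) {
            setup();
            try {
                for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
                    reduce(entry.getKey(), entry.getValue());
                }
            } finally {
                cleanup();
            }
        }
    }

    // A typical override: collapse all values of a key into their sum.
    static class SumReducer extends MiniReducer {
        @Override
        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int value : values) {
                sum += value;
            }
            output.add(key + "=" + sum);
        }
    }

    // Feeds a small, already grouped and sorted input to the given reducer.
    static List<String> runWith(MiniReducer reducer) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("a", List.of(1, 2));
        grouped.put("b", List.of(3));
        reducer.run(grouped);
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new MiniReducer())); // prints [a=1, a=2, b=3]
        System.out.println(runWith(new SumReducer()));  // prints [a=3, b=3]
    }
}
```

Only `reduce` is overridden in `SumReducer`; the inherited `run` still decides when it is called, which is the same division of labour the source shows.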
