Hive's MR execution and basic MapReduce design patterns

阿新 • Published: 2017-08-24


(Original article; please do not repost.)

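In Hive, you can run explain or explain extended followed by a SELECT query to get a brief description of how the query will be executed as MapReduce. The output of explain looks like this: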

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:    --map tree
          TableScan
            alias: testtb
            Statistics: Num rows: 0 Data size: 86 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: zd1 (type: string), zd2 (type: string), zd3 (type: string)
              outputColumnNames: zd1, zd2, zd3
              Statistics: Num rows: 0 Data size: 86 Basic stats: PARTIAL Column stats: NONE
              Group By Operator
                aggregations: sum(zd3)
                keys: zd1 (type: string), zd2 (type: string)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 0 Data size: 86 Basic stats: PARTIAL Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string), _col1 (type: string)
                  sort order: ++
                  Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
                  Statistics: Num rows: 0 Data size: 86 Basic stats: PARTIAL Column stats: NONE
                  value expressions: _col2 (type: double)
      Reduce Operator Tree:    --reduce tree
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: string), KEY._col1 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), _col2 (type: double)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

This output lets you analyze the MapReduce process. The plan above is the MR process for grouping by zd1 and zd2 and computing sum(zd3):

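The GROUP BY columns are used directly as the key. By default, Hive first runs a partial aggregation on the map side (set hive.map.aggr=true), in mode hash. The partial results are then aggregated again on the reduce side, where the mode is mergepartial. If map-side aggregation is turned off (set hive.map.aggr=false), the reduce-side mode is complete instead.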
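To make the difference between these modes concrete, here is a minimal in-memory Python simulation. It is not Hive's operator code: the sample rows and all function names are hypothetical, and two input splits stand in for two mappers. With map-side aggregation each mapper emits one partial sum per key (hash) and the reducer merges the partials (mergepartial); without it, every row is shuffled and the reducer aggregates raw values (complete). Both paths should yield the same sums, but the hash path shuffles fewer records.

```python
from collections import defaultdict


def map_side_hash_agg(rows):
    # hive.map.aggr=true (mode: hash): a hash table keyed by the GROUP BY
    # columns collapses this mapper's rows into one partial sum per key
    partial = defaultdict(float)
    for zd1, zd2, zd3 in rows:
        partial[(zd1, zd2)] += float(zd3)
    return list(partial.items())


def map_side_passthrough(rows):
    # hive.map.aggr=false: every input row is emitted as (key, value)
    return [((zd1, zd2), float(zd3)) for zd1, zd2, zd3 in rows]


def shuffle(pairs):
    # Stand-in for shuffle/sort: group all mapper outputs by key, sorted by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_sum(groups):
    # mergepartial sums partial sums; complete sums raw values.
    # For sum() both reduce to adding up whatever arrives for the key.
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical rows of testtb (zd1, zd2, zd3), split across two mappers
split1 = [("a", "x", "1"), ("a", "x", "2"), ("b", "y", "5")]
split2 = [("a", "x", "3"), ("b", "y", "1")]

hash_pairs = map_side_hash_agg(split1) + map_side_hash_agg(split2)
raw_pairs = map_side_passthrough(split1) + map_side_passthrough(split2)

mergepartial = reduce_sum(shuffle(hash_pairs))
complete = reduce_sum(shuffle(raw_pairs))
print(len(hash_pairs), len(raw_pairs))  # 4 5
print(mergepartial == complete)  # True
```

Values are converted to float to mirror the double type shown for _col2 in the plan above.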

Basic MapReduce design patterns (reference: MapReduce Design Patterns by Donald Miner and Adam Shook):

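1. Grouped numerical aggregation: the map side emits the fields to group by directly as the key, with the needed data as the value; the reduce side applies f(values) to each key's group to produce the result for that group.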

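2. Join: the map side emits the join column as the key and each whole record as the value, tagging records from different tables with a flag. On the reduce side, the records of each key group are placed into one list per flag, and the lists are then joined and output. For an inner join, output only when all lists are non-empty; for a left (or right) outer join, output the left (or right) rows even when the other side's list is empty, filling the missing columns with NULLs.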
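Pattern 1 can be sketched as a tiny in-memory map, shuffle, reduce driver in Python. This is an illustration under assumptions rather than Hadoop code: run_job, the orders data, and the choice of count/min/max/avg as f are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def run_job(records, map_fn, reduce_fn):
    # Minimal in-memory driver: map every record, group values by key
    # (the shuffle), then call reduce once per key in sorted key order
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def summarize_map(record):
    user, day, amount = record
    # The GROUP BY fields form the key; the value carries only what f needs
    yield (user, day), amount


def summarize_reduce(key, values):
    # f(values): several summary statistics for one key group
    return {"count": len(values), "min": min(values), "max": max(values), "avg": mean(values)}


# Hypothetical input rows: (user, day, order_amount)
orders = [("u1", "mon", 10.0), ("u1", "mon", 30.0), ("u2", "tue", 5.0), ("u1", "tue", 7.0)]

result = run_job(orders, summarize_map, summarize_reduce)
for key, stats in result.items():
    print(key, stats)
```

Swapping summarize_reduce for a different f (sum, distinct count, and so on) changes the summary without touching the map side or the driver.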
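Pattern 2, the reduce-side join, can be sketched the same way in Python. The flags "L"/"R", the users/orders tables and every function name are hypothetical; real Hadoop and Hive joins stream and buffer records differently, so treat this only as an illustration of the flag-and-list idea.

```python
from collections import defaultdict
from itertools import product


def join_map(flag, key_index, rows):
    # Map side: join column -> key, whole record -> value, tagged with its table's flag
    for row in rows:
        yield row[key_index], (flag, row)


def join_reduce(key, tagged_rows, how="inner"):
    # Reduce side: put the key group's records into one list per flag
    lists = {"L": [], "R": []}
    for flag, row in tagged_rows:
        lists[flag].append(row)
    left_rows, right_rows = lists["L"], lists["R"]
    if how == "left" and not right_rows:
        right_rows = [None]  # keep unmatched left rows; None stands in for NULL columns
    elif how == "right" and not left_rows:
        left_rows = [None]
    # Inner join: product() yields nothing when either list is empty
    return [(key, lrow, rrow) for lrow, rrow in product(left_rows, right_rows)]


def reduce_side_join(left_table, right_table, left_key, right_key, how="inner"):
    groups = defaultdict(list)  # stands in for the shuffle: group tagged records by key
    mapped = list(join_map("L", left_key, left_table)) + list(join_map("R", right_key, right_table))
    for key, tagged in mapped:
        groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(join_reduce(key, groups[key], how))
    return joined


# Hypothetical tables: users(id, name) and orders(order_id, user_id)
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(100, 1), (101, 1), (102, 2), (103, 4)]

inner_join = reduce_side_join(users, orders, 0, 1, "inner")
left_join = reduce_side_join(users, orders, 0, 1, "left")
print(len(inner_join), len(left_join))  # 3 4
```

User 3 has no orders, so it appears only in the left join (paired with None), while order 103 references a user that does not exist and is dropped by both joins.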

(To be continued...)
