MapReduce Learning Notes 3: Computing the Average

Computing an average is a fairly common MapReduce task, and the algorithm itself is simple. One approach: the Map side reads the data and emits (key, value) pairs; before the data reaches Reduce, the shuffle phase groups all values that share the same key output by the map function into a value-list and passes it to the Reduce side; the Reduce side then sums the values, counts the records, and divides the sum by the count.
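As a small worked illustration (the data here is hypothetical, not from the original post), suppose the input file contains one space-separated record per line:

    a 10
    a 20
    b 6

After the shuffle, the reduce function receives ("a", [10, 20]) and ("b", [6]); summing and dividing gives the output "a 15" and "b 6". Because the code below uses integer division, a group such as [10, 21] would also produce 15, with the remainder discarded.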

package mapreduce;  
import java.io.IOException;  
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyAverage {

    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private static Text newKey = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line looks like "<key> <count>", separated by a single space.
            String line = value.toString();
            System.out.println(line);      // debug output
            String[] arr = line.split(" ");
            newKey.set(arr[0]);
            System.out.println(arr[0]);    // debug output
            System.out.println(arr[1]);    // debug output
            int click = Integer.parseInt(arr[1]);
            context.write(newKey, new IntWritable(click));
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int num = 0;
            int count = 0;
            // Sum all values for this key and count how many records there are.
            for (IntWritable val : values) {
                num += val.get();
                count++;
            }
            // Integer division: the average is truncated toward zero.
            int avg = num / count;
            context.write(key, new IntWritable(avg));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("dfs.client.use.datanode.hostname", "true");
        System.out.println("start");

        Job job = Job.getInstance(conf, "MyAverage");
        job.setJarByClass(MyAverage.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        Path in = new Path("hdfs://*:9000/user/hadoop/input/c.txt");
        System.out.println("in done");
        Path out = new Path("hdfs://*:9000/user/hadoop/output");
        System.out.println("out done");

        // The output directory (would be args[1] if the paths were taken from the
        // command line; here they are hard-coded).
        Path path = new Path("hdfs://*:9000/user/hadoop/output");
        // Obtain the FileSystem this path belongs to.
        FileSystem fileSystem = path.getFileSystem(conf);
        if (fileSystem.exists(path)) {
            // "true" means recursive delete: remove the directory even if it is not empty.
            fileSystem.delete(path, true);
        }

        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
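One possible way to run the job, sketched here under the assumption that the class is packaged into a jar named MyAverage.jar and that the * placeholder in the HDFS URLs is replaced with the actual NameNode hostname (both the jar name and the cluster layout are assumptions, not from the original post):

    hadoop jar MyAverage.jar mapreduce.MyAverage
    hadoop fs -cat /user/hadoop/output/part-r-00000

The second command prints the reducer output, one "key average" pair per line. Note that the reducer computes the average with integer division, so fractional parts are truncated; changing the reducer's output value type to DoubleWritable would preserve them if that matters for your data.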