Key MapReduce Concepts Through Examples ------- Implementing Secondary Sort with a Custom MapReduce Data Type

阿新 · Published: 2018-12-20

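Custom data type SSData

SSData holds the two integer columns of an input line and serves as the map output key. Implementing WritableComparable gives Hadoop what a key needs: write/readFields serialize it for the shuffle, and compareTo defines the sort order (first column ascending, second column descending). hashCode is what the default HashPartitioner uses to choose a reducer.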

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class SSData implements WritableComparable<SSData>{

	private int first;
	private int second;
	
	public SSData(){
		
	}
	
	public SSData(int first, int second) {
		this.first = first;
		this.second = second;
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(first);
		out.writeInt(second);
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		this.first = in.readInt();
		this.second = in.readInt();
	}

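	@Override
	public int compareTo(SSData o) {
		// First column ascending; Integer.compare avoids the overflow that subtraction can cause
		int tmp = Integer.compare(this.first, o.first);
		if(tmp != 0){
			return tmp;
		}
		// Second column descending: the arguments are swapped
		return Integer.compare(o.second, this.second);
	}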
	
	
	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + first;
		result = prime * result + second;
		return result;
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		SSData other = (SSData) obj;
		if (first != other.first)
			return false;
		if (second != other.second)
			return false;
		return true;
	}

	/**
	 * @return the first
	 */
	public int getFirst() {
		return first;
	}

	/**
	 * @param first the first to set
	 */
	public void setFirst(int first) {
		this.first = first;
	}

	/**
	 * @return the second
	 */
	public int getSecond() {
		return second;
	}

	/**
	 * @param second the second to set
	 */
	public void setSecond(int second) {
		this.second = second;
	}

	@Override
	public String toString() {
		return "[ "+first + "    " + second+" ]";
	}
}
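To check the ordering and the serialization format without a cluster, here is a minimal Hadoop-free sketch. `SSDataSketch` and its nested `Pair` are hypothetical names: `Pair` mirrors SSData's fields, its `write`/`readFields` byte layout (via `java.io.DataOutputStream`/`DataInputStream`) and its `compareTo`, but it does not implement Hadoop's `WritableComparable`, so it runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Hadoop-free mirror of SSData: same fields, same byte layout, same ordering.
public class SSDataSketch {

    static class Pair implements Comparable<Pair> {
        int first;
        int second;

        Pair() {
        }

        Pair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        // Same layout as SSData.write: two 4-byte big-endian ints.
        void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        // Fields must be read back in the order they were written.
        void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        // First column ascending; second column descending (arguments swapped).
        @Override
        public int compareTo(Pair o) {
            int cmp = Integer.compare(first, o.first);
            return cmp != 0 ? cmp : Integer.compare(o.second, second);
        }

        @Override
        public String toString() {
            return "[ " + first + "    " + second + " ]";
        }
    }

    // Serializes p to bytes, then deserializes the bytes into a fresh Pair.
    static Pair roundTrip(Pair p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        p.write(out);
        out.flush();
        Pair copy = new Pair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair(3, 1));
        pairs.add(new Pair(1, 5));
        pairs.add(new Pair(3, 9));
        pairs.add(new Pair(1, 2));
        Collections.sort(pairs);
        for (Pair p : pairs) {
            System.out.println(p); // [ 1    5 ], [ 1    2 ], [ 3    9 ], [ 3    1 ]
        }
        System.out.println(roundTrip(new Pair(7, -4))); // [ 7    -4 ]
    }
}
```

The empty constructor matters in the real SSData too: Hadoop creates key instances reflectively and then fills them with `readFields`, which is why the field order in `write` and `readFields` must match.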

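MapReduce class SecondarySort

The mapper turns each line into an SSData key. During the shuffle the framework sorts map output keys with SSData.compareTo, so the reducer receives them already ordered by both columns and only has to write them out.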

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class SecondarySort implements Tool{

	/**
	 * Custom mapper
	 * @author lyd
	 *
	 */
	static class MyMapper extends Mapper<LongWritable, Text, SSData, IntWritable>{

		@Override
		protected void setup(Context context)throws IOException, InterruptedException {
		}

		@Override
		protected void map(LongWritable key, Text value,Context context)
				throws IOException, InterruptedException {
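			// Each line holds two integers separated by whitespace, e.g. "3 9"
			String[] cols = value.toString().trim().split("\\s+");
			if (cols.length < 2) {
				return; // skip blank or malformed lines
			}
			int first = Integer.parseInt(cols[0]);
			int second = Integer.parseInt(cols[1]);
			// The composite key carries both columns, so the shuffle sorts on both
			context.write(new SSData(first, second), new IntWritable(second));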
		}

		@Override
		protected void cleanup(Context context)throws IOException, InterruptedException {
		}
		
	}
	
	/**
	 * Custom reducer
	 * @author lyd
	 *
	 */
	static class MyReducer extends Reducer<SSData, IntWritable, SSData, Text>{

		@Override
		protected void setup(Context context)throws IOException, InterruptedException {
		}
		
		@Override
		protected void reduce(SSData key, Iterable<IntWritable> value,Context context)
				throws IOException, InterruptedException {
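			// No grouping comparator is set, so grouping uses SSData.compareTo: each distinct
			// (first, second) pair is its own group, and groups arrive already sorted. Writing
			// the key once per group collapses duplicate input lines; to keep them, write once per value:
			// for (IntWritable v : value) { context.write(key, new Text("")); }
			context.write(key, new Text(""));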
		}
		
		@Override
		protected void cleanup(Context context)throws IOException, InterruptedException {
		}
	}
	
	
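	// Configuration handed over by ToolRunner; run() reads it back through getConf()
	private Configuration conf;

	@Override
	public void setConf(Configuration conf) {
		conf.set("fs.defaultFS", "hdfs://hadoop01:9000");
		this.conf = conf;
	}

	@Override
	public Configuration getConf() {
		// Return the stored object; returning new Configuration() here would drop
		// the fs.defaultFS setting made in setConf()
		return conf;
	}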
	
	/**
	 * Driver method: configures and submits the job
	 */
	@Override
	public int run(String[] args) throws Exception {
		// 1. Get the Configuration object
		Configuration conf = getConf();
		// 2. Create the job
		Job job = Job.getInstance(conf, "model01");
		// 3. Set the class used to locate the job's jar
		job.setJarByClass(SecondarySort.class);
		// 4. Configure the map side
		job.setMapperClass(MyMapper.class);
		job.setMapOutputKeyClass(SSData.class);
		job.setMapOutputValueClass(IntWritable.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		
		// 5. Configure the reduce side. With the default single reduce task the output
		// is globally sorted; with more reducers each part file is sorted separately.
		job.setReducerClass(MyReducer.class);
		job.setOutputKeyClass(SSData.class);
		job.setOutputValueClass(Text.class);
		// Delete the output directory if it already exists
		FileSystem fs = FileSystem.get(conf);
		if(fs.exists(new Path(args[1]))){
			fs.delete(new Path(args[1]), true);
		}
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		
		// 6. Submit the job and wait for it to finish
		int isok = job.waitForCompletion(true) ? 0 : 1;
		return isok;
	}
	
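	/**
	 * Entry point of the job
	 * @param args input path and output path, optionally preceded by generic Hadoop options
	 */
	public static void main(String[] args) throws Exception {
		// ToolRunner already parses generic options (-D, -fs, ...) into the Configuration
		// it passes to setConf(), then hands only the remaining arguments to run().
		// Pre-parsing them here against a separate Configuration would discard them.
		System.exit(ToolRunner.run(new SecondarySort(), args));
	}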
}
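The secondary sort here comes entirely from the key: the framework sorts map output with SSData.compareTo and, with no grouping comparator set, groups reduce input by that same comparison. The sketch below imitates that flow in memory so it can be run without Hadoop. It is an approximation under stated assumptions: `SecondarySortSimulation`, `Key`, `parse` and `run` are hypothetical names, a `TreeMap` stands in for the shuffle's sort-and-group step (real Hadoop sorts serialized keys and merges spills), and output lines are formatted like `SSData.toString()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of SecondarySort: map -> shuffle sort/group -> reduce.
public class SecondarySortSimulation {

    // Composite key with SSData's ordering: first ascending, second descending.
    static final class Key implements Comparable<Key> {
        final int first;
        final int second;

        Key(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Key o) {
            int cmp = Integer.compare(first, o.first);
            return cmp != 0 ? cmp : Integer.compare(o.second, second);
        }
    }

    // Map step: split a line into two integers, as MyMapper does.
    static Key parse(String line) {
        String[] cols = line.trim().split("\\s+");
        return new Key(Integer.parseInt(cols[0]), Integer.parseInt(cols[1]));
    }

    static List<String> run(List<String> lines) {
        // Shuffle step: TreeMap sorts keys with compareTo, and keys that compare equal
        // share one entry, like the values passed to a single reduce() call.
        Map<Key, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            Key k = parse(line);
            groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(k.second);
        }
        // Reduce step: like MyReducer, emit each group's key once and ignore the values.
        List<String> out = new ArrayList<>();
        for (Key k : groups.keySet()) {
            out.add("[ " + k.first + "    " + k.second + " ]");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("3 1", "1 5", "3 9", "1 2", "3 9");
        // Prints [ 1    5 ], [ 1    2 ], [ 3    9 ], [ 3    1 ]; the duplicate "3 9" appears once.
        for (String line : run(input)) {
            System.out.println(line);
        }
    }
}
```

Because the whole (first, second) pair defines a group, identical input lines collapse into one output line, exactly as MyReducer's single `context.write(key, ...)` per call does.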
