Big Data Primer (11): Custom Partitioning and Input Splits in MapReduce

阿新 • Published: 2018-11-10

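package com.hadoop.hdfs.mr.areapartition;

import java.util.HashMap;

import org.apache.hadoop.mapreduce.Partitioner;

public class AreaPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE>{

	// Phone-number prefix -> partition number (a stand-in for a real area dictionary)
	private static HashMap<String,Integer> areaMap = new HashMap<String,Integer>();
	
	static{
		areaMap.put("135", 0);
		areaMap.put("136", 1);
		areaMap.put("137", 2);
		areaMap.put("138", 3);
		areaMap.put("139", 4);
	}
	
	@Override
	public int getPartition(KEY key, VALUE value, int numPartitions) {
		// Take the phone number from the key and look up its prefix in the area dictionary;
		// each area returns its own partition number, and unknown prefixes go to partition 5
		String phone = key.toString();
		if (phone.length() < 3) {
			return 5;
		}
		Integer code = areaMap.get(phone.substring(0, 3));
		return code == null ? 5 : code;
	}

}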
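
The prefix lookup in AreaPartitioner can be tried out without a Hadoop cluster. The following is a minimal standalone sketch, not part of the original job: the class name AreaLookupDemo, the method partitionFor, and the sample phone numbers are made up for illustration, and only the prefix table and the default partition 5 are taken from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the prefix lookup used by AreaPartitioner (no Hadoop dependency).
// AreaLookupDemo and partitionFor are hypothetical names used only for this illustration.
class AreaLookupDemo {

    // Same prefix table as AreaPartitioner: prefix -> partition number
    private static final Map<String, Integer> AREA_MAP = new HashMap<>();

    static {
        AREA_MAP.put("135", 0);
        AREA_MAP.put("136", 1);
        AREA_MAP.put("137", 2);
        AREA_MAP.put("138", 3);
        AREA_MAP.put("139", 4);
    }

    // Partition that collects every phone number with an unknown prefix
    static final int DEFAULT_PARTITION = 5;

    // Returns the partition number for a phone number, based on its first three characters
    static int partitionFor(String phone) {
        if (phone == null || phone.length() < 3) {
            return DEFAULT_PARTITION;
        }
        return AREA_MAP.getOrDefault(phone.substring(0, 3), DEFAULT_PARTITION);
    }

    public static void main(String[] args) {
        // Made-up sample numbers
        String[] samples = {"13512345678", "13900001111", "18600002222", "13"};
        for (String phone : samples) {
            System.out.println(phone + " -> partition " + partitionFor(phone));
        }
    }
}
```

Because the lookup can return six distinct values (0 through 5), the job has to run with at least six reduce tasks. A partition number greater than or equal to the number of reduce tasks is rejected by Hadoop at runtime.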




/*********************************FlowSumArea.java*********************************************************/





package com.hadoop.hdfs.mr.areapartition;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.hadoop.hdfs.mr.flowsum.FlowBean;

public class FlowSumArea {

	public static class FlowSumAreaMapper extends Mapper<LongWritable, Text, Text, FlowBean>{

		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			// One line of input data
			String line = value.toString();
			
			// Split the line on tabs
			String[] fields = line.split("\t");
			
			// Extract the fields we need: phone number, upstream flow, downstream flow
			String phoneNB = fields[1];
			long u_flow = Long.parseLong(fields[7]);
			long d_flow = Long.parseLong(fields[8]);
			
			// Emit the data as a key/value pair keyed by phone number
			context.write(new Text(phoneNB), new FlowBean(phoneNB, u_flow, d_flow));
		}
		
	}
	
	
	public static class FlowSumAreaReduce extends Reducer<Text, FlowBean, Text, FlowBean>{

		@Override
		protected void reduce(Text text, Iterable<FlowBean> values,Context context)
				throws IOException, InterruptedException {
			// Sum the upstream and downstream flow over all records for this phone number
			long u_count = 0;
			long d_count = 0 ;
			
			for (FlowBean bean : values){
				u_count+=bean.getUp_flow();
				d_count+=bean.getD_flow();
			}

			context.write(new Text(text), new FlowBean(text.toString(), u_count, d_count));
		}
		
	}
	
	public static void main(String[] args) throws Exception {
		
		Job job = Job.getInstance(new Configuration());
		
		job.setJarByClass(FlowSumArea.class);
		
		job.setMapperClass(FlowSumAreaMapper.class);
		job.setReducerClass(FlowSumAreaReduce.class);
		
		// Register the custom partitioning logic
		job.setPartitionerClass(AreaPartitioner.class);
		
		
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(FlowBean.class);
		
		// Set the number of reduce tasks to match the number of partitions
		// (five known prefixes plus one default partition)
		job.setNumReduceTasks(6);
		
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		
		int status = job.waitForCompletion(true)?0:1;
		System.exit(status);
	}
}
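
The job above imports FlowBean from com.hadoop.hdfs.mr.flowsum, which is not shown in this post. As a rough idea of what such a value class needs, here is a hedged sketch. It is an assumption, not the author's class: the name FlowBeanSketch is hypothetical, the field layout is inferred from the constructor and getters used above, and it relies only on java.io so it can run without Hadoop. In a real job the class would implement Hadoop's Writable (or WritableComparable) interface, whose write and readFields methods have the same shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FlowBean; fields are inferred from how FlowSumArea uses it.
class FlowBeanSketch {
    private String phoneNB;
    private long up_flow;
    private long d_flow;

    // A no-argument constructor is needed so the framework can create an empty
    // instance and then fill it in through readFields
    FlowBeanSketch() {}

    FlowBeanSketch(String phoneNB, long up_flow, long d_flow) {
        this.phoneNB = phoneNB;
        this.up_flow = up_flow;
        this.d_flow = d_flow;
    }

    String getPhoneNB() { return phoneNB; }
    long getUp_flow() { return up_flow; }
    long getD_flow() { return d_flow; }

    // Serialize the fields in a fixed order
    void write(DataOutput out) throws IOException {
        out.writeUTF(phoneNB);
        out.writeLong(up_flow);
        out.writeLong(d_flow);
    }

    // Deserialize the fields in exactly the order they were written
    void readFields(DataInput in) throws IOException {
        phoneNB = in.readUTF();
        up_flow = in.readLong();
        d_flow = in.readLong();
    }

    // Upstream, downstream, and total flow, tab-separated
    @Override
    public String toString() {
        return up_flow + "\t" + d_flow + "\t" + (up_flow + d_flow);
    }

    // Writes a bean to a byte buffer and reads it back, the way data moves between map and reduce
    static FlowBeanSketch roundTrip(FlowBeanSketch bean) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(buffer));
            FlowBeanSketch copy = new FlowBeanSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample values
        FlowBeanSketch copy = roundTrip(new FlowBeanSketch("13512345678", 100L, 200L));
        System.out.println(copy.getPhoneNB() + "\t" + copy);
    }
}
```

The order of the reads in readFields must mirror the order of the writes in write, since the serialized form carries no field names.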
