
HBase (5): MapReduce

HBase can also serve as the source or the sink of MapReduce jobs.

HBase MapReduce jobs boil down to three patterns:

  1. Data in HDFS flows into a column of some HBase table (a wiring sketch for this pattern follows the list)
  2. A column of an HBase table flows out to files in HDFS
  3. A column of one HBase table is transformed and flows into a column of another HBase table
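
For orientation, the first pattern (HDFS into HBase) is usually a map-only job: a plain file-input Mapper emits Puts, and TableMapReduceUtil.initTableReducerJob wires up TableOutputFormat with no reducer. A minimal sketch, assuming a hypothetical target table "t" with column family "f" and tab-separated "rowkey<TAB>value" input lines:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HdfsToHbaseDriver {

    // a plain file Mapper that turns each "rowkey\tvalue" line into a Put
    static class LineMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            Put put = new Put(Bytes.toBytes(parts[0]));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("info"), Bytes.toBytes(parts[1]));
            context.write(new ImmutableBytesWritable(Bytes.toBytes(parts[0])), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf);
        job.setJarByClass(HdfsToHbaseDriver.class);
        job.setMapperClass(LineMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // target table "t"; a null reducer means the mapper's Puts are written directly
        TableMapReduceUtil.initTableReducerJob("t", null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}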

The demo implementation (pattern 3, a word count from table to table) follows:

1. Create the two tables and insert sample data

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseMR {

    private static Configuration conf;
    private static Connection conn;
    private static Admin admin;

    static {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01:2181,hadoop02:2181,hadoop03:2181");
        try {
            conn = ConnectionFactory.createConnection(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void initTable() {
        try {
            // create the source table "word" and the target table "stat",
            // both with a single column family "content"
            admin = conn.getAdmin();
            HTableDescriptor word = new HTableDescriptor(TableName.valueOf("word"));
            HTableDescriptor stat = new HTableDescriptor(TableName.valueOf("stat"));
            HColumnDescriptor content = new HColumnDescriptor("content");
            word.addFamily(content);
            stat.addFamily(content);
            admin.createTable(word);
            admin.createTable(stat);

            // seed the source table with a few sample rows
            Table table = conn.getTable(TableName.valueOf("word"));
            List<Put> lp = new ArrayList<Put>();
            Put p1 = new Put(Bytes.toBytes("1"));
            p1.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"),
                    Bytes.toBytes("The Apache Hadoop software library is a framework"));
            lp.add(p1);
            Put p2 = new Put(Bytes.toBytes("2"));
            p2.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"),
                    Bytes.toBytes("The common utilities that support the other Hadoop modules"));
            lp.add(p2);
            Put p3 = new Put(Bytes.toBytes("3"));
            p3.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"),
                    Bytes.toBytes("Hadoop by reading the documentation"));
            lp.add(p3);
            Put p4 = new Put(Bytes.toBytes("4"));
            p4.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"),
                    Bytes.toBytes("Hadoop from the release page"));
            lp.add(p4);
            Put p5 = new Put(Bytes.toBytes("5"));
            p5.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"),
                    Bytes.toBytes("Hadoop on the mailing list"));
            lp.add(p5);
            // note: the old HTable-only buffer calls (setAutoFlushTo / setWriteBufferSize /
            // flushCommits) are not part of the Table interface, so a single batched
            // put(List<Put>) is used instead
            table.put(lp);
            table.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
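
initTable() only needs to run once, before the job is submitted; the word table then holds five rows of sample sentences. A minimal caller (hypothetical; the tables could just as well be created from the HBase shell):

public static void main(String[] args) {
    HbaseMR.initTable(); // creates "word" and "stat", then seeds "word" with five rows
}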

2. The Mapper must extend TableMapper

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class HbaseMapper extends TableMapper<Text, IntWritable> {

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // each call receives one row of "word"; read the content:info cell and split it into words
        byte[] l = value.getValue(Bytes.toBytes("content"), Bytes.toBytes("info"));
        if (l == null) {
            return; // skip rows that do not have the content:info cell
        }
        String line = new String(l);
        String[] split = line.split(" ");
        for (String s : split) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}
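
A small optimization worth knowing (an addition, not part of the original demo): the Scan the driver hands to initTableMapperJob can be narrowed so only the content:info cell is shipped to the mapper, and the scanner tuned for a one-pass full-table read. Inside the driver's main() below, this would replace the plain new Scan():

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info")); // ship only the needed cell
scan.setCaching(500);        // rows fetched per RPC; tune for your cluster
scan.setCacheBlocks(false);  // block caching is wasted on one-pass MR scans
TableMapReduceUtil.initTableMapperJob("word", scan, HbaseMapper.class, Text.class, IntWritable.class, job);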

3. The Reducer must extend TableReducer

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class HbaseReduce extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum the 1s emitted by the mapper for this word
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // the word becomes the row key; the count lands in content:info of "stat"
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"), Bytes.toBytes(String.valueOf(sum)));

        context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
    }
}
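
The count above is stored as a human-readable string. If you would rather keep raw numeric bytes (for example so the cell could later be bumped with HBase Increment counters, which expect an 8-byte long), a hedged alternative for the Put:

// assumption: readers of "stat" then decode the cell with Bytes.toLong, not Bytes.toString
put.addColumn(Bytes.toBytes("content"), Bytes.toBytes("info"), Bytes.toBytes((long) sum));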

4. The key piece: the Driver class


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class HbaseDriver {

    public static void main(String[] args) throws Exception {

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01:2181,hadoop02:2181,hadoop03:2181");

        Job job = Job.getInstance(conf);
        job.setJarByClass(HbaseDriver.class);

        // set up the map side; the arguments are: HBase source table name, a Scan,
        // the Mapper class, the map output key class, the map output value class, and the job
        TableMapReduceUtil.initTableMapperJob("word", new Scan(), HbaseMapper.class,
                Text.class, IntWritable.class, job);

        // set up the reduce side; the arguments are: HBase target table name,
        // the Reducer class, and the job
        TableMapReduceUtil.initTableReducerJob("stat", HbaseReduce.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
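
Once the job finishes, reading the stat table back is a quick sanity check. A minimal sketch reusing the same quorum setting; with the sample data above, the row key "Hadoop" should come back as 5, since the word appears once in each seeded sentence:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StatCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop01:2181,hadoop02:2181,hadoop03:2181");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table stat = conn.getTable(TableName.valueOf("stat"))) {
            Result r = stat.get(new Get(Bytes.toBytes("Hadoop")));
            System.out.println("Hadoop -> " + Bytes.toString(
                    r.getValue(Bytes.toBytes("content"), Bytes.toBytes("info"))));
        }
    }
}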