

Atitit Hadoop usage summary

 

Contents

1.1. Download is ~300 MB; ~800 MB after extraction

1.2. Required JAR packages

2. Demo code

2.1. WCMapper

2.2. WCReduce

2.3. Implementing the job driver

3. Run: set HADOOP_HOME

3.1. Input txt

3.2. Run output console

3.3. Result output .txt

4. Workflow: jar mode

5. Ref

 

 

1.1. Download is ~300 MB; ~800 MB after extraction

 

HDFS is the distributed file system of the Hadoop big-data platform. It provides data storage for upper-layer applications and for other big-data components such as Hive, MapReduce, Spark, and HBase.
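As a minimal illustration of that role, the sketch below lists the HDFS root directory through the same Java API those components build on (assumptions: a reachable HDFS configured via core-site.xml on the classpath; the class name HdfsListDemo is made up for this example).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);     // connects to the fs.defaultFS file system
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath()); // print each entry under the HDFS root
        }
        fs.close();
    }
}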

 

 

1.2. Required JAR packages

 

 

hadoop-2.4.1\share\hadoop\common\hadoop-common-2.4.1.jar

hadoop-2.4.1\share\hadoop\common\lib\ (all JARs)

 

hadoop-2.4.1\share\hadoop\mapreduce\lib\ (all JARs)
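As an alternative to collecting JARs by hand (a note about the build setup, not part of the original notes): with Maven, the single artifact org.apache.hadoop:hadoop-client:2.4.1 pulls in the common and mapreduce client dependencies transitively.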


 

 

2. Demo code

2.1. WCMapper

package hadoopDemo;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // 1. Mapper stage: processes one input split.
    // 1) The mapper extends Mapper and declares the input key/value types
    // 2) as well as the output key/value types.
    // 3) Override map(); it receives the byte offset of the current line,
    //    the line's content, and the context object used to emit output.
    @Override
    protected void map(LongWritable key, Text value_line, Context context)
            throws IOException, InterruptedException {
        String line = value_line.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            // emit (word, 1) for every token on the line
            Text key_Text = new Text();
            IntWritable val_IntWritable = new IntWritable(1);
            key_Text.set(word);
            context.write(key_Text, val_IntWritable);
        }
    }
}
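A common variant of this mapper (a sketch, not the original code; the class name WCMapperReuse is invented here) reuses the writable objects across records. This is safe because context.write() serializes the key and value immediately, so allocating fresh objects per word is unnecessary:

package hadoopDemo;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapperReuse extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text keyText = new Text();            // reused for every output key
    private final IntWritable one = new IntWritable(1); // the constant count value

    @Override
    protected void map(LongWritable key, Text value_line, Context context)
            throws IOException, InterruptedException {
        for (String word : value_line.toString().split(" ")) {
            keyText.set(word);
            context.write(keyText, one); // the framework serializes here, so reuse is safe
        }
    }
}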

 

2.2. WCReduce

 

package hadoopDemo;

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import com.alibaba.fastjson.JSON;
import com.google.common.collect.Maps;

public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0; // running count of this word's occurrences
        // iterate over the grouped values, accumulating the count
        for (IntWritable num : values) {
            sum += num.get();

            // debug output: dump the intermediate state as JSON on each iteration
            Map<String, Object> m = Maps.newConcurrentMap();
            m.put("key", key);
            m.put("num", num);
            m.put("sum_curr", sum);
            System.out.println(JSON.toJSONString(m));
        }
        context.write(key, new IntWritable(sum));
    }
}
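To trace this on the sample input from section 3.1 (aaa bbb ccc aaa): the framework sorts and groups the map output by key, so reduce() is invoked once per word, with ('aaa', [1, 1]), ('bbb', [1]), and ('ccc', [1]), which yields exactly the aaa 2 / bbb 1 / ccc 1 result shown in section 3.3.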

 

2.3. Implementing the job driver

The purpose of the driver is to specify, in the program, the user's Map and Reduce classes and to configure the parameters used when the job is submitted to Hadoop. First comes this project's own WCDriver; after it, the core code of a second example, a word-count driver class MyWordCount.java.

 

 

 

package hadoopDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        // load the Hadoop native library (needed when running locally on Windows)
        System.load("D:\\haddop\\hadoop-3.1.1\\bin\\hadoop.dll");

        // create the Job
        Job job = Job.getInstance(new Configuration());
        // set the driver class
        job.setJarByClass(WCDriver.class);
        // set the mapper and reducer classes
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReduce.class);
        // set the map-stage output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // set the reduce-stage output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the input path and the output path
        String path_ipt = "D:\\workspace\\hadoopDemo\\ipt.txt";
        FileInputFormat.setInputPaths(job, new Path(path_ipt));
        String path_out = "D:\\workspace\\hadoopDemo\\out.txt";
        FileOutputFormat.setOutputPath(job, new Path(path_out));
        // submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.out.println(result);
        // keep the JVM alive so the console output can be inspected;
        // the usual pattern is the commented-out System.exit below
        while (true) {
            Thread.sleep(5000);
            System.out.println("..");
        }
        // System.exit(result ? 0 : 1);
    }
}
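The driver above hardcodes Windows paths and blocks forever for debugging. A more portable pattern (a sketch; the class name WCDriverTool is invented here and is not part of the original project) takes the paths from the command line and goes through ToolRunner, so generic options such as -D key=value are parsed automatically:

package hadoopDemo;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCDriverTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "word count"); // getConf() carries any -D overrides
        job.setJarByClass(WCDriverTool.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir, must not exist yet
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WCDriverTool(), args));
    }
}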

 

 

 

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // preferred over the deprecated new Job(conf, ...)
        job.setJarByClass(MyWordCount.class);
        job.setMapperClass(WordcountMapper.class);
        // the reducer doubles as a combiner: word-count aggregation is
        // commutative and associative, and its input/output types match
        job.setCombinerClass(WordcountReducer.class);
        job.setReducerClass(WordcountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // input and output paths come from the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

As the core code above shows, the main function takes the input/output paths from its command-line arguments, and submitting a job requires a Job object on which the job name, the Map class, the Reduce class, and the key/value types are all specified. Source: CUUG official site.

 

3. Run: set HADOOP_HOME

You can set the Hadoop environment variable by appending the following line to the ~/.bashrc file:

export HADOOP_HOME=/usr/local/hadoop

In Eclipse, environment variables can only be configured in the Run Configuration.
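For local runs, the same thing can also be done programmatically (a sketch, assuming the Windows unpack location used by WCDriver above; hadoop.home.dir is the system property Hadoop's Shell utility checks before falling back to the HADOOP_HOME environment variable):

// set before the Job is created, e.g. at the top of main()
System.setProperty("hadoop.home.dir", "D:\\haddop\\hadoop-3.1.1");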

 

3.1. Input txt

 

aaa bbb ccc aaa

 

3.2. Run output console

{"num":{},"sum_curr":1,"key":{"bytes":"YWFh","length":3}}

{"num":{},"sum_curr":2,"key":{"bytes":"YWFh","length":3}}

{"num":{},"sum_curr":1,"key":{"bytes":"YmJi","length":3}}

{"num":{},"sum_curr":1,"key":{"bytes":"Y2Nj","length":3}}

 

3.3. Result output .txt

Note that out.txt is created as a directory; the results are in the file D:\workspace\hadoopDemo\out.txt\part-r-00000:

aaa 2

bbb 1

ccc 1

 

4. Workflow: jar mode

 

1. Package the project as a jar and upload it to the virtual machine (if using jar mode).

 

2. Run the jar file, as shown below.
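For example (a sketch: the jar name and HDFS paths are illustrative assumptions, and since the hardcoded-path WCDriver above ignores its arguments, this assumes an argument-reading driver such as the WCDriverTool sketched earlier):

hadoop jar hadoopDemo.jar hadoopDemo.WCDriverTool /input /output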

 

 

5. Ref

MapReduce example: word count (wordcount) - Tyshawn's blog - CSDN blog

Getting started with MapReduce: the Wordcount case - Xiao Liu's blog - CSDN blog