
Linux Study Notes (5): Compiling Code Yourself and Running a MapReduce Program on Hadoop 2.7.4


The program source is ~/hadoop-2.7.4/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.4-sources/org/apache/hadoop/examples/WordCount.java
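
In the stock 2.7.4 tarball this file ships inside hadoop-mapreduce-examples-2.7.4-sources.jar rather than as a loose .java file, so it presumably has to be unpacked first. A minimal sketch, assuming the standard distribution layout under /home/hadoop-2.7.4:

cd /home/hadoop-2.7.4/share/hadoop/mapreduce/sources
# extract just WordCount.java from the sources jar
jar -xf hadoop-mapreduce-examples-2.7.4-sources.jar org/apache/hadoop/examples/WordCount.java
mkdir -p /home/javaFile
cp org/apache/hadoop/examples/WordCount.java /home/javaFile/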

First attempt: with the package declaration removed

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

This program needs the following three jars on the classpath to compile:

[root@master classes]# tree /home/jars/
/home/jars/
├── commons-cli-1.4.jar
├── hadoop-common-2.7.4.jar
└── hadoop-mapreduce-client-core-2.7.4.jar
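
The two hadoop-*.jar files can be copied straight out of the distribution (share/hadoop/common/ and share/hadoop/mapreduce/ in the 2.7.4 tarball); commons-cli-1.4.jar is newer than the copy bundled under share/hadoop/common/lib/, so it was presumably obtained separately. A sketch, assuming the layout used in this session:

mkdir -p /home/jars
cp /home/hadoop-2.7.4/share/hadoop/common/hadoop-common-2.7.4.jar /home/jars/
cp /home/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.4.jar /home/jars/
# commons-cli-1.4.jar: e.g. from Maven Central (URL assumed)
# wget https://repo1.maven.org/maven2/commons-cli/commons-cli/1.4/commons-cli-1.4.jar -P /home/jars/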

The full compile-and-run session and its output:

[root@master classes]#
[root@master classes]# pwd
/home/classes
[root@master classes]# tree
.

0 directories, 0 files
[root@master classes]# tree /home/javaFile/
/home/javaFile/
└── WordCount.java

0 directories, 1 file
[root@master classes]# tree /home/jars/
/home/jars/
├── commons-cli-1.4.jar
├── hadoop-common-2.7.4.jar
└── hadoop-mapreduce-client-core-2.7.4.jar

0 directories, 3 files
[root@master classes]# javac -classpath .:/home/jars/* -d /home/classes/ /home/javaFile/WordCount.java
[root@master classes]# tree
.
├── WordCount.class
├── WordCount$IntSumReducer.class
└── WordCount$TokenizerMapper.class

0 directories, 3 files
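
A note on the classpath argument: bash finds nothing matching the pattern `.:/home/jars/*`, so it is passed to javac untouched, and javac itself expands the trailing `*` to every jar in /home/jars (Java 6+ classpath wildcards). Quoting the argument makes that intent explicit:

javac -classpath ".:/home/jars/*" -d /home/classes/ /home/javaFile/WordCount.java
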
[root@master classes]# jar -cvf wordc.jar ./*.class
added manifest
adding: WordCount.class(in = 1907) (out= 1040)(deflated 45%)
adding: WordCount$IntSumReducer.class(in = 1739) (out= 742)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1736) (out= 753)(deflated 56%)
[root@master classes]# tree
.
├── wordc.jar
├── WordCount.class
├── WordCount$IntSumReducer.class
└── WordCount$TokenizerMapper.class

0 directories, 4 files
[root@master classes]# /home/hadoop-2.7.4/bin/hadoop jar /home/classes/wordc.jar WordCount /hdfs-input.txt /result-self-compile
17/09/02 02:11:45 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.80:8032
17/09/02 02:11:47 INFO input.FileInputFormat: Total input paths to process : 1
17/09/02 02:11:47 INFO mapreduce.JobSubmitter: number of splits:1
17/09/02 02:11:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1504320356950_0010
17/09/02 02:11:47 INFO impl.YarnClientImpl: Submitted application application_1504320356950_0010
17/09/02 02:11:47 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1504320356950_0010/
17/09/02 02:11:47 INFO mapreduce.Job: Running job: job_1504320356950_0010
17/09/02 02:11:56 INFO mapreduce.Job: Job job_1504320356950_0010 running in uber mode : false
17/09/02 02:11:56 INFO mapreduce.Job:  map 0% reduce 0%
17/09/02 02:12:02 INFO mapreduce.Job:  map 100% reduce 0%
17/09/02 02:12:09 INFO mapreduce.Job:  map 100% reduce 100%
17/09/02 02:12:09 INFO mapreduce.Job: Job job_1504320356950_0010 completed successfully
17/09/02 02:12:10 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=118
        FILE: Number of bytes written=241697
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=174
        HDFS: Number of bytes written=76
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3745
        Total time spent by all reduces in occupied slots (ms)=4081
        Total time spent by all map tasks (ms)=3745
        Total time spent by all reduce tasks (ms)=4081
        Total vcore-milliseconds taken by all map tasks=3745
        Total vcore-milliseconds taken by all reduce tasks=4081
        Total megabyte-milliseconds taken by all map tasks=3834880
        Total megabyte-milliseconds taken by all reduce tasks=4178944
    Map-Reduce Framework
        Map input records=6
        Map output records=12
        Map output bytes=118
        Map output materialized bytes=118
        Input split bytes=98
        Combine input records=12
        Combine output records=9
        Reduce input groups=9
        Reduce shuffle bytes=118
        Reduce input records=9
        Reduce output records=9
        Spilled Records=18
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=155
        CPU time spent (ms)=1430
        Physical memory (bytes) snapshot=299466752
        Virtual memory (bytes) snapshot=4159479808
        Total committed heap usage (bytes)=141385728
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=76
    File Output Format Counters 
        Bytes Written=76
[root@master classes]# /home/hadoop-2.7.4/bin/hadoop fs -ls /
Found 3 items
-rw-r--r--   2 root supergroup         76 2017-09-02 00:57 /hdfs-input.txt
drwxr-xr-x   - root supergroup          0 2017-09-02 02:12 /result-self-compile
drwx------   - root supergroup          0 2017-09-02 02:11 /tmp
[root@master classes]#
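
The job log only shows counters; to see the actual word counts, read the reducer output file. With a single reducer and the new MapReduce API the default file name is part-r-00000:

/home/hadoop-2.7.4/bin/hadoop fs -ls /result-self-compile
/home/hadoop-2.7.4/bin/hadoop fs -cat /result-self-compile/part-r-00000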

Second attempt: with the package declaration kept (note that /home/classes was emptied first, as the tree output below shows)

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{
    
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
      
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  
  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
[root@master classes]#
[root@master classes]# tree
.

0 directories, 0 files
[root@master classes]# javac -classpath .:/home/jars/* -d /home/classes/ /home/javaFile/WordCount.java
[root@master classes]# tree
.
└── org
    └── apache
        └── hadoop
            └── examples
                ├── WordCount.class
                ├── WordCount$IntSumReducer.class
                └── WordCount$TokenizerMapper.class

4 directories, 3 files
[root@master classes]# jar -cvf wordcount.jar ./*
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/apache/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 764)(deflated 57%)
adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1793) (out= 749)(deflated 58%)
adding: org/apache/hadoop/examples/WordCount.class(in = 1988) (out= 1050)(deflated 47%)
[root@master classes]# /home/hadoop-2.7.4/bin/hadoop jar /home/classes/wordcount.jar org.apache.hadoop.examples.WordCount /hdfs-input.txt /result-package
17/09/02 02:20:41 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.80:8032
17/09/02 02:20:43 INFO input.FileInputFormat: Total input paths to process : 1
17/09/02 02:20:43 INFO mapreduce.JobSubmitter: number of splits:1
17/09/02 02:20:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1504320356950_0011
17/09/02 02:20:43 INFO impl.YarnClientImpl: Submitted application application_1504320356950_0011
17/09/02 02:20:43 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1504320356950_0011/
17/09/02 02:20:43 INFO mapreduce.Job: Running job: job_1504320356950_0011
17/09/02 02:20:51 INFO mapreduce.Job: Job job_1504320356950_0011 running in uber mode : false
17/09/02 02:20:51 INFO mapreduce.Job:  map 0% reduce 0%
17/09/02 02:20:58 INFO mapreduce.Job:  map 100% reduce 0%
17/09/02 02:21:05 INFO mapreduce.Job:  map 100% reduce 100%
17/09/02 02:21:06 INFO mapreduce.Job: Job job_1504320356950_0011 completed successfully
17/09/02 02:21:06 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=118
        FILE: Number of bytes written=241857
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=174
        HDFS: Number of bytes written=76
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3828
        Total time spent by all reduces in occupied slots (ms)=4312
        Total time spent by all map tasks (ms)=3828
        Total time spent by all reduce tasks (ms)=4312
        Total vcore-milliseconds taken by all map tasks=3828
        Total vcore-milliseconds taken by all reduce tasks=4312
        Total megabyte-milliseconds taken by all map tasks=3919872
        Total megabyte-milliseconds taken by all reduce tasks=4415488
    Map-Reduce Framework
        Map input records=6
        Map output records=12
        Map output bytes=118
        Map output materialized bytes=118
        Input split bytes=98
        Combine input records=12
        Combine output records=9
        Reduce input groups=9
        Reduce shuffle bytes=118
        Reduce input records=9
        Reduce output records=9
        Spilled Records=18
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=186
        CPU time spent (ms)=1200
        Physical memory (bytes) snapshot=297316352
        Virtual memory (bytes) snapshot=4159815680
        Total committed heap usage (bytes)=139595776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=76
    File Output Format Counters 
        Bytes Written=76
[root@master classes]# /home/hadoop-2.7.4/bin/hadoop fs -ls /
Found 4 items
-rw-r--r--   2 root supergroup         76 2017-09-02 00:57 /hdfs-input.txt
drwxr-xr-x   - root supergroup          0 2017-09-02 02:21 /result-package
drwxr-xr-x   - root supergroup          0 2017-09-02 02:12 /result-self-compile
drwx------   - root supergroup          0 2017-09-02 02:11 /tmp
[root@master classes]#

Why remove the package declaration in the first place? Because when the class carries a package path, it has to be invoked by its fully qualified name (the xxx.xxxxx.xxx form), and when building the jar it is no longer enough to pack just the .class files: the whole package directory structure has to go into the jar as well.
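
Side by side, the two invocations from this session make the difference concrete:

# run 1, no package: the class sits at the jar root and is addressed by its simple name
/home/hadoop-2.7.4/bin/hadoop jar /home/classes/wordc.jar WordCount /hdfs-input.txt /result-self-compile
# run 2, with package: org/apache/hadoop/examples/ is inside the jar, so the fully qualified name is required
/home/hadoop-2.7.4/bin/hadoop jar /home/classes/wordcount.jar org.apache.hadoop.examples.WordCount /hdfs-input.txt /result-package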

By the same token, code you write yourself can be compiled and executed following this same procedure; see the sketch below.
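
The same three steps as a template. This is a sketch only; the class name com.example.MyJob and the HDFS paths are placeholders, not taken from the session above:

# 1. compile against the same jars
javac -classpath ".:/home/jars/*" -d /home/classes/ /home/javaFile/MyJob.java
# 2. jar the whole directory tree (the class lives in a package)
cd /home/classes && jar -cvf myjob.jar ./*
# 3. run with the fully qualified class name
/home/hadoop-2.7.4/bin/hadoop jar /home/classes/myjob.jar com.example.MyJob /my-input /my-output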
