Hadoop Learning Notes (Part 5: Hadoop I/O Operations)

阿新 • Published: 2018-12-06

1. Compress data read from standard input and write it to standard output

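The StreamCompressor program below uses GzipCodec to compress the string "Text"; gunzip then reads the compressed bytes from the program's standard output and decompresses them. The fully qualified codec class name (for example org.apache.hadoop.io.compress.GzipCodec) is passed as the first command-line argument.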

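import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        // args[0]: codec class name, e.g. org.apache.hadoop.io.compress.GzipCodec
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        // ReflectionUtils also passes conf to the codec if it is Configurable
        CompressionCodec codec = (CompressionCodec)
                ReflectionUtils.newInstance(codecClass, conf);

        // Wrap stdout so that everything written to it is compressed
        CompressionOutputStream out = codec.createOutputStream(System.out);
        // Copy stdin with a 4 KB buffer; false = do not close the streams afterwards
        IOUtils.copyBytes(System.in, out, 4096, false);
        // Write the remaining compressed data without closing System.out
        out.finish();
    }
}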
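Without a cluster at hand, the same compress-then-gunzip round trip can be sketched with the JDK's java.util.zip classes. This is a stand-in, not Hadoop's API: GZIPOutputStream plays the role of the stream returned by GzipCodec's createOutputStream, and its finish() likewise completes the gzip data without closing the underlying stream. The class name GzipRoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress bytes into gzip format (the format GzipCodec produces)
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(sink);
        try {
            out.write(data);
            // Like CompressionOutputStream.finish(): flush the compressed data
            // and gzip trailer while leaving the sink open
            out.finish();
            return sink.toByteArray();
        } finally {
            out.close(); // releases the native Deflater; harmless for an in-memory sink
        }
    }

    // Decompress gzip data, playing the role of the gunzip step
    static byte[] decompress(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = compress("Text\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(decompress(gz), StandardCharsets.UTF_8)); // prints Text
    }
}
```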

2. Choose a codec by file extension to decompress a file

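import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class FileDecompressor {

  public static void main(String[] args) throws Exception {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);

    Path inputPath = new Path(uri);
    // Infer the codec from the file extension (e.g. .gz -> GzipCodec)
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(inputPath);
    if (codec == null) {
      System.err.println("No codec found for " + uri);
      System.exit(1);
    }

    // Output path = input path minus the codec's extension (file.gz -> file)
    String outputUri =
      CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());

    InputStream in = null;
    OutputStream out = null;
    try {
      // Decompress while reading and write the plain bytes to the output file
      in = codec.createInputStream(fs.open(inputPath));
      out = fs.create(new Path(outputUri));
      IOUtils.copyBytes(in, out, conf);
    } finally {
      IOUtils.closeStream(in);
      IOUtils.closeStream(out);
    }
  }
}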
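The lookup performed by CompressionCodecFactory can be pictured as a table from file extension to codec. The sketch below is a simplified, hypothetical model: the class ExtensionCodecLookup and its hard-coded table are illustrative assumptions, not Hadoop code, although the extensions listed are the default extensions of the named Hadoop codecs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtensionCodecLookup {
    // Hypothetical extension -> codec class table (illustration only)
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        CODECS.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
    }

    // Return the codec whose extension ends the file name (longest match wins),
    // or null when nothing matches, mirroring the null check in FileDecompressor
    static String getCodec(String fileName) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            String ext = e.getKey();
            if (fileName.endsWith(ext) && ext.length() > bestLen) {
                best = e.getValue();
                bestLen = ext.length();
            }
        }
        return best;
    }

    // Strip the suffix when present, in the spirit of removeSuffix()
    static String removeSuffix(String name, String suffix) {
        return name.endsWith(suffix)
                ? name.substring(0, name.length() - suffix.length())
                : name;
    }

    public static void main(String[] args) {
        String uri = "/input/sample.txt.gz"; // hypothetical input path
        System.out.println(getCodec(uri));          // org.apache.hadoop.io.compress.GzipCodec
        System.out.println(removeSuffix(uri, ".gz")); // /input/sample.txt
    }
}
```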

3. Use a codec pool to compress data read from standard input and write it to standard output

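import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.util.ReflectionUtils;

public class PooledStreamCompressor {
    public static void main(String[] args) throws Exception {
        String codecClassname = args[0];
        Class<?> codecClass = Class.forName(codecClassname);
        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec)
                ReflectionUtils.newInstance(codecClass, conf);
        Compressor compressor = null;
        try {
            // Borrow a compressor from the pool instead of creating a new one
            compressor = CodecPool.getCompressor(codec);
            CompressionOutputStream out =
                    codec.createOutputStream(System.out, compressor);
            IOUtils.copyBytes(System.in, out, 4096, false);
            out.finish();
        } finally {
            // Always hand the compressor back so it can be reused
            CodecPool.returnCompressor(compressor);
        }
    }
}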
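The point of CodecPool is that compressors are reused rather than created for every stream. A minimal sketch of that idea, assuming a hypothetical DeflaterPool class built on java.util.zip.Deflater instead of Hadoop's Compressor interface:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

public class DeflaterPool {
    private final Deque<Deflater> free = new ArrayDeque<>();
    private int created = 0;

    // Hand out an idle Deflater, creating a new one only when the pool is empty
    public synchronized Deflater borrow() {
        Deflater d = free.poll();
        if (d == null) {
            d = new Deflater();
            created++;
        }
        return d;
    }

    // Take a Deflater back; reset() clears its state for the next borrower.
    // null is ignored so callers can return unconditionally in a finally block.
    public synchronized void giveBack(Deflater d) {
        if (d == null) {
            return;
        }
        d.reset();
        free.push(d);
    }

    // Number of Deflater objects this pool has ever created
    public synchronized int created() {
        return created;
    }

    // Compress bytes with a pooled Deflater, always returning it afterwards
    public byte[] compress(byte[] data) {
        Deflater d = borrow();
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            giveBack(d);
        }
    }

    public static void main(String[] args) {
        DeflaterPool pool = new DeflaterPool();
        byte[] input = "Text".getBytes(StandardCharsets.UTF_8);
        pool.compress(input);
        pool.compress(input);
        System.out.println("Deflaters created: " + pool.created()); // prints 1
    }
}
```

The borrow/try/finally/give-back shape matches the getCompressor/returnCompressor pattern in PooledStreamCompressor.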

4. Using compression in MapReduce

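Compressing the job output. The command below, run with ./hadoop (that is, from Hadoop's bin directory), submits the HotSearch job in hadoop-1.0-SNAPSHOT.jar with /input/IAMSinger.txt as input and /output/ as the output directory. The HotSearch source itself is not included in this post.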

./hadoop jar /tmp/hadoop-1.0-SNAPSHOT.jar HotSearch /input/IAMSinger.txt /output/

5. Using SequenceFile for small files (writing and reading)

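import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      // This overload is deprecated in Hadoop 2.x in favor of
      // createWriter(conf, Writer.Option...), but it still works
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass());

      // 100 records: keys 100 down to 1, values cycling through DATA
      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        // getLength() = current file length, i.e. where this record starts
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}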

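import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileReadDemo {
  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    SequenceFile.Reader reader = null;
    try {
      // Deprecated in Hadoop 2.x in favor of new Reader(conf, Reader.Option...)
      reader = new SequenceFile.Reader(fs, path, conf);
      // Key/value types come from the file header, so nothing is hard-coded
      Writable key = (Writable)
        ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable)
        ReflectionUtils.newInstance(reader.getValueClass(), conf);
      long position = reader.getPosition();
      while (reader.next(key, value)) {
        // "*" marks a record read right after a sync point
        String syncSeen = reader.syncSeen() ? "*" : "";
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
        position = reader.getPosition(); // beginning of next record
      }
    } finally {
      IOUtils.closeStream(reader);
    }
  }
}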
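The demos above rely on SequenceFile packing many small key/value records into one file that a reader walks in order while tracking byte positions. The toy format below (hypothetical class ToyRecordFile) illustrates only that idea with DataOutputStream and DataInputStream; it is not the SequenceFile format, which also has a header, sync markers, and optional compression.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ToyRecordFile {
    // One record as read back, plus the byte offset where it starts
    static final class Entry {
        final long position;
        final int key;
        final String value;

        Entry(long position, int key, String value) {
            this.position = position;
            this.key = key;
            this.value = value;
        }
    }

    // Append records back to back as [int key][writeUTF value]
    static byte[] write(int[] keys, String[] values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (int i = 0; i < keys.length; i++) {
                out.writeInt(keys[i]);
                out.writeUTF(values[i]);
            }
        }
        return buf.toByteArray();
    }

    // Read records in order, recording each start offset (cf. reader.getPosition())
    static List<Entry> read(byte[] data) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long position = 0;
            while (in.available() > 0) {
                int key = in.readInt();
                String value = in.readUTF();
                entries.add(new Entry(position, key, value));
                // For a ByteArrayInputStream, available() is exactly the unread byte count
                position = data.length - in.available();
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String[] data = {"One, two, buckle my shoe", "Three, four, shut the door"};
        int[] keys = new int[4];
        String[] values = new String[4];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 100 - i;                 // descending keys, as in the writer demo
            values[i] = data[i % data.length]; // values cycle through the array
        }
        for (Entry e : read(write(keys, values))) {
            System.out.printf("[%d]\t%d\t%s%n", e.position, e.key, e.value);
        }
    }
}
```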
