[hadoop2.7.1]I/O之SequenceFile最新API程式設計例項（寫入、讀取）

阿新 • • 發佈：2019-02-10

寫操作

根據上一篇的介紹，在hadoop2.x之後，hadoop中的SequenceFile.Writer將會逐漸摒棄大量的createWriter（）過載方法，而整合為更為簡潔的createWriter()方法，除了配置引數外，其他的引數統統使用SequenceFile.Writer.Option來替代，具體有：

新的API裡提供的option引數：

FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption

這些引數能夠滿足各種不同的需要，引數之間不存在順序關係，這樣減少了程式碼編寫工作量，更為直觀，便於理解，下面先來看看這個方法，後邊將給出一個具體例項。

createWriter

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
                                                    org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
                                                             throws  
IOException

Create a new Writer with the given options.

Parameters:: conf - the configuration to use; opts - the options to create the file with
Returns:: a new Writer
Throws:

權威指南第四版中提供了一個SequenceFileWriteDemo例項：

// cc SequenceFileWriteDemo Writing a SequenceFile
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// vv SequenceFileWriteDemo
public class SequenceFileWriteDemo {
  
  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };
  
  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass());
      
      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
// ^^ SequenceFileWriteDemo

對於上面例項中的`createWriter()`方法用整合之後的最新的方法來改寫一下，程式碼如下：

package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.SequenceFile.Writer.FileOption;
import org.apache.hadoop.io.SequenceFile.Writer.KeyClassOption;
import org.apache.hadoop.io.SequenceFile.Writer.ValueClassOption;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

	private static final String[] DATA = { "One, two, buckle my shoe",
			"Three, four, shut the door", "Five, six, pick up sticks",
			"Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

	public static void main(String[] args) throws IOException {
		// String uri = args[0];
		String uri = "file:///D://B.txt";
		Configuration conf = new Configuration();
		Path path = new Path(uri);

		IntWritable key = new IntWritable();
		Text value = new Text();
		SequenceFile.Writer writer = null;
		SequenceFile.Writer.FileOption option1 = (FileOption) Writer.file(path);
		SequenceFile.Writer.KeyClassOption option2 = (KeyClassOption) Writer.keyClass(key.getClass());
		SequenceFile.Writer.ValueClassOption option3 = (ValueClassOption) Writer.valueClass(value.getClass());
		
		try {
			
			writer = SequenceFile.createWriter( conf, option1,option2,option3,Writer.compression(CompressionType.RECORD));
			
			for (int i = 0; i < 10; i++) {
				key.set(1 + i);
				value.set(DATA[i % DATA.length]);
				System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key,
						value);
				writer.append(key, value);
			}
		} finally {
			IOUtils.closeStream(writer);
		}
	}
}

執行結果如下：

2015-11-06 22:15:05,027 INFO  compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door
[220]	3	Five, six, pick up sticks
[264]	4	Seven, eight, lay them straight
[314]	5	Nine, ten, a big fat hen
[359]	6	One, two, buckle my shoe
[404]	7	Three, four, shut the door
[451]	8	Five, six, pick up sticks
[495]	9	Seven, eight, lay them straight
[545]	10	Nine, ten, a big fat hen

生成的檔案：

讀操作

新的API裡提供的option引數：

FileOption -表示讀哪個檔案
InputStreamOption
StartOption
LengthOption -按照設定的長度變數來決定讀取的位元組
BufferSizeOption
OnlyHeaderOption

根據最新的API直接上原始碼：

package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

	public static void main(String[] args) throws IOException {
		//String uri = args[0];
		String uri = "file:///D://B.txt";
		Configuration conf = new Configuration();
		Path path = new Path(uri);
		SequenceFile.Reader.Option option1 = Reader.file(path);
		SequenceFile.Reader.Option option2 = Reader.length(174);//這個引數表示讀取的長度
		
		SequenceFile.Reader reader = null;
		try {
			reader = new SequenceFile.Reader(conf,option1,option2);
			Writable key = (Writable) ReflectionUtils.newInstance(
					reader.getKeyClass(), conf);
			Writable value = (Writable) ReflectionUtils.newInstance(
					reader.getValueClass(), conf);
			long position = reader.getPosition();
			while (reader.next(key, value)) {
				String syncSeen = reader.syncSeen() ? "*" : "";
				System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key,
						value);
				position = reader.getPosition(); // beginning of next record
			}
		} finally {
			IOUtils.closeStream(reader);
		}
	}
}

我這兒設定了一個讀取長度的引數，只讀到第174個位元組那，所以執行結果如下：

2015-11-06 22:53:00,602 INFO  compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door

[hadoop2.7.1]I/O之SequenceFile最新API程式設計例項（寫入、讀取）

寫操作

新的API裡提供的option引數：

createWriter

權威指南第四版中提供了一個SequenceFileWriteDemo例項：

對於上面例項中的`createWriter()`方法用整合之後的最新的方法來改寫一下，程式碼如下：

執行結果如下：

生成的檔案：

讀操作

新的API裡提供的option引數：

根據最新的API直接上原始碼：

我這兒設定了一個讀取長度的引數，只讀到第174個位元組那，所以執行結果如下：

[hadoop2.7.1]I/O之SequenceFile最新API程式設計例項（寫入、讀取）

[hadoop2.7.1]I/O之“泥坯塊”SequenceFile前序知識

2.7.1 元素定位：selenium消息框處理（alert、confirm、prompt）

一起玩"Docker"之1——Ubuntu配置安裝Docker執行環境並安裝（Ubuntu、Centos）映象

Linux時間子系統之八：動態時鐘框架（CONFIG_NO_HZ、tickless）

從 0 開始學習 Linux 系列之「27.Socket 程式設計基礎（TCP，UDP）」

(八) SpringBoot起飛之路-整合Shiro詳細教程（MyBatis、Thymeleaf）

Linux下Hadoop2.7.1集群環境的搭建（超詳細版）

PTA 7-1 暢通工程之局部最小花費問題（35 分）

01_PC單機Spark開發環境搭建_JDK1.8+Spark2.3.1+Hadoop2.7.1

eclipse4.7.0+maven3.3.9+scala2.11.8+spark2.1.0+hadoop2.7.1在ubuntu16裡的wordcount例項

作業系統第六章 1 I/O

高階I/O之readv和writev函式

JavaSE基礎(day19)(1)I/O流(2)執行緒的基本概念和基本操作(3)執行緒的同步機制

Hadoop2.7.1 叢集部署及自動化指令碼

Ubuntu14.04-Hadoop2.7.1-jdk1.7.0安裝偽分散式

hadoop2.7.3原始碼解析之hdfs刪除檔案全流程分析

# 從零開始搭建Hadoop2.7.1的分散式叢集

hadoop2.7.1在vmware上3臺centos7虛擬機器上的完全分散式叢集搭建

7-1 暢通工程之區域性最小花費問題

[hadoop2.7.1]I/O之SequenceFile最新API程式設計例項（寫入、讀取）

寫操作

新的API裡提供的option引數：

createWriter

權威指南第四版中提供了一個SequenceFileWriteDemo例項：

對於上面例項中的createWriter()方法用整合之後的最新的方法來改寫一下，程式碼如下：

執行結果如下：

生成的檔案：

讀操作

新的API裡提供的option引數：

根據最新的API直接上原始碼：

我這兒設定了一個讀取長度的引數，只讀到第174個位元組那，所以執行結果如下：

相關推薦

對於上面例項中的`createWriter()`方法用整合之後的最新的方法來改寫一下，程式碼如下：