Three Ways to Read a File in Go: Efficiency Comparison and Analysis

I recently needed to read a large file in Go, and took the opportunity to look at how efficient Go's different ways of reading a file are.

Common ways to do file IO in Go

  1. Use Open and Read from the os package.

    fi, err := os.Open(path) // open the file
    buf := make([]byte, 1024)
    n, err := fi.Read(buf)   // read up to len(buf) bytes
    
  2. Use buffered IO (bufio).

    fi, err := os.Open(path)
    r := bufio.NewReader(fi)
    buf := make([]byte, 1024)
    n, err := r.Read(buf)
    
  3. Use the functions in the ioutil package.

    fi, err := os.Open(path)
    fd, err := ioutil.ReadAll(fi)

Observations (efficiency comparison)

The file to be read was prepared as follows:

total 720912
-rw-r--r--  1 stephen  staff   2.3K Sep 15 11:59 io_demo.go
-rw-r--r--  1 stephen  staff   336M Sep 15 11:59 test.txt

The code in io_demo.go is:

package main

import (
	"bufio"
	"fmt"
	"io"
	"io/ioutil"
	"os"
	"time"
)

// readRaw reads the file with plain os.File.Read calls, 1024 bytes at a time.
func readRaw(path string) string {
	start := time.Now()
	fi, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	// Close the file once and report the elapsed time.
	defer func() {
		fi.Close()
		fmt.Printf("[readRaw] cost time %v \n", time.Since(start))
	}()

	var data []byte
	buf := make([]byte, 1024)
	for {
		n, err := fi.Read(buf)
		if err != nil && err != io.EOF {
			panic(err)
		}
		if n == 0 {
			break
		}
		data = append(data, buf[:n]...)
	}
	return string(data)
}

// readWithBufferIO reads the file through a bufio.Reader, 1024 bytes at a time.
func readWithBufferIO(path string) string {
	start := time.Now()
	fi, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer func() {
		fi.Close()
		fmt.Printf("[readWithBufferIO] cost time %v \n", time.Since(start))
	}()

	r := bufio.NewReader(fi)
	var data []byte
	buf := make([]byte, 1024)
	for {
		n, err := r.Read(buf)
		if err != nil && err != io.EOF {
			panic(err)
		}
		if n == 0 {
			break
		}
		data = append(data, buf[:n]...)
	}
	return string(data)
}

// readWithIOUtil reads the whole file with a single ioutil.ReadAll call.
func readWithIOUtil(path string) string {
	start := time.Now()
	fi, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer func() {
		fi.Close()
		fmt.Printf("[readWithIOUtil] cost time %v \n", time.Since(start))
	}()

	fd, err := ioutil.ReadAll(fi)
	if err != nil {
		panic(err)
	}
	return string(fd)
}

func main() {
	file := "test.txt"
	readRaw(file)
	readWithBufferIO(file)
	readWithIOUtil(file)
}

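Reading the prepared file with the code above gives the timings below (I ran it more than 10 times; only two of the runs are shown here to illustrate the point):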

[readRaw] cost time 1.490717874s 
[readWithBufferIO] cost time 573.336617ms 
[readWithIOUtil] cost time 379.678285ms 
[readRaw] cost time 1.45133396s 
[readWithBufferIO] cost time 541.944555ms 
[readWithIOUtil] cost time 983.909509ms 

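As the results show, readRaw, which calls the os package's Read directly, is clearly the slowest, and by a wide margin. Between readWithBufferIO and readWithIOUtil, however, it is hard to call a winner: each one comes out ahead in one of the two runs.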

Looking behind the numbers

Now that we have this result, let's see why it comes out this way.

1. Why is buffered IO faster than a plain read?

Look at the bufio source:

// NewReader returns a new Reader whose buffer has the default size.
func NewReader(rd io.Reader) *Reader {
	return NewReaderSize(rd, defaultBufSize)
}

Then look at NewReaderSize:

// NewReaderSize returns a new Reader whose buffer has at least the specified
// size. If the argument io.Reader is already a Reader with large enough
// size, it returns the underlying Reader.
func NewReaderSize(rd io.Reader, size int) *Reader {
	// Is it already a Reader?
	b, ok := rd.(*Reader)
	if ok && len(b.buf) >= size {
		return b
	}
	if size < minReadBufferSize {
		size = minReadBufferSize
	}
	r := new(Reader)
	r.reset(make([]byte, size), rd)
	return r
}	

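bufio creates a 4096-byte (4 KB) buffer by default. When that buffer is empty, its Read method issues a single IO system call that reads up to 4096 bytes into the buffer, and the following r.Read(buf) calls are served from the buffer until it is drained. In the demo above, where buf is 1024 bytes, that works out to roughly one system call for every four r.Read calls. With plain IO, every read or write performs a system call, so it is bound to be much slower than buffered IO: each system call involves a switch from user mode to kernel mode.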
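The effect can be made visible without touching the disk. The sketch below is an illustration rather than a measurement of real system calls: countingReader is a made-up wrapper that counts how many Read calls reach the underlying source, with each such call standing in for one system call on a real file. The source here is an in-memory bytes.Reader, which is an assumption of the sketch.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader wraps an io.Reader and counts calls to Read.
// Each call stands in for one read system call on a real file.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// drain reads r to EOF in chunks of chunkSize bytes and returns the
// total number of bytes read.
func drain(r io.Reader, chunkSize int) int {
	buf := make([]byte, chunkSize)
	total := 0
	for {
		n, err := r.Read(buf)
		total += n
		if err == io.EOF {
			return total
		}
		if err != nil {
			panic(err)
		}
	}
}

// underlyingReads reports how many Read calls reach the source when size
// bytes are consumed in 1024-byte chunks, with or without a bufio.Reader.
func underlyingReads(size int, buffered bool) int {
	src := &countingReader{r: bytes.NewReader(make([]byte, size))}
	var r io.Reader = src
	if buffered {
		r = bufio.NewReader(src) // default buffer size
	}
	if got := drain(r, 1024); got != size {
		panic(fmt.Sprintf("read %d bytes, want %d", got, size))
	}
	return src.calls
}

func main() {
	const size = 1 << 20 // 1 MiB of in-memory data
	fmt.Println("direct reads:  ", underlyingReads(size, false))
	fmt.Println("buffered reads:", underlyingReads(size, true))
}
```

With 1024-byte chunks and the default 4096-byte buffer, the buffered variant should reach the source about a quarter as often as the direct one, which matches the explanation above.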

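2. Why is it hard to separate bufio and ioutil on efficiency?

Look at the ioutil source (the MinRead constant it uses is defined in the bytes package):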

// MinRead is the minimum slice size passed to a Read call by
// Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond
// what is required to hold the contents of r, ReadFrom will not grow the
// underlying buffer.
const MinRead = 512

// ReadAll reads from r until an error or EOF and returns the data it read.
// A successful call returns err == nil, not err == EOF. Because ReadAll is
// defined to read from src until EOF, it does not treat an EOF from Read
// as an error to be reported.
func ReadAll(r io.Reader) ([]byte, error) {
	return readAll(r, bytes.MinRead)
}

// readAll reads from r until an error or EOF and returns the data it read
// from the internal buffer allocated with a specified capacity.
func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	var buf bytes.Buffer
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	if int64(int(capacity)) == capacity {
		buf.Grow(int(capacity))
	}
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}

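As the source shows, ioutil.ReadAll also ends up doing buffered IO: the buffer starts at 512 bytes or more, and because it is a bytes.Buffer it grows dynamically as needed. Reallocating buf each time it has to Grow carries its own cost, though, so comparing the two comes down to a trade-off, and neither is strictly better.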

The advantage of ioutil, however, is convenience: ioutil.ReadAll or ioutil.ReadFile gets the job done in one line of code.