NumPy tutorial: basic input/output and file input/output (Input and output)

阿新 • Published: 2019-01-28

Basic input/output and file input/output

File names and file objects

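The examples in this section all pass file names, but an already-opened file object can be passed as well.

For example, with load and save, a file object lets you store several arrays in a single .npy file: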

>>> import numpy as np
>>> a = np.arange(8)
>>> b = np.add.accumulate(a)
>>> c = a + b
>>> f = open("result.npy", "wb")
>>> np.save(f, a) # save a, b, c into file object f, in order
>>> np.save(f, b)
>>> np.save(f, c)
>>> f.close()
>>> f = open("result.npy", "rb")
>>> np.load(f) # read the arrays back from f in the same order
array([0, 1, 2, 3, 4, 5, 6, 7])
>>> np.load(f)
array([ 0,  1,  3,  6, 10, 15, 21, 28])
>>> np.load(f)
array([ 0,  2,  5,  9, 14, 20, 27, 35])
>>> f.close()
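The same pattern can be written with `with` blocks, so the file is closed automatically. This is a minimal sketch. The temporary path from `tempfile` is only for illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(8)
b = np.add.accumulate(a)
c = a + b

path = os.path.join(tempfile.mkdtemp(), "result.npy")

# Each np.save call appends one array record to the open file object.
with open(path, "wb") as f:
    for arr in (a, b, c):
        np.save(f, arr)

# Each np.load call reads the next record, in the order they were written.
with open(path, "rb") as f:
    loaded = [np.load(f) for _ in range(3)]

print(loaded[1])  # [ 0  1  3  6 10 15 21 28]
```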

Caveats when writing to a file object

numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')
Save an array to a text file.

np.savetxt(output_file_name, matrix_name)

When passing a file object, it must be opened for binary writing:

doc_word_mat_file = open('./filename.txt', 'wb')

Otherwise you get an error:

savetxt(doc_word_mat_file, doc_word_mat) ... fh.write(asbytes(format % tuple(row) + newline))
TypeError: must be str, not bytes

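So the recommendation is to pass a file name rather than a file object. (This applies to the NumPy version used here; newer NumPy releases also accept file objects opened in text mode.)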
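This minimal sketch shows the two options described above: passing a file name, and passing a file object opened in binary mode. The array contents and temporary paths are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

doc_word_mat = np.array([[1, 0, 2], [0, 3, 1]])
tmpdir = tempfile.mkdtemp()

# Recommended: pass a file name and let savetxt open the file itself.
name_path = os.path.join(tmpdir, "by_name.txt")
np.savetxt(name_path, doc_word_mat, fmt="%d")

# Also works: a file object opened in binary ('wb') mode.
obj_path = os.path.join(tmpdir, "by_object.txt")
with open(obj_path, "wb") as fh:
    np.savetxt(fh, doc_word_mat, fmt="%d")

with open(name_path) as f1, open(obj_path) as f2:
    same = f1.read() == f2.read()
print(same)  # True
```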

NumPy array output options

Print an array to the screen: print(mat1)

Text formatting options

`set_printoptions`([precision, threshold, ...])	Set printing options.
`get_printoptions`()	Return the current print options.
`set_string_function`(f[, repr])	Set a Python function to be used when pretty printing arrays.

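Configuring how NumPy arrays are printed

Forcing a multidimensional array to be printed in full

If an array is too large to print, NumPy automatically skips the central part and prints only the corners. To disable this and force NumPy to print the entire array, change the print options with set_printoptions.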

import sys
np.set_printoptions(threshold=sys.maxsize)  # older posts use np.NaN, which newer NumPy rejects

threshold : int, optional  Total number of array elements which trigger summarization rather than full repr (default 1000).
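If you only want to change the threshold for one call, `np.array2string` accepts the same `threshold` argument without touching the global settings. This is a minimal sketch; the 2000-element array is just an example.

```python
import sys

import numpy as np

a = np.arange(2000)  # more than the default threshold of 1000 elements

summarized = np.array2string(a)  # default: corners only, middle replaced by '...'
full = np.array2string(a, threshold=sys.maxsize)  # every element printed

print("..." in summarized, "..." in full)  # True False
```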

Setting output precision with set_printoptions

np.set_printoptions(precision=3)
print(x)
# [ 0.078  0.48   0.413  0.83   0.776  0.102  0.513  0.462  0.335  0.712]

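However, this cannot print each value at its original full precision. Some values have 10 decimal places and others 8, so no single precision setting fits them all, and setting precision too high prints spurious floating-point digits.

In that case you can convert the array with array.astype(str) and print the strings, for example: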

[['40.731354990929475' '-74.00363118575608']
['40.731508' '-74.0031859561163']]
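Here is a minimal sketch of the astype(str) approach, reusing the coordinates shown above. Each string is a round-trip representation of its float, so values with different numbers of decimals keep their own length.

```python
import numpy as np

coords = np.array([[40.731354990929475, -74.00363118575608],
                   [40.731508, -74.0031859561163]])

# Casting to str formats each float individually instead of using one
# shared precision for the whole array.
as_text = coords.astype(str)
print(as_text)
```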

`suppress`: stop using scientific notation for small numbers

y = np.array([1.5e-10, 1.5, 1500])
print(y)
# [  1.500e-10   1.500e+00   1.500e+03]
np.set_printoptions(suppress=True)
print(y)
# [    0.      1.5  1500. ]
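The same switch is also available per call. In `np.array2string` the argument is named `suppress_small`, and it mirrors `set_printoptions(suppress=True)` without changing global state. A minimal sketch:

```python
import numpy as np

y = np.array([1.5e-10, 1.5, 1500])

# Without suppression the wide range of magnitudes triggers scientific notation;
# with suppress_small=True fixed-point notation is used instead.
sci = np.array2string(y, precision=3)
fixed = np.array2string(y, precision=3, suppress_small=True)
print(sci)
print(fixed)
```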

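To apply print options only locally, you can use a contextmanager ...

The formatter argument lets you specify a formatting function for each type (note that trailing zeros are padded):

np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
print(x)

[ 0.078  0.480  0.413  0.830  0.776  0.102  0.513  0.462  0.335  0.712]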
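The formatter can also be passed to `np.array2string` for a single call. This minimal sketch uses the first three values from the example above.

```python
import numpy as np

x = np.array([0.078, 0.48, 0.413])

# formatter maps a type key ('float' here) to a function applied per element.
# '{: 0.3f}' pads to 3 decimals and reserves a space for the sign.
s = np.array2string(x, formatter={'float': '{: 0.3f}'.format})
print(s)  # [ 0.078  0.480  0.413]
```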

A quirky alternative:

print(np.char.mod('%4.2f', eigen_value))
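`np.char.mod` applies a `%`-style format to every element and returns an array of strings. The `eigen_value` values in this sketch are hypothetical.

```python
import numpy as np

eigen_value = np.array([3.14159, 1.0, 12.5])  # hypothetical eigenvalues

# Element-wise '%4.2f' formatting; the result is a str array, not floats.
formatted = np.char.mod('%4.2f', eigen_value)
print(formatted)  # ['3.14' '1.00' '12.50']
```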

[Pretty-printing of numpy.array][How numpy.array output looks]

Locally scoped control of NumPy print precision

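from contextlib import contextmanager

import numpy as np


@contextmanager
def np_printoptions(*args, **kwargs):
    '''
    Apply numpy print options locally, e.g.:
    with np_printoptions(precision=3, suppress=True):
        print(x1)
    '''
    original = np.get_printoptions()  # remember the current options
    np.set_printoptions(*args, **kwargs)
    try:
        yield
    finally:
        np.set_printoptions(**original)  # restore them even if the body raises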
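NumPy 1.15 and later include a built-in context manager, `np.printoptions`, that does the same save and restore. A minimal sketch:

```python
import numpy as np

x1 = np.array([0.123456789, 1.5e-10])

with np.printoptions(precision=3, suppress=True):
    inside = str(x1)  # formatted with the temporary options
outside = str(x1)     # the previous options are restored here

print(inside)
print(outside)
```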

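NumPy file I/O

NumPy provides a range of file functions that make it easy to store and load array contents.

There are two file formats: binary and text. Binary files are further divided into NumPy's own formatted binary format and raw (unformatted) binary.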

Numpy binary files (NPY, NPZ)

`load`(file[, mmap_mode, allow_pickle, ...])	Load arrays or pickled objects from `.npy`, `.npz` or pickled files.
`save`(file, arr[, allow_pickle, fix_imports])	Save an array to a binary file in NumPy `.npy` format.
`savez`(file, *args, **kwds)	Save several arrays into a single file in uncompressed `.npz` format.
`savez_compressed`(file, *args, **kwds)	Save several arrays into a single file in compressed `.npz` format.

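numpy.load and numpy.save (recommended when you don't need to inspect the saved data)

These two functions store data in NumPy's own binary format and record the element type and shape automatically, which makes reading and writing arrays much more convenient. The drawback is that files written by numpy.save are hard for programs written in other languages to read:

>>> a = np.arange(12).reshape(3, 4)
>>> np.save("a.npy", a)
>>> c = np.load("a.npy")
>>> c
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])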

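Note:

1. Use a .npy file name. np.save appends .npy to a file name that lacks it, so np.load fails if you later pass the name without the extension.

2. Once saved in NumPy's binary format, the file cannot be read in Notepad++ (it shows garbage). This is a drawback compared with the built-in tofile, which can write text when given sep.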
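This minimal sketch shows the extension behavior from note 1. The temporary directory is only for illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(12).reshape(3, 4)
base = os.path.join(tempfile.mkdtemp(), "a")  # note: no extension

np.save(base, a)              # np.save writes "a.npy", appending the extension
c = np.load(base + ".npy")    # so load it back under the .npy name

print(os.path.exists(base), os.path.exists(base + ".npy"))  # False True
```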

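numpy.savez

To store several arrays in one file, use numpy.savez. Its first argument is the file name, and the remaining arguments are the arrays to save. You can name arrays with keyword arguments; arrays passed positionally are named arr_0, arr_1, ... automatically. savez writes a zip archive with the .npz extension (uncompressed; use savez_compressed for compression). Each member of the archive is an .npy file as written by save, named after its array. load recognizes .npz files automatically and returns a dict-like object from which you can fetch each array by name: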

>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.arange(0, 1.0, 0.1)
>>> c = np.sin(b)
>>> np.savez("result.npz", a, b, sin_array = c)
>>> r = np.load("result.npz")
>>> r["arr_0"] # array a
array([[1, 2, 3],
       [4, 5, 6]])
>>> r["arr_1"] # array b
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])
>>> r["sin_array"] # array c
array([ 0.        ,  0.09983342,  0.19866933,  0.29552021,  0.38941834,
        0.47942554,  0.56464247,  0.64421769,  0.71735609,  0.78332691])

If you open result.npz with an archive tool, you will find three files, arr_0.npy, arr_1.npy and sin_array.npy, holding the contents of arrays a, b and c respectively.
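The archive structure can be checked from Python with the standard `zipfile` module. This minimal sketch also uses `np.savez_compressed`, which takes the same arguments as `savez`. The temporary path is illustrative.

```python
import os
import tempfile
import zipfile

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.arange(0, 1.0, 0.1)
c = np.sin(b)
path = os.path.join(tempfile.mkdtemp(), "result.npz")

# Same calling convention as savez, but the zip members are deflate-compressed.
np.savez_compressed(path, a, b, sin_array=c)

# An .npz file is an ordinary zip archive; list its members.
with zipfile.ZipFile(path) as zf:
    members = sorted(zf.namelist())
print(members)  # ['arr_0.npy', 'arr_1.npy', 'sin_array.npy']

# The object returned by np.load is a context manager that closes the zip.
with np.load(path) as r:
    sin_back = r["sin_array"]
```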

Reading text files with NumPy (Text files)

`loadtxt`(fname[, dtype, comments, delimiter, ...])	Load data from a text file.
`savetxt`(fname, X[, fmt, delimiter, newline, ...])	Save an array to a text file.
`genfromtxt`(fname[, dtype, comments, ...])	Load data from a text file, with missing values handled as specified.
`fromregex`(file, regexp, dtype)	Construct an array from a text file, using regular expression parsing.
`fromstring`(string[, dtype, count, sep])	A new 1-D array initialized from raw binary or text data in a string.
`ndarray.tofile`(fid[, sep, format])	Write array to a file as text or binary (default).
`ndarray.tolist`()	Return the array as a (possibly nested) list.

numpy.savetxt and numpy.loadtxt (recommended when you need to inspect the saved data)

numpy.savetxt and numpy.loadtxt read and write 1-D and 2-D arrays:

np.loadtxt(FILENAME, dtype=int, delimiter=' ')

numpy.loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)

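Reading a file into a structured array:
persontype = np.dtype({'names': ['name', 'age', 'weight', 'height'],
                       'formats': ['S32', 'i', 'f', 'f']})
# f: a file name or open file of comma-separated name,age,weight,height records
data = np.loadtxt(f, delimiter=",", dtype=persontype)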
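Here is a self-contained version of the structured-array read. The two records are hypothetical. The sketch uses `'U32'` (Unicode) instead of the original `'S32'` (bytes), so names come back as `str` in Python 3.

```python
import io

import numpy as np

# Hypothetical comma-separated records: name, age, weight, height.
f = io.StringIO("Alice,30,55.5,1.65\nBob,25,70.0,1.80\n")

# 'U32' instead of 'S32' so the name field holds str rather than bytes.
persontype = np.dtype({'names': ['name', 'age', 'weight', 'height'],
                       'formats': ['U32', 'i', 'f', 'f']})
data = np.loadtxt(f, delimiter=",", dtype=persontype)

print(data['name'], data['age'])  # ['Alice' 'Bob'] [30 25]
```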

np.savetxt("a.txt",a, fmt="%d",delimiter=",")

>>> a = np.arange(0,12,0.5).reshape(4,-1)
>>> np.savetxt("a.txt", a) # saved with the default '%.18e' format, space-separated
>>> np.loadtxt("a.txt")

>>> np.savetxt("a.txt", a, fmt="%d", delimiter=",") # save as integers, comma-separated instead
>>> np.loadtxt("a.txt", delimiter=",") # the comma delimiter must also be given when reading
array([[  0.,   0.,   1.,   1.,   2.,   2.],
       [  3.,   3.,   4.,   4.,   5.,   5.],
       [  6.,   6.,   7.,   7.,   8.,   8.],
       [  9.,   9.,  10.,  10.,  11.,  11.]])

Note: savetxt saves data in '%.18e' format by default. Change it with fmt, e.g. '%.8f' (float with 8 decimal places) or '%d' (integer).
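This minimal sketch round-trips the array from the example above through a custom `fmt` and a `header` line. The header text and temporary path are illustrative.

```python
import os
import tempfile

import numpy as np

a = np.arange(0, 12, 0.5).reshape(4, -1)
path = os.path.join(tempfile.mkdtemp(), "a.txt")

# '%.8f' keeps 8 decimals; the header is written after the default '# ' prefix,
# and loadtxt skips it because its comments parameter defaults to '#'.
np.savetxt(path, a, fmt="%.8f", delimiter=",", header="six columns")
b = np.loadtxt(path, delimiter=",")

with open(path) as fh:
    first_line = fh.readline().rstrip("\n")
print(first_line)  # # six columns
```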

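Saving an array of Chinese strings with np.savetxt

In practice this doesn't work. savetxt writes in wb mode, i.e. as bytes, while str in Python 3 is Unicode. So even if the data gets saved, you cannot read the Chinese text in the file.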

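For example:

s = np.array([['工', '1'], ['q', '1']])
print(s)
s = np.char.encode(s, 'utf-8')
np.savetxt('/tmp/1.txt', s, fmt='%s')

The file will only contain:

b'\xe5\xb7\xa5' b'1'
b'q' b'1'

So if the data contains Chinese and you really need to read the file, you have to fall back on ordinary Python file writing: with open() as f: f.write(lines) and so on.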
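Newer NumPy releases (1.14 and later, as far as I know) give `np.savetxt` an `encoding` argument. With it, non-ASCII strings can be written as readable text. This is a minimal sketch that assumes such a NumPy version; the temporary path is illustrative.

```python
import os
import tempfile

import numpy as np

s = np.array([['工', '1'], ['q', '1']])
path = os.path.join(tempfile.mkdtemp(), "1.txt")

# fmt='%s' because the array holds strings; encoding applies when fname is a path.
np.savetxt(path, s, fmt='%s', encoding='utf-8')

with open(path, encoding='utf-8') as fh:
    lines = fh.read().splitlines()
print(lines)  # ['工 1', 'q 1']
```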

loadtxt errors

1. Strings read by numpy.loadtxt are always bytes, shown with a leading b

Cause: np.loadtxt and np.genfromtxt operate in byte mode, which is the default string type in Python 2. But Python 3 uses unicode, and marks bytestrings with this b. The numpy.loadtxt documentation also states: Note that generators should return byte strings for Python 3k.

Fix: to read strings from a file with numpy.loadtxt, it is best to use np.loadtxt(filename, dtype=bytes).astype(str)
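This minimal sketch compares the workaround above with passing an explicit `encoding` together with `dtype=str`. The second form assumes a NumPy version whose `loadtxt` has the `encoding` argument (1.14 or later). The file contents are hypothetical.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("apple 1\nbanana 2\n")

# Workaround from the text: read as bytes, then convert to str.
old_style = np.loadtxt(path, dtype=bytes).astype(str)

# With an explicit encoding and dtype=str, loadtxt returns str directly.
new_style = np.loadtxt(path, dtype=str, encoding="utf-8")

print(old_style.tolist() == new_style.tolist())  # True
```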

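Summary:

Loading txt files: numpy.loadtxt()/numpy.savetxt()

Smart import of text/CSV files: numpy.genfromtxt()/numpy.recfromcsv()

Fast and efficient, but NumPy-specific binary format: numpy.save()/numpy.load()

2. ValueError: Wrong number of columns at line 78446

Cause: a problem in the data. For example, all earlier lines have 3 columns while line 78446 has 2 or 4.

Inspect the data around that line: nl data_test6.txt | grep -C 3 78446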
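All inconsistent lines can also be listed from Python before calling loadtxt. `find_bad_rows` is a hypothetical helper, not a NumPy function. This sketch splits on whitespace and counts physical lines from 1, including blank lines (unlike `nl`, which does not number blank lines by default).

```python
import io


def find_bad_rows(lines, expected_cols, delimiter=None):
    """Hypothetical helper: yield (line_number, column_count) for every
    non-empty, non-comment line whose field count differs from expected_cols."""
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # loadtxt also skips blank and comment lines
        ncols = len(stripped.split(delimiter))
        if ncols != expected_cols:
            yield lineno, ncols


data = io.StringIO("1 2 3\n4 5 6\n7 8\n9 10 11 12\n")
bad = list(find_bad_rows(data, expected_cols=3))
print(bad)  # [(3, 2), (4, 4)]
```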

numpy.genfromtxt

import numpy as np
np.genfromtxt('filename', dtype= None)
# array([(1, 2.0, 'buckle_my_shoe'), (3, 4.0, 'margery_door')],
# dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S14')])
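Here is a self-contained version of the genfromtxt call. The two lines of input are taken from the output shown above. Passing `encoding=None` makes the string column come back as `str` rather than bytes (the `'|S14'` field above).

```python
import io

import numpy as np

text = io.StringIO("1 2.0 buckle_my_shoe\n3 4.0 margery_door\n")

# dtype=None: infer a type per column; encoding=None: decode strings to str.
data = np.genfromtxt(text, dtype=None, encoding=None)

print(data.dtype.names)  # ('f0', 'f1', 'f2')
```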

Raw binary files

`fromfile`(file[, dtype, count, sep])	Construct an array from data in a text or binary file.
`ndarray.tofile`(fid[, sep, format])	Write array to a file as text or binary (default).

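The tofile and fromfile array methods (not recommended)

The array method tofile makes it easy to write an array's data to a file in binary. tofile output carries no format information, so when you read it back with numpy.fromfile you have to restore the format yourself: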

>>> a = np.arange(0,12)
>>> a.shape = 3,4
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.tofile("a.bin")
>>> b = np.fromfile("a.bin", dtype=np.float64) # read the data as float64 (np.float is removed in newer NumPy)
>>> b # the values read are wrong
array([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
         1.48539705e-313,   1.90979621e-313,   2.33419537e-313])
>>> a.dtype # check a's dtype (int32 here; on most 64-bit Linux/macOS builds it is int64)
dtype('int32')
>>> b = np.fromfile("a.bin", dtype=np.int32) # read the data as int32
>>> b # the data is one-dimensional
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> b.shape = 3, 4 # give b the same shape as a
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

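Note:

1. The data comes back unchanged only if you read it with the correct dtype and shape. Also, tofile always writes in C order, whether the array is stored in C or Fortran order.

2. The sep keyword argument: if fromfile and tofile are called with sep, the array is read and written as text. (The file can then be opened in Notepad++, but all the data is on one line, which is not convenient to read.)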

user_item_mat.tofile(user_item_mat_filename, sep=' ')
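This minimal sketch shows both tofile modes. The binary round-trip needs the original dtype and shape. The text mode is triggered by `sep`. The array is created as int32 explicitly so the dtype is the same on every platform; the temporary paths are illustrative.

```python
import os
import tempfile

import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)
tmpdir = tempfile.mkdtemp()

# Binary: only raw bytes are stored, so dtype and shape must be supplied on read.
bin_path = os.path.join(tmpdir, "a.bin")
a.tofile(bin_path)
b = np.fromfile(bin_path, dtype=np.int32).reshape(a.shape)

# Text: with sep, values are written as text on a single line, separated by sep.
txt_path = os.path.join(tmpdir, "a.txt")
a.tofile(txt_path, sep=" ")
t = np.fromfile(txt_path, dtype=np.int32, sep=" ").reshape(a.shape)

with open(txt_path) as fh:
    print(fh.read())  # one line of space-separated values
```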


String formatting

`array2string`(a[, max_line_width, precision, ...])	Return a string representation of an array.
`array_repr`(arr[, max_line_width, precision, ...])	Return the string representation of an array.
`array_str`(a[, max_line_width, precision, ...])	Return a string representation of the data in an array.
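This minimal sketch shows how the three string functions differ on a small integer array.

```python
import numpy as np

a = np.array([1, 2, 3])

s_str = np.array_str(a)                         # what print() shows
s_repr = np.array_repr(a)                       # what the REPL shows
s_custom = np.array2string(a, separator=", ")   # custom element separator

print(s_str)     # [1 2 3]
print(s_repr)    # array([1, 2, 3])
print(s_custom)  # [1, 2, 3]
```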

Memory mapping files

`memmap`(filename[, dtype, mode, offset, ...])	Create a memory-map to an array stored in a binary file on disk.
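This minimal sketch uses `np.memmap` to write an array to a raw binary file and then map it again read-only. As with fromfile, dtype and shape have to be supplied when reopening. The file name is illustrative.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.dat")

# mode='w+' creates (or overwrites) the file and maps it into memory.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(3, 4))
mm[:] = np.arange(12, dtype=np.float32).reshape(3, 4)
mm.flush()  # write the changes to disk
del mm      # drop the mapping

# mode='r' reopens the same bytes read-only; dtype and shape must match.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=(3, 4))
print(ro[2, 3])  # 11.0
```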

Base-n representations

`binary_repr`(num[, width])	Return the binary representation of the input number as a string.
`base_repr`(number[, base, padding])	Return a string representation of a number in the given base system.
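A few calls to the two base-n functions. Note that `width` in `binary_repr` gives two's complement for negative numbers, while `padding` in `base_repr` adds leading zeros.

```python
import numpy as np

b5 = np.binary_repr(5)                  # '101'
b_neg = np.binary_repr(-3, width=4)     # '1101' (two's complement in 4 bits)
hex_ff = np.base_repr(255, base=16)     # 'FF'
padded = np.base_repr(5, base=2, padding=3)  # '000101' (3 extra leading zeros)

print(b5, b_neg, hex_ff, padded)
```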

Data sources

`DataSource`([destpath])	A generic data source file (file, http, ftp, ...).

from:http://blog.csdn.net/pipisorry/article/details/39088003