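[Python]_[Beginner]_[Reading and writing binary files with the struct library]
Scenarios
1. When you need to analyze or generate a binary file, Python's struct library converts between Python values and binary data. C++ or Java can do the same job, but writing and debugging the code takes longer than it does in a scripting language like Python, so Python is usually the more practical choice.
2. When sending data over a socket with a predefined message format, struct.pack and struct.unpack are the natural way to pack and parse the data.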
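As a rough illustration of the socket scenario, the sketch below frames a payload with a fixed-size header. The wire format (a 2-byte message type plus a 4-byte length, in network byte order) and the names encode_message and decode_message are assumptions made up for this example, and the code is written for Python 3, where binary data is bytes.

```python
import struct

# Hypothetical wire format for this sketch: a 2-byte message type and a
# 4-byte payload length in network byte order, followed by the payload.
HEADER_FORMAT = "!HI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 2 + 4 bytes, no padding


def encode_message(msg_type, payload):
    """Prefix the payload with the fixed-size header."""
    return struct.pack(HEADER_FORMAT, msg_type, len(payload)) + payload


def decode_message(data):
    """Split a received buffer back into (msg_type, payload)."""
    msg_type, length = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    return msg_type, data[HEADER_SIZE:HEADER_SIZE + length]


packet = encode_message(7, b"hello")
print(decode_message(packet))
```

The bytes in packet are what would be handed to socket.send; the receiver reads HEADER_SIZE bytes first to learn how much payload follows.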
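Notes
1. As a general-purpose scripting language, Python can handle the same general tasks as Java or C++, such as reading and writing binary or text files. Text files are easy: use a file object, and read strings with file.readline or file.readlines.
2. In Python 2, which this post uses, binary data is stored in string objects. file.read(n) returns a string object that, like std::string or char buf[] in C++, can hold arbitrary bytes. To work with those bytes, use unpack to convert them: for example, turn 4 bytes into a number, or turn a range of bytes into a str() string. Python's file object resembles C's FILE object and offers roughly corresponding functions.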
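A minimal sketch of that read-then-convert step, written for Python 3 (where read returns bytes rather than str). The io.BytesIO object is only a stand-in for a file opened with open(path, "rb"), and the sample bytes are made up for the example.

```python
import io
import struct

# Stand-in for open(path, "rb"): a 4-byte little-endian integer, then text.
f = io.BytesIO(b"\x2a\x00\x00\x00hello")

raw = f.read(4)                       # raw bytes, like a C char buf[4]
(number,) = struct.unpack("<i", raw)  # 4 bytes -> one integer
text = f.read(5).decode("ascii")      # next 5 bytes -> text string

print(number, text)
```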
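3. A note on struct.pack and unpack: by default, pack lays the values out like a C struct, using the machine's native alignment and native byte order. For example, the size of "bci" is 8. So prefer i over single-byte codes such as b and c, because the size is hard to work out once padding is inserted, unless you pick a prefix that disables alignment (see item 4).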
struct.pack(fmt, v1, v2, ...)
Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.
struct.unpack(fmt, string)
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
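The two rules quoted above (the result is always a tuple, and the length must match exactly) can be checked with a short sketch. It is written for Python 3, and the format "<hI" and the values are arbitrary choices for the example.

```python
import struct

packed = struct.pack("<hI", -2, 1000)        # 2 + 4 = 6 bytes
values = struct.unpack("<hI", packed)        # a tuple of two integers
(single,) = struct.unpack("<h", packed[:2])  # still a tuple, with one item

error = None
try:
    struct.unpack("<hI", packed[:5])         # one byte short of calcsize
except struct.error as exc:
    error = str(exc)

print(values, single, error is not None)
```

Unpacking with a trailing comma, as in (single,), is a common way to pull the value out of a one-item tuple.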
4.Byte Order, Size, and Alignment
Character | Byte order | Size | Alignment |
---|---|---|---|
@ | native | native | native |
= | native | standard | none |
< | little-endian | standard | none |
> | big-endian | standard | none |
! | network (= big-endian) | standard | none |
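To make the table concrete, here is a small sketch (Python 3) that packs one value under each prefix and compares the size of "bci" with and without native alignment. The native results depend on the platform; the comments describe what common desktop platforms produce.

```python
import struct
import sys

value = 0x01020304
little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
network = struct.pack("!I", value)  # identical to big-endian
native = struct.pack("=I", value)   # follows sys.byteorder

aligned_size = struct.calcsize("@bci")   # native alignment; 8 on common platforms
unaligned_size = struct.calcsize("=bci")  # standard sizes, no padding: 1 + 1 + 4

print(little, big, sys.byteorder)
print(aligned_size, unaligned_size)
```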
5.Format Characters
Format | C Type | Python type | Standard size | Notes |
---|---|---|---|---|
x | pad byte | no value | | |
c | char | string of length 1 | 1 | |
b | signed char | integer | 1 | (3) |
B | unsigned char | integer | 1 | (3) |
? | _Bool | bool | 1 | (1) |
h | short | integer | 2 | (3) |
H | unsigned short | integer | 2 | (3) |
i | int | integer | 4 | (3) |
I | unsigned int | integer | 4 | (3) |
l | long | integer | 4 | (3) |
L | unsigned long | integer | 4 | (3) |
q | long long | integer | 8 | (2), (3) |
Q | unsigned long long | integer | 8 | (2), (3) |
f | float | float | 4 | (4) |
d | double | float | 8 | (4) |
s | char[] | string | | |
p | char[] | string | | |
P | void * | integer | | (5), (3) |
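The s and p codes behave differently from the numeric ones, so a brief sketch may help. It is written for Python 3 (values are bytes), and the sample values are arbitrary.

```python
import struct

# '5s' is a single 5-byte field; shorter values are padded with null bytes.
fixed = struct.pack("<5s", b"ab")
# '5p' is a Pascal string: one length byte, then up to 4 bytes of data.
pascal = struct.pack("<5p", b"abc")
# For the other codes a count means repetition: '3h' is the same as 'hhh'.
shorts = struct.pack("<3h", 1, 2, 3)

print(fixed, pascal)
print(struct.unpack("<5p", pascal), struct.unpack("<3h", shorts))
```

Note that unpacking "5s" returns the padding as well, while "5p" uses its length byte to return only the stored data.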
Example
#! encoding=utf8
# Python 2 script: it relies on the print statement and on str holding raw bytes.
import sys
import os
import io
from StringIO import StringIO
from struct import unpack
from struct import pack


def TestWriter(path):
    f = open(path, "wb")
    # write png header
    header = pack('BBBB', 0x89, 0x50, 0x4E, 0x47)
    f.write(header)
    one_str = "string中文"
    one_char1 = 0
    one_char2 = ord('t')
    one_int = 50
    # we need to calculate the string length (in bytes)
    str_len = len(one_str)
    fmt = '%dsiii' % (str_len)
    body = pack(fmt, one_str, one_char1, one_char2, one_int)
    print len(body)
    f.write(body)
    f.close()


def TestReader(path):
    f = open(path, "rb")
    header = f.read(1)
    a, = unpack("B", header)
    png = f.read(3)
    print png
    # read the string byte by byte until the first zero byte
    name = ''
    n = f.read(1)
    while unpack('<b', n)[0] != 0:
        name = name + str(n)
        n = f.read(1)
    print name
    # skip the remaining three bytes of the zero int (one_char1)
    f.seek(3, io.SEEK_CUR)
    left = f.read(8)
    a, b = unpack("ii", left)
    print "%d:%d" % (a, b)
    f.close()


if __name__ == '__main__':
    path = ""
    if len(sys.argv) > 1:
        path = sys.argv[1]
    else:
        path = "temp.png"
    TestWriter(path)
    TestReader(path)
    # os.remove(path)
Output
24
PNG
string中文
116:50
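For readers on Python 3, the round trip needs small changes because text must be encoded to bytes before packing. The following is only a sketch under that assumption: write_sample and read_sample are hypothetical names, and an in-memory io.BytesIO replaces the file on disk. It also assumes a 4-byte native int, as on common platforms.

```python
import io
import struct


def write_sample(stream):
    """Write a 4-byte signature, a UTF-8 string, then three native ints."""
    stream.write(struct.pack("BBBB", 0x89, 0x50, 0x4E, 0x47))
    text = "string中文".encode("utf-8")  # 6 ASCII bytes + 2 * 3 UTF-8 bytes
    body = struct.pack("%dsiii" % len(text), text, 0, ord("t"), 50)
    stream.write(body)
    return len(body)


def read_sample(stream):
    """Read back what write_sample wrote and return (text, a, b)."""
    stream.seek(4)                # skip the signature
    name = b""
    byte = stream.read(1)
    while byte != b"\x00":        # the string ends at the first zero byte
        name += byte
        byte = stream.read(1)
    stream.seek(3, io.SEEK_CUR)   # skip the rest of the zero int
    a, b = struct.unpack("ii", stream.read(8))
    return name.decode("utf-8"), a, b


buf = io.BytesIO()
body_len = write_sample(buf)
print(body_len)
print(read_sample(buf))
```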
References
struct – Interpret strings as packed binary data
Handling binary data with struct in Python
File Objects