
Python Learning: pack and unpack examples for the struct module

import struct

pack, unpack, pack_into, unpack_from (the examples use Python 2 syntax)

    # ref: http://blog.csdn.net/JGood/archive/2009/06/22/4290158.aspx
    import struct

    # pack - unpack
    print
    print '===== pack - unpack ====='
    packed = struct.pack("ii", 20, 400)        # two C ints -> 8-byte string
    print 'str:', packed
    print 'len(str):', len(packed)             # len(str): 8
    a1, a2 = struct.unpack("ii", packed)       # back to Python ints
    print "a1:", a1                            # a1: 20
    print "a2:", a2                            # a2: 400
    print 'struct.calcsize:', struct.calcsize("ii")  # struct.calcsize: 8

    # unpack
    print
    print '===== unpack ====='
    s = 'test astring'
    fmt = '5s 4x 3s'                           # 5-byte string, skip 4 pad bytes, 3-byte string
    print struct.unpack(fmt, s)                # ('test ', 'ing')
    s = 'he is not very happy'
    fmt = '2s 1x 2s 5x 4s 1x 5s'
    print struct.unpack(fmt, s)                # ('he', 'is', 'very', 'happy')

    # pack
    print
    print '===== pack ====='
    a = 20
    b = 400
    packed = struct.pack("ii", a, b)
    print 'length:', len(packed)               # length: 8
    print packed                               # raw binary data, not readable
    print repr(packed)                         # '\x14\x00\x00\x00\x90\x01\x00\x00'

    # pack_into - unpack_from
    print
    print '===== pack_into - unpack_from ====='
    from ctypes import create_string_buffer
    buf = create_string_buffer(12)             # writable, zero-filled 12-byte buffer
    print repr(buf.raw)
    struct.pack_into("iii", buf, 0, 1, 2, -1)  # write three ints starting at offset 0
    print repr(buf.raw)
    print struct.unpack_from("iii", buf, 0)    # (1, 2, -1)

Output:

    $ python struct_pack.py

    ===== pack - unpack =====
    str: (unprintable binary data)
    len(str): 8
    a1: 20
    a2: 400
    struct.calcsize: 8

    ===== unpack =====
    ('test ', 'ing')
    ('he', 'is', 'very', 'happy')

    ===== pack =====
    length: 8
    (unprintable binary data)
    '\x14\x00\x00\x00\x90\x01\x00\x00'

    ===== pack_into - unpack_from =====
    '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    '\x01\x00\x00\x00\x02\x00\x00\x00\xff\xff\xff\xff'
    (1, 2, -1)

==============================================================================

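Python is a very concise language. Unlike many other languages, it does not predefine a large number of data types (C#, for example, defines 8 integer types alone).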

It has only a handful of basic built-in types: strings, integers, floating-point numbers, tuples, lists, and dictionaries (key/value mappings).

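These types are enough for most work. But when Python exchanges data over a network with other platforms, it has to convert between its own types and the types used by other platforms or languages. For example, a client written in C++ sends an int (4 bytes) to a server written in Python. The server receives the 4 bytes that represent this integer; how does it turn them into an integer Python understands? The standard struct module solves exactly this problem.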
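As a minimal sketch of the scenario above, written for Python 3 (where struct works with `bytes`), assume the C++ client sends a 4-byte int in little-endian order, as x86 machines do. The byte string `received` is a hypothetical stand-in for what the server would read from its socket; the second half shows the `!` prefix (network byte order, big-endian) in case the client converted the value with `htonl` first.

```python
import struct

# Hypothetical payload: the 4 bytes a little-endian C++ client would send for int 400.
received = b"\x90\x01\x00\x00"

# '<' = little-endian with standard sizes (int is exactly 4 bytes); 'i' = signed int.
(value,) = struct.unpack("<i", received)
print(value)  # 400

# If the client used network byte order (big-endian), '!' is the matching prefix.
network_bytes = struct.pack("!i", 400)
print(network_bytes)  # b'\x00\x00\x01\x90'
(value_from_network,) = struct.unpack("!i", network_bytes)
print(value_from_network)  # 400
```

Stating the byte order explicitly in the format string keeps the result independent of the machine the server runs on.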

The struct module is small and not very difficult. The most commonly used functions are introduced below.

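1. struct.pack

struct.pack converts Python values into a string according to a format string. (Python 2 has no separate byte type, so this string can be thought of as a byte stream or byte array; in Python 3, pack returns a bytes object.) Its signature is struct.pack(fmt, v1, v2, ...), where fmt is the format string (format strings are described below) and v1, v2, ... are the Python values to convert. The following example converts two integers into a string (byte stream):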

    #!/usr/bin/env python
    import struct

    a = 20
    b = 400

    packed = struct.pack("ii", a, b)
    print 'length: ', len(packed)    # length:  8
    print packed                     # garbled: raw binary data
    print repr(packed)               # '\x14\x00\x00\x00\x90\x01\x00\x00'

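The format character "i" means a C int, so "ii" means two int values.

The result is 8 bytes long: an int occupies 4 bytes here, so two ints take 8 bytes.

The printed result looks garbled because it is raw binary data, not text.

Python's built-in repr function gives a readable representation. Read as little-endian 4-byte integers (least significant byte first, as on x86), the bytes \x14\x00\x00\x00 and \x90\x01\x00\x00 are the hexadecimal values 0x00000014 and 0x00000190, that is, 20 and 400.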
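To check the hexadecimal reading of 20 and 400 by hand, here is a small Python 3 sketch. It assumes an explicit `<` (little-endian) prefix so the result does not depend on the host, decodes each 4-byte group with `int.from_bytes`, and contrasts it with big-endian `>` output.

```python
import struct

packed = struct.pack("<ii", 20, 400)  # explicit little-endian, standard sizes
print(packed)  # b'\x14\x00\x00\x00\x90\x01\x00\x00'

# Decode each 4-byte group manually to confirm the hex values.
first = int.from_bytes(packed[0:4], "little")
second = int.from_bytes(packed[4:8], "little")
print(hex(first), hex(second))  # 0x14 0x190

# Big-endian order stores the most significant byte first.
big = struct.pack(">ii", 20, 400)
print(big)  # b'\x00\x00\x00\x14\x00\x00\x01\x90'
```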

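2. struct.unpack

struct.unpack does the opposite of struct.pack: it converts a byte stream back into Python values. Its signature is struct.unpack(fmt, string), and it always returns a tuple. The length of string must equal struct.calcsize(fmt).

Here is a simple example: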

    #!/usr/bin/env python
    import struct

    a = 20
    b = 400

    # pack
    packed = struct.pack("ii", a, b)
    print 'length: ', len(packed)    # length:  8
    print packed                     # garbled: raw binary data
    print repr(packed)               # '\x14\x00\x00\x00\x90\x01\x00\x00'

    # unpack
    values = struct.unpack("ii", packed)
    print 'length: ', len(values)    # length:  2
    print values                     # (20, 400)
    print repr(values)               # (20, 400)
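Because unpack always returns a tuple, a single value has to be unpacked from a 1-tuple, and a buffer of the wrong length is an error. A short Python 3 sketch of both points, assuming explicit little-endian formats:

```python
import struct

data = struct.pack("<ii", 20, 400)

# unpack always returns a tuple, even for one value.
pair = struct.unpack("<ii", data)
print(pair)  # (20, 400)

(only,) = struct.unpack("<i", data[:4])  # the trailing comma unpacks a 1-tuple
print(only)  # 20

# The buffer length must equal struct.calcsize(fmt); otherwise struct.error is raised.
try:
    struct.unpack("<ii", data[:6])
    length_ok = True
except struct.error as exc:
    length_ok = False
    print("struct.error:", exc)
```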

3. struct.calcsize

struct.calcsize returns the size in bytes of the data described by a format string. For example, struct.calcsize('ii') returns 8, because two ints occupy 8 bytes.

    import struct

    print "len: ", struct.calcsize('i')     # len:  4
    print "len: ", struct.calcsize('ii')    # len:  8
    print "len: ", struct.calcsize('f')     # len:  4
    print "len: ", struct.calcsize('ff')    # len:  8
    print "len: ", struct.calcsize('s')     # len:  1
    print "len: ", struct.calcsize('ss')    # len:  2
    print "len: ", struct.calcsize('d')     # len:  8
    print "len: ", struct.calcsize('dd')    # len:  16
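calcsize is not always the plain sum of the item sizes. In native mode (no prefix, or `@`), struct inserts padding so each item is aligned as a C compiler would align it; the standard modes (`=`, `<`, `>`, `!`) never add padding. A Python 3 sketch follows; the exact native result depends on the platform, so the value noted for `"ci"` is only what common 64-bit systems typically give.

```python
import struct

# Native mode: 3 pad bytes go after 'c' so that 'i' is 4-byte aligned on typical platforms.
native_ci = struct.calcsize("ci")     # typically 8 (1 + 3 padding + 4)
# No padding is added at the end of a format, so this order needs none.
native_ic = struct.calcsize("ic")     # 5
# Standard mode never adds alignment padding.
standard_ci = struct.calcsize("=ci")  # 5

print(native_ci, native_ic, standard_ci)
```

When the byte layout must match a wire protocol exactly, a standard-mode prefix is usually the safer choice.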

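4. struct.pack_into, struct.unpack_from

struct.pack_into(fmt, buffer, offset, v1, v2, ...) packs values into an existing writable buffer starting at offset, and struct.unpack_from(fmt, buffer, offset=0) unpacks values from a buffer starting at offset. Both are described in the Python manual, but it gives no usage example, and in practice they are not used very often. After searching Google for a long time I found one example, which I share here: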

    #!/usr/bin/env python
    import struct
    from ctypes import create_string_buffer

    buf = create_string_buffer(12)             # writable, zero-filled 12-byte buffer
    print repr(buf.raw)     # '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    struct.pack_into("iii", buf, 0, 1, 2, -1)  # write three ints starting at offset 0
    print repr(buf.raw)     # '\x01\x00\x00\x00\x02\x00\x00\x00\xff\xff\xff\xff'
    print struct.unpack_from("iii", buf, 0)     # (1, 2, -1)
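The offset argument is what makes these two functions useful: several fields can be written into one preallocated buffer without building intermediate strings. A minimal Python 3 sketch, using a plain `bytearray` as the writable buffer instead of a ctypes buffer. The layout (a 4-byte unsigned header at offset 0 followed by three ints at offset 4) is a hypothetical example.

```python
import struct

# A bytearray is writable, so it works as a pack_into target just like a ctypes buffer.
buf = bytearray(16)

# Hypothetical layout: 4-byte unsigned header at offset 0, three ints from offset 4.
struct.pack_into("<I", buf, 0, 0xCAFE)
struct.pack_into("<iii", buf, 4, 1, 2, -1)

header = struct.unpack_from("<I", buf, 0)[0]
values = struct.unpack_from("<iii", buf, 4)
print(hex(header), values)  # 0xcafe (1, 2, -1)
```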

For details, see the struct module documentation in the Python manual.

struct format characters

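| Format | C Type             | Python type        | Standard size | Notes    |
|--------|--------------------|--------------------|---------------|----------|
| x      | pad byte           | no value           |               |          |
| c      | char               | string of length 1 | 1             |          |
| b      | signed char        | integer            | 1             | (3)      |
| B      | unsigned char      | integer            | 1             | (3)      |
| ?      | _Bool              | bool               | 1             | (1)      |
| h      | short              | integer            | 2             | (3)      |
| H      | unsigned short     | integer            | 2             | (3)      |
| i      | int                | integer            | 4             | (3)      |
| I      | unsigned int       | integer            | 4             | (3)      |
| l      | long               | integer            | 4             | (3)      |
| L      | unsigned long      | integer            | 4             | (3)      |
| q      | long long          | integer            | 8             | (2), (3) |
| Q      | unsigned long long | integer            | 8             | (2), (3) |
| f      | float              | float              | 4             | (4)      |
| d      | double             | float              | 8             | (4)      |
| s      | char[]             | string             | 1             |          |
| p      | char[]             | string             |               |          |
| P      | void *             | integer            |               | (5), (3) |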
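The "Standard size" column applies only when the format string starts with `<`, `>`, `!` or `=`; in native mode the sizes follow the platform's C compiler. A short Python 3 sketch that checks the table's standard sizes with a `<` prefix and prints the native size of `l` for comparison (native `long` is often 8 bytes on 64-bit Unix-like systems, but that is platform-dependent):

```python
import struct

# Standard sizes from the table above.
expected = {"c": 1, "b": 1, "B": 1, "?": 1, "h": 2, "H": 2,
            "i": 4, "I": 4, "l": 4, "L": 4, "q": 8, "Q": 8,
            "f": 4, "d": 8}

# '<' selects standard sizes (little-endian, no alignment padding).
standard = {code: struct.calcsize("<" + code) for code in expected}
print(standard == expected)  # True

# Native sizes can differ: 'l' follows the platform's C long.
print("native l:", struct.calcsize("l"), "standard l:", struct.calcsize("<l"))
```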

Notes:

  1. The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte.

    New in version 2.6.

  2. The 'q' and 'Q' conversion codes are available in native mode only if the platform C compiler supports C long long, or, on Windows, __int64. They are always available in standard modes.

    New in version 2.2.

  3. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing. If no __index__() method exists, or the call to __index__() raises TypeError, then the __int__() method is tried. However, the use of __int__() is deprecated, and will raise DeprecationWarning.

    Changed in version 2.7: Use of the __index__() method for non-integers is new in 2.7.

    Changed in version 2.7: Prior to version 2.7, not all integer conversion codes would use the __int__() method to convert, and DeprecationWarning was raised only for float arguments.

  4. For the 'f' and 'd' conversion codes, the packed representation uses the IEEE 754 binary32 (for 'f') or binary64 (for 'd') format, regardless of the floating-point format used by the platform.

  5. The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.
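Notes (3) and (4) can be seen in a small Python 3 sketch. The `Port` class is hypothetical, used only to show an object with `__index__` being packed by an integer code; note that Python 3 no longer falls back to `__int__`, so a float passed to an integer code raises `struct.error`. The float part shows that `f` really is IEEE 754 binary32 (1.0 is `0x3F800000`) and therefore loses precision, while `d` round-trips a Python float exactly.

```python
import struct

class Port:
    """Hypothetical integer-like class, used only for illustration."""

    def __init__(self, number):
        self.number = number

    def __index__(self):
        # struct calls __index__ to turn the object into an int before packing (note 3).
        return self.number

port_bytes = struct.pack("<H", Port(8080))
print(struct.unpack("<H", port_bytes))  # (8080,)

# In Python 3 a float is not accepted by the integer codes.
try:
    struct.pack("<i", 3.0)
    float_rejected = False
except struct.error:
    float_rejected = True

# Note 4: 'f' is IEEE 754 binary32; 1.0 is 0x3F800000, written here little-endian.
one_f = struct.pack("<f", 1.0)
print(one_f.hex())  # 0000803f

# binary32 keeps only about 7 significant digits, so 0.1 does not round-trip exactly...
(as_float32,) = struct.unpack("<f", struct.pack("<f", 0.1))
print(as_float32)  # 0.10000000149011612
# ...while 'd' (binary64) matches a Python float and round-trips exactly.
(as_float64,) = struct.unpack("<d", struct.pack("<d", 0.1))
print(as_float64 == 0.1)  # True
```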

A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Whitespace characters between formats are ignored; a count and its format must not contain whitespace though.
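A quick Python 3 sketch of the two rules above: a repeat count is shorthand for repeating the code, whitespace between codes is ignored, and whitespace between a count and its code is rejected.

```python
import struct

values = (1, 2, 3, 4)
a = struct.pack("<4h", *values)
b = struct.pack("<hhhh", *values)
c = struct.pack("<2h 2h", *values)  # whitespace between codes is ignored
print(a == b == c)  # True
print(struct.calcsize("<4h"))  # 8

# A space between the count and its code is not allowed.
try:
    struct.pack("<2 h", 1, 2)
    split_count_rejected = False
except struct.error:
    split_count_rejected = True
print(split_count_rejected)  # True
```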

For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting string always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
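A minimal Python 3 illustration of the `s` rules (in Python 3 the argument must be `bytes`): short input is padded with null bytes, long input is truncated, `10s` yields one item while `10c` yields ten, and `0s` is a single empty string.

```python
import struct

padded = struct.pack("5s", b"ab")         # shorter input is padded with null bytes
truncated = struct.pack("3s", b"abcdef")  # longer input is truncated
print(padded, truncated)  # b'ab\x00\x00\x00' b'abc'

# '10s' is one 10-byte string; '10c' is ten 1-byte items.
print(len(struct.unpack("10s", b"0123456789")))  # 1
print(len(struct.unpack("10c", b"0123456789")))  # 10

# '0s' is a single empty string.
print(struct.unpack("0s", b""))  # (b'',)
```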

The 'p' format character encodes a “Pascal string”, meaning a short variable-length string stored in a fixed number of bytes, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the string passed in to pack() is too long (longer than the count minus 1), only the leading count-1 bytes of the string are stored. If the string is shorter than count-1, it is padded with null bytes so that exactly count bytes in all are used. Note that for unpack(), the 'p' format character consumes count bytes, but that the string returned can never contain more than 255 characters.
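The Pascal-string layout is easiest to see in the raw bytes. In this Python 3 sketch, `5p` stores a length byte, the string, and padding up to 5 bytes; with `4p` only count - 1 = 3 bytes of a longer string fit.

```python
import struct

short = struct.pack("5p", b"abc")       # length byte 3, then b'abc', then one null pad byte
clipped = struct.pack("4p", b"abcdef")  # only count - 1 = 3 bytes of the string are stored
print(short, clipped)  # b'\x03abc\x00' b'\x03abc'

# Unpacking consumes all 5 bytes but returns only the stored string.
print(struct.unpack("5p", short))  # (b'abc',)
```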

For the 'P' format character, the return value is a Python integer or long integer, depending on the size needed to hold a pointer when it has been cast to an integer type. A NULL pointer will always be returned as the Python integer 0. When packing pointer-sized values, Python integer or long integer objects may be used. For example, the Alpha and Merced processors use 64-bit pointer values, meaning a Python long integer will be used to hold the pointer; other platforms use 32-bit pointers and will use a Python integer.
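A short Python 3 sketch of the `P` behaviour described in note (5) and the paragraph above: in native mode its size equals the C pointer size (compared here against `ctypes.sizeof(ctypes.c_void_p)`), a NULL pointer round-trips as 0, and the `=` prefix rejects it.

```python
import ctypes
import struct

# 'P' is available in native mode (no prefix or '@'); its size is the C pointer size.
pointer_size = struct.calcsize("P")
print(pointer_size == ctypes.sizeof(ctypes.c_void_p))  # True

# A NULL pointer round-trips as the integer 0.
print(struct.unpack("P", struct.pack("P", 0)))  # (0,)

# With '=' (host byte order but standard sizes) the 'P' code is rejected.
try:
    struct.calcsize("=P")
    equals_p_allowed = True
except struct.error:
    equals_p_allowed = False
print(equals_p_allowed)  # False
```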

For the '?' format character, the return value is either True or False. When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.
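A Python 3 sketch of the `?` rules: packing uses the argument's truth value, and any non-zero byte unpacks as True. The example uses the standard-mode prefix `<` so the one-byte representation is guaranteed.

```python
import struct

# Packing uses the truth value of any object.
truthy = struct.pack("<?", 5)
falsy = struct.pack("<?", [])
print(truthy, falsy)  # b'\x01' b'\x00'

# Any non-zero byte unpacks as True.
decoded = struct.unpack("<?", b"\x02")
print(decoded)  # (True,)
```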