
Learning Python: pack and unpack examples with the struct module


import struct

pack, unpack, pack_into, unpack_from

Output:

$ python struct_pack.py

===== pack - unpack =====
str: ?
len(str): 8
a1: 20
a2: 400
struct.calcsize: 8

===== unpack =====
('test ', 'ing')
('he', 'is', 'very', 'happy')

===== pack =====
length: 8
?
'\x14\x00\x00\x00\x90\x01\x00\x00'

===== pack_into - unpack_from =====
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
'\x01\x00\x00\x00\x02\x00\x00\x00\xff\xff\xff\xff'
(1, 2, -1)

==============================================================================

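Python is a very concise language. Unlike languages that predefine many data types (C#, for example, defines eight integer types alone), Python gets by with a small set of built-in types.

The six used most often are strings, integers, floats, tuples, lists (arrays), and dictionaries (key/value maps).

These six types cover most everyday work. But when a Python program talks to other platforms over a network, its values have to be converted to and from the types used by the other platform or language. For example, suppose a client written in C++ sends an int (4 bytes) to a server written in Python. The server receives the 4 bytes that represent the integer; how does it turn them into an integer that Python understands? The standard library's struct module exists to solve exactly this problem.

The struct module is small and not hard to learn. Its most commonly used functions are introduced below: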

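1. struct.pack
struct.pack converts Python values into a byte string according to a format string. In Python 2 the result is a str, which holds raw bytes and can be thought of as a byte stream or byte array; in Python 3 the result is a bytes object. Its signature is struct.pack(fmt, v1, v2, ...), where fmt is the format string (format strings are described further below) and v1, v2, ... are the Python values to convert. The following Python 2 example packs two integers into a byte string: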

#!/usr/bin/env python
# encoding: utf8

import struct

a = 20
b = 400
packed = struct.pack("ii", a, b)
print 'length: ', len(packed)       # length:  8
print packed                        # raw binary, prints as unreadable characters
print repr(packed)                  # '\x14\x00\x00\x00\x90\x01\x00\x00' (little-endian host)

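The format character "i" means a C int, so 'ii' describes two int values.

The packed result is 8 bytes long (an int takes 4 bytes, so two take 8).

Printing the result directly produces garbage, because it is binary data rather than text.

The built-in function repr gives a readable representation. The bytes shown are the hexadecimal values 0x00000014 and 0x00000190, which are 20 and 400, stored least significant byte first because the example ran on a little-endian machine.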
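The byte order seen in the repr output can also be chosen explicitly. The sketch below is written for Python 3 (where pack returns a bytes object rather than a str) and uses the "<" and ">" prefixes of the struct module to force little-endian and big-endian layouts. The variable names are only illustrative.

```python
import struct

a, b = 20, 400

# "<" forces little-endian with standard sizes: least significant byte first.
little = struct.pack("<ii", a, b)
# ">" forces big-endian: most significant byte first.
big = struct.pack(">ii", a, b)

print(little.hex())  # 1400000090010000
print(big.hex())     # 0000001400000190
```

With an explicit prefix the result no longer depends on the machine that runs the script, which is usually what a network protocol or file format needs.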

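2. struct.unpack
struct.unpack does the reverse of struct.pack: it converts a byte string back into Python values. Its signature is struct.unpack(fmt, string), and it always returns a tuple. The length of string must equal struct.calcsize(fmt).

Here is a simple example: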

#!/usr/bin/env python
# encoding: utf8

import struct

a = 20
b = 400

# pack
packed = struct.pack("ii", a, b)
print 'length: ', len(packed)       # length:  8
print packed                        # raw binary, prints as unreadable characters
print repr(packed)                  # '\x14\x00\x00\x00\x90\x01\x00\x00' (little-endian host)

# unpack
values = struct.unpack("ii", packed)
print 'length: ', len(values)       # length:  2
print values                        # (20, 400)
print repr(values)                  # (20, 400)
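Returning to the scenario from the introduction, where a C++ client sends a 4-byte int, the receiving side might be sketched as follows in Python 3. This assumes both sides have agreed on network (big-endian) byte order, selected with the "!" prefix. The `payload` bytes stand in for data that would really come from a socket, and `decode_int32` is a hypothetical helper name, not part of the struct module.

```python
import struct

def decode_int32(data):
    """Decode exactly 4 bytes in network byte order into a Python int."""
    # unpack always returns a tuple, even for a single value.
    (value,) = struct.unpack("!i", data)
    return value

# Stand-in for 4 bytes received from the network (big-endian 400).
payload = b"\x00\x00\x01\x90"
print(decode_int32(payload))                # 400
print(decode_int32(struct.pack("!i", -1)))  # -1
```

If the data is not exactly 4 bytes long, struct.unpack raises struct.error, so a real receiver would first make sure it has read a complete value.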

3. struct.calcsize
struct.calcsize returns the number of bytes that a format string describes. For example, struct.calcsize('ii') returns 8, because two int values take 8 bytes.

import struct
print "len: ", struct.calcsize('i')       # len:  4
print "len: ", struct.calcsize('ii')      # len:  8
print "len: ", struct.calcsize('f')       # len:  4
print "len: ", struct.calcsize('ff')      # len:  8
print "len: ", struct.calcsize('s')       # len:  1
print "len: ", struct.calcsize('ss')      # len:  2
print "len: ", struct.calcsize('d')       # len:  8
print "len: ", struct.calcsize('dd')      # len:  16
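The sizes above come from the default native mode, where C alignment rules can add padding between fields of different types. A small Python 3 sketch of the difference follows. The native result depends on the platform and compiler, so no fixed output is claimed for it.

```python
import struct

# Standard modes ("<", ">", "=", "!") never pad: 1 byte + 4 bytes.
print(struct.calcsize("<ci"))  # 5
print(struct.calcsize("=ci"))  # 5

# Native mode (the default, same as "@") follows the platform's C
# alignment rules, so padding may be inserted after the char.
native_size = struct.calcsize("ci")
print(native_size)  # platform dependent
```

When the byte layout has to match another program exactly, it helps to check calcsize against the size that program expects.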

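4. struct.pack_into and struct.unpack_from
The Python manual describes these two functions but gives no usage example. They are less common in practice: pack_into writes packed values into an existing writable buffer at a given offset, and unpack_from reads values from a buffer starting at a given offset. After a long search I found the following example, shared here: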

#!/usr/bin/env python
# encoding: utf8

import struct
from ctypes import create_string_buffer

buf = create_string_buffer(12)     # 12 zeroed, writable bytes
print repr(buf.raw)     # '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

struct.pack_into("iii", buf, 0, 1, 2, -1)   # write three ints starting at offset 0
print repr(buf.raw)     # '\x01\x00\x00\x00\x02\x00\x00\x00\xff\xff\xff\xff'

print struct.unpack_from("iii", buf, 0)     # (1, 2, -1)
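The offset argument is what sets these two functions apart from pack and unpack. As a rough Python 3 sketch, a plain bytearray can serve as the writable buffer instead of a ctypes buffer. The layout here (a 4-byte length header followed by one record) is invented purely for illustration.

```python
import struct

record = struct.Struct("<iii")      # precompiled format, 12 bytes
buf = bytearray(4 + record.size)    # 4-byte header slot + one record

# Write a 4-byte header at offset 0 and the record right after it.
struct.pack_into("<I", buf, 0, record.size)
record.pack_into(buf, 4, 1, 2, -1)

print(buf.hex())  # 0c0000000100000002000000ffffffff
# Read the record back without slicing the buffer.
print(record.unpack_from(buf, 4))   # (1, 2, -1)
```

Writing into a preallocated buffer avoids creating a new byte string for every field, which can matter when assembling large messages.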

For details, see the struct module in the Python manual: http://docs.python.org/library/struct.html#module-struct


struct format characters

Format  C Type              Python type         Standard size  Notes
x       pad byte            no value
c       char                string of length 1  1
b       signed char         integer             1              (3)
B       unsigned char       integer             1              (3)
?       _Bool               bool                1              (1)
h       short               integer             2              (3)
H       unsigned short      integer             2              (3)
i       int                 integer             4              (3)
I       unsigned int        integer             4              (3)
l       long                integer             4              (3)
L       unsigned long       integer             4              (3)
q       long long           integer             8              (2), (3)
Q       unsigned long long  integer             8              (2), (3)
f       float               float               4              (4)
d       double              float               8              (4)
s       char[]              string              1
p       char[]              string
P       void *              integer                            (5), (3)

Notes:

  1. The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte.

    New in version 2.6.

  2. The 'q' and 'Q' conversion codes are available in native mode only if the platform C compiler supports C long long, or, on Windows, __int64. They are always available in standard modes.

    New in version 2.2.

  3. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing. If no __index__() method exists, or the call to __index__() raises TypeError, then the __int__() method is tried. However, the use of __int__() is deprecated, and will raise DeprecationWarning.

    Changed in version 2.7: Use of the __index__() method for non-integers is new in 2.7.

    Changed in version 2.7: Prior to version 2.7, not all integer conversion codes would use the __int__() method to convert, and DeprecationWarning was raised only for float arguments.

  4. For the 'f' and 'd' conversion codes, the packed representation uses the IEEE 754 binary32 (for 'f') or binary64 (for 'd') format, regardless of the floating-point format used by the platform.

  5. The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.

A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Whitespace characters between formats are ignored; a count and its format must not contain whitespace though.

For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting string always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
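The difference between a repeat count and an 's' length is easy to see in a short Python 3 sketch. In Python 3 the 's' format takes and returns bytes objects; the sample values are arbitrary.

```python
import struct

# A repeat count on "h" expands to four separate shorts.
print(struct.calcsize("<4h") == struct.calcsize("<hhhh"))  # True

# A count on "s" is a length: one 10-byte field, padded with null bytes.
padded = struct.pack("10s", b"hello")
print(padded)      # b'hello\x00\x00\x00\x00\x00'

# A value longer than the field is truncated.
truncated = struct.pack("3s", b"hello")
print(truncated)   # b'hel'

# Unpacking returns exactly 10 bytes, padding included.
(field,) = struct.unpack("10s", padded)
print(field.rstrip(b"\x00"))  # b'hello'
```

Stripping the trailing null bytes after unpacking is a common convention, but it is a choice made by the caller: struct itself cannot tell padding from real zero bytes.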

The 'p' format character encodes a “Pascal string”, meaning a short variable-length string stored in a fixed number of bytes, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the string passed in to pack() is too long (longer than the count minus 1), only the leading count-1 bytes of the string are stored. If the string is shorter than count-1, it is padded with null bytes so that exactly count bytes in all are used. Note that for unpack(), the 'p' format character consumes count bytes, but that the string returned can never contain more than 255 characters.
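A brief Python 3 sketch of the Pascal-string layout described above, using an 8-byte field chosen only for illustration:

```python
import struct

# 8 bytes total: 1 length byte + up to 7 bytes of data.
packed = struct.pack("8p", b"abc")
print(packed)                       # b'\x03abc\x00\x00\x00\x00'
print(struct.unpack("8p", packed))  # (b'abc',)

# Too long for the field: only the leading count-1 = 7 bytes are kept.
clipped = struct.pack("8p", b"0123456789")
print(clipped)                      # b'\x070123456'
```

Unlike 's', unpacking a 'p' field gives back the original string without the padding, because the stored length byte says where the data ends.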

For the ‘P‘ format character, the return value is a Python integer or long integer, depending on the size needed to hold a pointer when it has been cast to an integer type. A NULL pointer will always be returned as the Python integer 0. When packing pointer-sized values, Python integer or long integer objects may be used. For example, the Alpha and Merced processors use 64-bit pointer values, meaning a Python long integer will be used to hold the pointer; other platforms use 32-bit pointers and will use a Python integer.

For the '?' format character, the return value is either True or False. When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.
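The truth-value behaviour can be checked with a couple of lines of Python 3. The "<" prefix selects standard mode, where each bool is exactly one byte; the argument values are arbitrary.

```python
import struct

# Packing uses the truth value of each argument, whatever its type.
packed = struct.pack("<???", True, 0, "non-empty")
print(packed)                              # b'\x01\x00\x01'

# Any non-zero byte unpacks as True.
print(struct.unpack("<??", b"\x00\x7f"))   # (False, True)
```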

