
Python Object Serialization with pickle


This article is mainly a set of reading notes on the official pickle documentation.

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

pickle is part of the Python standard library, so it does not need to be installed separately.

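pickle serializes and deserializes Python object structures. In effect it is a way of storing data: Python data structures are saved in a specific format. Note that data serialized by pickle in its default binary form is not human-readable.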

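A word on naming: "pickle" means to preserve food in brine, so "pickling" a Python object is really just a form of data processing. The rules that govern that processing are not covered further here.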

“Pickling” converts a hierarchical Python object into a byte stream; “unpickling” is the reverse process.
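The round trip described above can be sketched as follows. This is a minimal illustration, not part of the original notes; the `config` dictionary and its contents are made up for the example.

```python
import pickle

# A nested ("hierarchical") object built only from built-in types.
# The contents are hypothetical sample data.
config = {
    "name": "demo",
    "ports": [8000, 8001],
    "limits": {"cpu": 0.5, "tags": ("a", "b")},
}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(config)

# Unpickling: byte stream -> an equal, but distinct, object hierarchy.
restored = pickle.loads(payload)

print(type(payload).__name__)                   # the stream is a bytes object
print(restored == config, restored is config)   # same value, different object
```

`pickle.dumps` and `pickle.loads` work on in-memory bytes; `pickle.dump` and `pickle.load` are the file-object counterparts.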

Note: “pickling”, “serialization”, “marshalling”, and “flattening” all mean the same thing here and can simply be read as "serialization"; when the word carries the prefix "un", read it as "deserialization".

Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

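In other words, pickle offers no protection against erroneous or maliciously constructed data. Unpickling can execute arbitrary code, so a crafted byte stream can run code of the attacker's choosing the moment it is loaded. Only unpickle data that comes from a source you trust and have authenticated.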
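To make the risk concrete, here is a deliberately harmless sketch of how a pickle stream can cause a callable to run during loading. The `Payload` class is hypothetical and uses the built-in `sorted` as a stand-in; a real attack would name something dangerous instead.

```python
import pickle


class Payload:
    """Hypothetical class used only to illustrate the mechanism."""

    # __reduce__ tells pickle how to rebuild the object: a callable plus
    # its arguments. The unpickler calls that callable while loading.
    def __reduce__(self):
        # Harmless stand-in for an attacker-chosen callable.
        return (sorted, ([3, 1, 2],))


stream = pickle.dumps(Payload())

# Loading does not give back a Payload; it returns whatever the callable
# named in the stream returned, which proves the call actually happened.
result = pickle.loads(stream)
print(result)
```

Because the callable is chosen by whoever produced the stream, `pickle.loads` on untrusted input is equivalent to letting the sender run code in your process.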

Data stream format

The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.
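The "pointer sharing" remark can be illustrated with a small comparison. This is a sketch using made-up data: two dictionary keys refer to the same list, and the example checks whether that identity survives a round trip through pickle and through JSON.

```python
import json
import pickle

shared = [1, 2, 3]
data = {"left": shared, "right": shared}  # two references to one list

via_pickle = pickle.loads(pickle.dumps(data))
via_json = json.loads(json.dumps(data))

# pickle records that both keys point at the same object.
pickle_shared = via_pickle["left"] is via_pickle["right"]

# JSON writes the list out twice, so two independent lists come back.
json_shared = via_json["left"] is via_json["right"]

print(pickle_shared, json_shared)
```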

The data format used by pickle is specific to Python. Non-Python programs may not be able to reconstruct pickled data.

By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.

By default, pickle uses a relatively compact binary representation. If size matters more, the pickled data can be compressed.
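One way to compress pickled data is to pass the byte stream through a standard-library compressor. The sketch below uses `zlib` as an assumption (the documentation does not prescribe a particular compressor) and a made-up, repetitive `records` list that compresses well.

```python
import pickle
import zlib

# Hypothetical, highly repetitive sample data.
records = [{"id": i % 10, "status": "ok"} for i in range(1000)]

raw = pickle.dumps(records)
compressed = zlib.compress(raw)

# Decompress first, then unpickle, to get the original data back.
restored = pickle.loads(zlib.decompress(compressed))

print(len(raw), len(compressed))
```

How much space is saved depends on the data; random or already-compressed content may not shrink at all.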

The module pickletools contains tools for analyzing data streams generated by pickle. pickletools source code has extensive comments about opcodes used by pickle protocols.

pickletools provides tools for analyzing data streams produced by pickle.
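As a brief sketch of what that analysis looks like, the example below lists the opcodes in a small pickle with `pickletools.genops` and captures a disassembly with `pickletools.dis`. The pickled value and the choice of protocol 2 are arbitrary.

```python
import io
import pickle
import pickletools

stream = pickle.dumps([1, 2], protocol=2)

# genops yields (opcode, argument, position) for each instruction.
opcode_names = [op.name for op, arg, pos in pickletools.genops(stream)]

# dis writes a readable disassembly; capture it instead of printing directly.
listing = io.StringIO()
pickletools.dis(stream, out=listing)

print(opcode_names)
print(listing.getvalue())
```

Reading the disassembly is also a safe way to inspect a pickle of unknown origin, since neither function executes the stream.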

There are currently 5 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This is the default protocol, and the recommended protocol when compatibility with other Python 3 versions is required.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.
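The protocol is selected with the `protocol` argument of `pickle.dumps` (or `pickle.dump`). The sketch below pickles a made-up object with protocol 0 and protocol 2 and then round-trips it through every protocol the running interpreter supports. The list quoted above reflects the documentation version the notes were taken from; `pickle.DEFAULT_PROTOCOL` and `pickle.HIGHEST_PROTOCOL` report the values for the interpreter actually in use, and later Python releases added a protocol 5 and changed the default.

```python
import pickle

obj = {"key": "value", "numbers": [1, 2, 3]}  # hypothetical sample data

text_like = pickle.dumps(obj, protocol=0)  # original ASCII-based protocol
binary = pickle.dumps(obj, protocol=2)     # binary; starts with a PROTO marker

print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
print(text_like)
print(binary[:2])

# loads() detects the protocol from the stream, so no argument is needed.
restored = [
    pickle.loads(pickle.dumps(obj, protocol=p))
    for p in range(pickle.HIGHEST_PROTOCOL + 1)
]
```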

Note:

Serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The pickle module can transform a complex object into a byte stream and it can transform the byte stream into an object with the same internal structure. Perhaps the most obvious thing to do with these byte streams is to write them onto a file, but it is also conceivable to send them across a network or store them in a database. The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.
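The two storage options mentioned in the note, a plain file and a shelve database, can be sketched as follows. The file names and the `state` dictionary are made up, and a temporary directory is used so the example cleans up after itself.

```python
import os
import pickle
import shelve
import tempfile

state = {"user": "alice", "scores": [90, 85]}  # hypothetical sample data

with tempfile.TemporaryDirectory() as tmp:
    # Plain file: pickle files must be opened in binary mode.
    path = os.path.join(tmp, "state.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    with open(path, "rb") as f:
        from_file = pickle.load(f)

    # shelve: a dict-like store whose values are pickled behind the scenes.
    store = os.path.join(tmp, "store")
    with shelve.open(store) as db:
        db["state"] = state
    with shelve.open(store) as db:
        from_shelf = db["state"]

print(from_file == state, from_shelf == state)
```

Neither approach addresses concurrent access, which matches the note's point that serialization is a more primitive notion than persistence.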

Module Interface
