Python Standard Library Source Code Analysis: namedtuple

Python Source Code Analysis · Published 2019-05-12 23:23:52

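`namedtuple` is a factory function that makes tuples easier to work with. With an ordinary tuple, elements can only be accessed by index, which is sometimes less expressive than accessing attributes on an object.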

Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples.
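On CPython this claim is easy to check. The sketch below relies on the generated class setting `__slots__ = ()` (visible later in the source) and on `sys.getsizeof`; absolute byte counts vary by version and platform, so it only compares sizes rather than printing them.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# No per-instance dictionary: field access is served by the class
print(hasattr(p, '__dict__'))                      # False
# Same footprint as an ordinary 2-tuple
print(sys.getsizeof(p) == sys.getsizeof((1, 2)))   # True
# It is still a real tuple
print(p == (1, 2), isinstance(p, tuple))           # True True
```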

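Suppose we want to compute the distance between two points. By definition, d = sqrt((x1 - x2)^2 + (y1 - y2)^2), so we need the x and y coordinates of both points. We can represent the points p1 and p2 directly as tuples: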

```python
>>> import math
>>>
>>> p1, p2 = (1, 2), (2, 3)
>>> s = math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)
>>>
>>> print(s)
1.4142135623730951
```

Writing `p1[0]` for the x coordinate of p1 is a bit hard to read; `p1.x` would make the meaning obvious.

This is the classic use case for `namedtuple`: giving each field a name. Here is the example above rewritten with `namedtuple`:

```python
>>> import collections
>>> import math
>>>
>>> Point = collections.namedtuple('Point', ['x', 'y'])
>>> p1, p2 = Point(1, 2), Point(2, 3)
>>>
>>> s = math.sqrt((p1.x - p2.x)**2 + (p1.y - p2.y)**2)
>>>
>>> print(s)
1.4142135623730951
```

The curious reader will surely want to know how `namedtuple` gives fields their names. Let's start with the function signature:

```python
namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)
```

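We have already used the first two arguments. `typename` is the name of the new named tuple type; since it usually stands in for a class, it is written in class naming style (CapWords). `field_names` defines the names of the fields: besides `['x', 'y']` as above, you can also pass `"x y"` or `"x, y"`. Use whichever form you prefer.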
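A quick sanity check that the three spellings of `field_names` are interchangeable (the variable names `A`, `B`, and `C` are just for illustration):

```python
from collections import namedtuple

# Three equivalent ways to spell the same field names
A = namedtuple('Point', ['x', 'y'])
B = namedtuple('Point', 'x y')
C = namedtuple('Point', 'x, y')

print(A._fields, B._fields, C._fields)  # ('x', 'y') ('x', 'y') ('x', 'y')
```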

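`rename` defaults to `False`. As the name suggests, it renames fields: if a field name is invalid (for example, a keyword), it is automatically replaced with a positional name such as `_1`.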

[!DANGER]

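It is best to avoid this kind of behavior, which silently changes your definition, unless you can be sure that everyone who uses the type knows about it.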
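To see why the warning matters, here is a minimal example (the `Row` type and its fields are made up for illustration): the keyword `class` is silently replaced, so code that expects a `class` attribute breaks later instead of failing at definition time.

```python
from collections import namedtuple

# 'class' is a keyword, so rename=True silently replaces it with '_1'
Row = namedtuple('Row', ['name', 'class'], rename=True)
r = Row('Alice', 'A')
print(Row._fields)           # ('name', '_1')
print(r._1)                  # A
print(hasattr(r, 'class'))   # False: the name you wrote no longer exists

# Without rename=True, the same definition is rejected up front
rejected = False
try:
    namedtuple('Row', ['name', 'class'])
except ValueError as exc:
    rejected = True
    print('rejected:', exc)
```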

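`defaults` can be `None` or an iterable of default values. Because fields with a default value must come after fields without one, the defaults are applied to the rightmost fields.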

```python
>>> from collections import namedtuple
>>>
>>> Point = namedtuple('Point', "x y z", defaults=[2, 3])
>>> p1 = Point(1)
>>>
>>> print(p1)
Point(x=1, y=2, z=3)
```

If `module` is given, the `__module__` attribute of the named tuple class is set to that value.

```python
...
if isinstance(field_names, str):
    field_names = field_names.replace(',', ' ').split()
field_names = list(map(str, field_names))
typename = _sys.intern(str(typename))
...
```

The first thing the function does is normalize the two basic arguments, `typename` and `field_names`.

If `field_names` is a string, `replace` turns every `,` into a space, and `split` then produces a regular list. `list(map(str, field_names))` ensures that every element of `field_names` is a `str`.

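`_sys.intern` interns `typename`: it stores the string in Python's table of interned strings, so equal strings share a single object. This speeds up dictionary lookups keyed by that string, because comparisons can succeed on identity alone.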
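The normalization step can be replayed on its own. `normalize_field_names` is a hypothetical helper that copies the lines above; the last part shows the effect of `sys.intern`: two equal strings built separately end up as the same object.

```python
import sys

def normalize_field_names(field_names):
    # Hypothetical helper mirroring namedtuple's first step:
    # a string such as "x, y" or "x y" becomes ['x', 'y']
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    # Coerce every element to str
    return list(map(str, field_names))

print(normalize_field_names('x, y'))     # ['x', 'y']
print(normalize_field_names('x y  z'))   # ['x', 'y', 'z']

# A string built at runtime, once interned, is the same object as the literal
a = sys.intern(''.join(['Po', 'int']))
b = sys.intern('Point')
print(a is b)  # True
```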

```python
...
if rename:
    seen = set()
    for index, name in enumerate(field_names):
        if (not name.isidentifier()
            or _iskeyword(name)
            or name.startswith('_')
            or name in seen):
            field_names[index] = f'_{index}'
        seen.add(name)
...
```

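With `rename=True`, invalid field names are renamed. The code shows the rule: a name is replaced with `_{index}` (its position in the list) if it is not a valid identifier, is a keyword, starts with an underscore, or has already appeared.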
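A sketch that copies the renaming loop into a standalone function (`rename_fields` is a hypothetical name, not part of `collections`) and checks its result against `namedtuple` itself:

```python
from collections import namedtuple
from keyword import iskeyword

def rename_fields(field_names):
    # Hypothetical standalone copy of the rename=True loop
    field_names = list(field_names)
    seen = set()
    for index, name in enumerate(field_names):
        if (not name.isidentifier()
                or iskeyword(name)
                or name.startswith('_')
                or name in seen):
            field_names[index] = f'_{index}'
        seen.add(name)  # the original name is remembered, as in the source
    return field_names

names = ['abc', 'def', 'ghi', 'abc', '2nd']
print(rename_fields(names))                          # ['abc', '_1', 'ghi', '_3', '_4']
print(namedtuple('T', names, rename=True)._fields)   # ('abc', '_1', 'ghi', '_3', '_4')
```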

```python
...
for name in [typename] + field_names:
    if type(name) is not str:
        raise TypeError('Type names and field names must be strings')
    if not name.isidentifier():
        raise ValueError('Type names and field names must be valid '
                         f'identifiers: {name!r}')
    if _iskeyword(name):
        raise ValueError('Type names and field names cannot be a '
                         f'keyword: {name!r}')

seen = set()
for name in field_names:
    if name.startswith('_') and not rename:
        raise ValueError('Field names cannot start with an underscore: '
                         f'{name!r}')
    if name in seen:
        raise ValueError(f'Encountered duplicate field name: {name!r}')
    seen.add(name)
...
```

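Next, `typename` and `field_names` are validated. Every name must be a `str`, a valid identifier, and not a keyword; in addition, field names may not start with an underscore (unless `rename=True`) and may not repeat. This guarantees that `typename` and every element of `field_names` are legal names.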
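Each rule corresponds to a `ValueError`. The loop below (the inputs are made-up examples) triggers each check once and collects the messages; the exact wording may differ between Python versions, so only the exception type is relied on.

```python
from collections import namedtuple

bad_inputs = [
    ('Point', ['x', 'x']),     # duplicate field name
    ('Point', ['_x', 'y']),    # leading underscore (rename=False)
    ('Point', ['x', 'for']),   # keyword
    ('Point', ['x', 'y-z']),   # not a valid identifier
    ('2D', ['x', 'y']),        # invalid type name
]

errors = []
for typename, fields in bad_inputs:
    try:
        namedtuple(typename, fields)
    except ValueError as exc:
        errors.append(str(exc))

for message in errors:
    print(message)
```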

```python
...
field_defaults = {}
if defaults is not None:
    defaults = tuple(defaults)
    if len(defaults) > len(field_names):
        raise TypeError('Got more default values than field names')
    field_defaults = dict(reversed(list(zip(reversed(field_names),
                                            reversed(defaults)))))
...
```

If `defaults` is given, the default values must be matched to `field_names` from the right. First, `zip` pairs the reversed `field_names` with the reversed `defaults`, producing a list of tuples:

```python
>>> field_names = ['x', 'y', 'z']
>>> defaults = [2, 3]
>>>
>>> print(list(zip(reversed(field_names), reversed(defaults))))
[('z', 3), ('y', 2)]
```

Finally, `dict(reversed(...))` restores the field order and converts the pairs into a dict, here `{'y': 2, 'z': 3}`.
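Putting the two steps together and comparing with what the generated class records (`_field_defaults` is the attribute set in the class namespace later in the source; `__new__.__defaults__` is set in the next excerpt):

```python
from collections import namedtuple

field_names = ['x', 'y', 'z']
defaults = [2, 3]

# Pair from the right, then reverse back into field order
field_defaults = dict(reversed(list(zip(reversed(field_names),
                                        reversed(defaults)))))
print(field_defaults)              # {'y': 2, 'z': 3}

Point = namedtuple('Point', field_names, defaults=defaults)
print(Point._field_defaults)       # {'y': 2, 'z': 3}
print(Point.__new__.__defaults__)  # (2, 3)
print(Point(1))                    # Point(x=1, y=2, z=3)
```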

```python
...
# Variables used in the methods and docstrings
field_names = tuple(map(_sys.intern, field_names))
num_fields = len(field_names)
arg_list = repr(field_names).replace("'", "")[1:-1]
repr_fmt = '(' + ', '.join(f'{name}=%r' for name in field_names) + ')'
tuple_new = tuple.__new__
_dict, _tuple, _len, _map, _zip = dict, tuple, len, map, zip

# Create all the named tuple methods to be added to the class namespace

s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))'
namespace = {'_tuple_new': tuple_new, '__name__': f'namedtuple_{typename}'}
# Note: exec() has the side-effect of interning the field names
exec(s, namespace)
__new__ = namespace['__new__']
__new__.__doc__ = f'Create new instance of {typename}({arg_list})'
if defaults is not None:
    __new__.__defaults__ = defaults
...
```

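This part prepares the variables used by the methods and builds `__new__` dynamically. The key step is `exec(s, namespace)`: `s` is the source code of the `__new__` method, and `arg_list` is the field names turned into a string like `x, y, z`, which is substituted into `s`. `namespace` holds the names visible while `exec` runs; here it provides `_tuple_new` (bound to `tuple.__new__`), which creates the new tuple. Because `__new__` is generated with real parameter names, the class can be called with positional or keyword arguments, and the default values are attached through `__new__.__defaults__`.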
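The `exec` step can be replayed outside `namedtuple` to see what it produces. This sketch hard-codes the fields `x, y, z` and the defaults `(2, 3)` for illustration, names the result `generated_new` (a hypothetical name), and calls it with plain `tuple` as `_cls` instead of a real subclass:

```python
import inspect

# Hypothetical replay of the exec() step for fields x, y, z
field_names = ('x', 'y', 'z')
arg_list = repr(field_names).replace("'", "")[1:-1]
print(arg_list)  # x, y, z

s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))'
namespace = {'_tuple_new': tuple.__new__}
exec(s, namespace)                    # defines __new__ inside namespace
generated_new = namespace['__new__']
generated_new.__defaults__ = (2, 3)   # right-aligned, as with defaults=[2, 3]

print(inspect.signature(generated_new))  # (_cls, x, y=2, z=3)
print(generated_new(tuple, 1))           # (1, 2, 3)
print(generated_new(tuple, 1, z=9))      # (1, 2, 9)
```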

```python
...
@classmethod
def _make(cls, iterable):
    result = tuple_new(cls, iterable)
    if _len(result) != num_fields:
        raise TypeError(f'Expected {num_fields} arguments, got {len(result)}')
    return result

_make.__func__.__doc__ = (f'Make a new {typename} object from a sequence '
                          'or iterable')

def _replace(_self, **kwds):
    result = _self._make(_map(kwds.pop, field_names, _self))
    if kwds:
        raise ValueError(f'Got unexpected field names: {list(kwds)!r}')
    return result

_replace.__doc__ = (f'Return a new {typename} object replacing specified '
                    'fields with new values')

def __repr__(self):
    'Return a nicely formatted representation string'
    return self.__class__.__name__ + repr_fmt % self

def _asdict(self):
    'Return a new dict which maps field names to their values.'
    return _dict(_zip(self._fields, self))

def __getnewargs__(self):
    'Return self as a plain tuple.  Used by copy and pickle.'
    return _tuple(self)

# Modify function metadata to help with introspection and debugging
for method in (__new__, _make.__func__, _replace,
               __repr__, _asdict, __getnewargs__):
    method.__qualname__ = f'{typename}.{method.__name__}'
...
```

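Next comes a series of methods that become the methods of the generated class, and their short docstrings make each one's purpose clear: `_make` builds an instance from an iterable, `_replace` returns a copy with some fields replaced, `__repr__` produces the `Point(x=1, y=2)` style string, `_asdict` maps field names to values, and `__getnewargs__` supports copy and pickle. Finally, each function's `__qualname__` is set to `typename.method_name` to help with introspection and debugging.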
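A short tour of the generated methods on a simple `Point` type. Note the trick in `_replace`: `map(kwds.pop, field_names, _self)` calls `kwds.pop(name, current_value)` for every field, so replaced fields get the new value, the rest keep their old one, and anything left in `kwds` must be an unknown name. The source above raises `ValueError` in that case; newer Python releases may raise `TypeError` instead, so the sketch catches both.

```python
import copy
from collections import namedtuple

Point = namedtuple('Point', 'x y')

p = Point._make([1, 2])    # build from any iterable
q = p._replace(y=5)        # new instance; p itself is unchanged
print(p, q)                # Point(x=1, y=2) Point(x=1, y=5)
print(q._asdict())         # {'x': 1, 'y': 5}  (a plain dict on Python 3.8+)
print(p.__getnewargs__())  # (1, 2)
print(copy.copy(p) == p)   # True

# Leftover keyword arguments mean an unknown field name
replace_error = None
try:
    p._replace(z=3)
except (ValueError, TypeError) as exc:
    replace_error = str(exc)
print(replace_error)
```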

```python
...
# Build-up the class namespace dictionary
# and use type() to build the result class
class_namespace = {
    '__doc__': f'{typename}({arg_list})',
    '__slots__': (),
    '_fields': field_names,
    '_field_defaults': field_defaults,
    # alternate spelling for backward compatibility
    '_fields_defaults': field_defaults,
    '__new__': __new__,
    '_make': _make,
    '_replace': _replace,
    '__repr__': __repr__,
    '_asdict': _asdict,
    '__getnewargs__': __getnewargs__,
}

# _tuplegetter = lambda index, doc: property(_itemgetter(index), doc=doc)
for index, name in enumerate(field_names):
    doc = _sys.intern(f'Alias for field number {index}')
    class_namespace[name] = _tuplegetter(index, doc)

result = type(typename, (tuple,), class_namespace)
...
```

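`class_namespace` gathers the methods defined above, together with `__slots__ = ()` (which is why instances have no per-instance dictionary) and `_fields`. Each field name is then mapped to a `_tuplegetter(index, doc)` descriptor, which, as the comment says, behaves like `property(itemgetter(index))`: reading `p.x` returns `p[0]`. This is how the fields get their names. Finally, `type(typename, (tuple,), class_namespace)` creates the new class as a subclass of `tuple`.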
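To see the whole pattern in miniature, here is a deliberately stripped-down factory. `tiny_namedtuple` is hypothetical: it accepts only a string of field names, skips validation, `rename`, `defaults`, and most methods, and uses `property(itemgetter(index))` (the equivalent named in the source comment) instead of the C-level `_tuplegetter`.

```python
from operator import itemgetter

def tiny_namedtuple(typename, field_names):
    # Hypothetical, stripped-down namedtuple: only the type() + property pattern
    field_names = tuple(field_names.replace(',', ' ').split())
    arg_list = ', '.join(field_names)
    namespace = {'_tuple_new': tuple.__new__}
    exec(f'def __new__(_cls, {arg_list}): '
         f'return _tuple_new(_cls, ({arg_list},))', namespace)

    class_namespace = {
        '__slots__': (),                 # no per-instance __dict__
        '_fields': field_names,
        '__new__': namespace['__new__'],  # type() wraps it as a staticmethod
        '__repr__': lambda self: f'{typename}{tuple.__repr__(self)}',
    }
    # Each field name becomes a read-only property that indexes the tuple
    for index, name in enumerate(field_names):
        class_namespace[name] = property(itemgetter(index),
                                         doc=f'Alias for field number {index}')
    return type(typename, (tuple,), class_namespace)

Point = tiny_namedtuple('Point', 'x y')
p = Point(1, 2)
print(p.x, p.y, p)  # 1 2 Point(1, 2)

read_only = False
try:
    p.x = 10
except AttributeError:
    read_only = True  # properties without a setter cannot be assigned
print(read_only)    # True
```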

[!NOTE]

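In Python, classes are objects too: by default every class is an instance of `type`, and `type` is even an instance of itself (`type(type) is type`). Called with three arguments, `type(name, bases, namespace)` creates a new class dynamically. For more about `type`, see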

https://docs.python.org/3/library/functions.html#type
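A few lines to confirm the relationships described in the note (`Greeter` is an arbitrary example class):

```python
# Classes are objects, and by default their type is `type`
print(type(int) is type, type(type) is type)  # True True

# Three-argument form: type(name, bases, namespace) builds a new class
Greeter = type('Greeter', (object,), {'greet': lambda self: 'hi'})
print(Greeter().greet())            # hi
print(isinstance(Greeter, type))    # True
```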

```python
...
# For pickling to work, the __module__ variable needs to be set to the frame
# where the named tuple is created.  Bypass this step in environments where
# sys._getframe is not defined (Jython for example) or sys._getframe is not
# defined for arguments greater than 0 (IronPython), or where the user has
# specified a particular module.
if module is None:
    try:
        module = _sys._getframe(1).f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        pass
if module is not None:
    result.__module__ = module

return result
...
```

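Finally, if `module` was not given, it is read from the globals of the caller's frame (the module that called `namedtuple`), and it is stored in `result.__module__`. `pickle` needs this: it saves a class by reference, as its module plus its qualified name, and must be able to import it again when loading.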
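The sketch below shows both paths. It registers a throwaway module named `fake_geometry` (a hypothetical name) in `sys.modules` and runs the `namedtuple` call inside it, the way an import would. Newer CPython versions may use a different internal helper than `_getframe` for this lookup, but the resulting `__module__` should be the same.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical module, registered so that pickle can "import" it later
mod = types.ModuleType('fake_geometry')
sys.modules['fake_geometry'] = mod

# Run the definition inside that module's namespace, as an import would
exec("from collections import namedtuple\n"
     "Point = namedtuple('Point', 'x y')\n", mod.__dict__)
Point = mod.Point
print(Point.__module__)  # fake_geometry (taken from the calling module)

# pickle stores the class as fake_geometry.Point and looks it up on load
p = Point(1, 2)
restored = pickle.loads(pickle.dumps(p))
print(restored, type(restored) is Point)  # Point(x=1, y=2) True

# An explicit module argument skips the frame lookup entirely
Other = namedtuple('Other', 'a b', module='somewhere_else')
print(Other.__module__)  # somewhere_else
```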

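To summarize, creating a namedtuple takes roughly three steps:

1. Normalize and validate the arguments (`typename`, `field_names`, `rename`, `defaults`).
2. Generate `__new__` with `exec` and define the other methods.
3. Assemble the class namespace and create the class with `type`.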

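Until not long ago (before Python 3.7), `namedtuple` generated the entire class from a string template and executed it; the current implementation, which uses `exec` only for `__new__`, is more elegant.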
