Using the fnmatch module in Python

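The matching power of fnmatch() sits between simple string methods and full regular expressions. When a data-processing task needs nothing more than simple wildcards, it is usually a reasonable choice. The module's main purpose is matching file names, and its patterns follow Unix shell style. The source code is short: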

"""Filename matching with shell patterns.

fnmatch(FILENAME, PATTERN) matches according to the local convention.
fnmatchcase(FILENAME, PATTERN) always takes case in account.

The functions operate by translating the pattern into a regular
expression.  They cache the compiled regular expressions for speed.

The function translate(PATTERN) returns a regular expression
corresponding to PATTERN.  (It does not compile it.)
"""
import os
import posixpath
import re
import functools

__all__ = ["filter", "fnmatch", "fnmatchcase", "translate"]

def fnmatch(name, pat):
    """Test whether FILENAME matches PATTERN.

    Patterns are Unix shell style:

    *       matches everything
    ?       matches any single character
    [seq]   matches any character in seq
    [!seq]  matches any char not in seq

    An initial period in FILENAME is not special.
    Both FILENAME and PATTERN are first case-normalized
    if the operating system requires it.
    If you don't want this, use fnmatchcase(FILENAME, PATTERN).
    """
    name = os.path.normcase(name)
    pat = os.path.normcase(pat)
    return fnmatchcase(name, pat)

@functools.lru_cache(maxsize=256, typed=True)
def _compile_pattern(pat):
    if isinstance(pat, bytes):
        pat_str = str(pat, 'ISO-8859-1')
        res_str = translate(pat_str)
        res = bytes(res_str, 'ISO-8859-1')
    else:
        res = translate(pat)
    return re.compile(res).match

def filter(names, pat):
    """Return the subset of the list NAMES that match PAT."""
    result = []
    pat = os.path.normcase(pat)
    match = _compile_pattern(pat)
    if os.path is posixpath:
        # normcase on posix is NOP. Optimize it away from the loop.
        for name in names:
            if match(name):
                result.append(name)
    else:
        for name in names:
            if match(os.path.normcase(name)):
                result.append(name)
    return result

def fnmatchcase(name, pat):
    """Test whether FILENAME matches PATTERN, including case.

    This is a version of fnmatch() which doesn't case-normalize
    its arguments.
    """
    match = _compile_pattern(pat)
    return match(name) is not None

def translate(pat):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i = i+1
        if c == '*':
            res = res + '.*'
        elif c == '?':
            res = res + '.'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j+1
            if j < n and pat[j] == ']':
                j = j+1
            while j < n and pat[j] != ']':
                j = j+1
            if j >= n:
                res = res + '\\['
            else:
                stuff = pat[i:j].replace('\\','\\\\')
                i = j+1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = '%s[%s]' % (res, stuff)
        else:
            res = res + re.escape(c)
    return r'(?s:%s)\Z' % res
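Reading the source, fnmatchcase() comes down to translate() plus re.compile(), and fnmatch() is fnmatchcase() applied to arguments passed through os.path.normcase(). The sketch below checks both relationships on a few sample names. The helper match_by_hand and the sample data are made up for illustration and are not part of the module.

```python
import os
import re
from fnmatch import fnmatch, fnmatchcase, translate


def match_by_hand(name, pat):
    """Hypothetical helper: what fnmatchcase() does, minus the lru_cache layer."""
    return re.compile(translate(pat)).match(name) is not None


samples = [("restart.py", "*.py"), ("index.php", "*.py"), ("README.TXT", "*.txt")]

# fnmatchcase() is translate() + re.compile() + match()
by_hand = [match_by_hand(n, p) for n, p in samples]
library = [fnmatchcase(n, p) for n, p in samples]
print(by_hand == library)

# fnmatch() is fnmatchcase() on case-normalized arguments, so the last sample
# matches only where os.path.normcase() lowercases names (for example Windows).
normalized = [fnmatchcase(os.path.normcase(n), os.path.normcase(p)) for n, p in samples]
via_fnmatch = [fnmatch(n, p) for n, p in samples]
print(normalized == via_fnmatch)
```

Because fnmatch() depends on the platform's case rules, fnmatchcase() is the safer choice when the same result is needed on every operating system.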

The fnmatch module exposes four public functions: ["filter", "fnmatch", "fnmatchcase", "translate"]

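  • filter: returns the matching names as a list
import fnmatch
import os


def gen_find(filepat, top):
    """
    Find every file under a directory tree whose name matches a shell-style pattern.
    :param filepat: shell-style wildcard pattern
    :param top: root directory to walk
    :return: generator of file paths (absolute only if top is absolute)
    """
    for path, _, filenames in os.walk(top):
        for file in fnmatch.filter(filenames, filepat):
            yield os.path.join(path, file)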
  • fnmatch
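from fnmatch import fnmatch

# List all the Python files in the tuple
pyfiles = [py for py in ('restart.py', 'index.php', 'file.txt') if fnmatch(py, '*.py')]
# The string methods startswith() and endswith() are also handy for filtering the contents of a directory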
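  • fnmatchcase: case-sensitive matching
from fnmatch import fnmatchcase

# An often-overlooked feature of fnmatch() and fnmatchcase() is that they are
# also useful for strings that are not file names.
# For example, suppose you have a list of street addresses:
address = [
    '5412 N CLARK ST',
    '1060 W ADDISON ST',
    '1039 W GRANVILLE AVE',
    '2122 N CLARK ST',
    '4802 N BROADWAY',
]
print([addr for addr in address if fnmatchcase(addr, '* ST')])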
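The same address list can also illustrate the other wildcard forms from the module docstring. This is a small sketch; the three patterns are chosen only for illustration.

```python
from fnmatch import fnmatchcase

address = [
    '5412 N CLARK ST',
    '1060 W ADDISON ST',
    '1039 W GRANVILLE AVE',
    '2122 N CLARK ST',
    '4802 N BROADWAY',
]

# [0-9] matches one digit: keep only 54xx addresses that mention CLARK
clark_54xx = [addr for addr in address if fnmatchcase(addr, '54[0-9][0-9] *CLARK*')]
print(clark_54xx)

# ? matches exactly one character: a four-character house number, then " W "
west = [addr for addr in address if fnmatchcase(addr, '???? W *')]
print(west)

# [!seq] negates the class: addresses whose first character is neither 1 nor 2
not_1_or_2 = [addr for addr in address if fnmatchcase(addr, '[!12]*')]
print(not_1_or_2)
```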
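  • translate: this one seems to be used rarely. As noted earlier, fnmatch uses Unix shell-style patterns; translate converts such a pattern into a regular expression. For example:
from fnmatch import translate

shell_match = 'Celery_?*.py'
print(translate(shell_match))  # Output with the implementation shown above: (?s:Celery_..*\.py)\Z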

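Celery_..*\.py is the regular-expression form of the pattern: ? becomes a single dot, * becomes .* and the literal dot before py is escaped. The surrounding (?s:...)\Z lets the dot match newlines and anchors the match at the end of the string.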
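One situation where translate can be handy is when the compiled pattern object itself is needed, for instance to pass it to code that expects a regular expression. The sketch below compiles the translated pattern once and reuses it. The file names are invented for illustration.

```python
import re
from fnmatch import translate

shell_match = 'Celery_?*.py'
regex = re.compile(translate(shell_match))

# Hypothetical file names, chosen to exercise the edges of the pattern
names = ['Celery_a.py', 'Celery_.py', 'Celery_worker.py', 'celery_a.py', 'Celery_a.pyc']

# 'Celery_.py' fails because ? needs one character before ".py";
# 'celery_a.py' fails because the regex is case-sensitive;
# 'Celery_a.pyc' fails because the pattern is anchored at the end.
matched = [name for name in names if regex.match(name)]
print(matched)
```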