Python Regular Expressions and the re Module

阿新 • Published: 2019-01-05


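Commonly used functions and methods of Python's re module

0. Regular expression objects (re.compile(pattern, flags=0))

re.compile() compiles a regular expression into a regular expression object. The methods of that object, such as match(), search() and findall(), can then be called on it directly.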

import re

prog = re.compile(pattern)
result = prog.match(string)
# The two lines above are equivalent to the single call below
result = re.match(pattern, string)
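As a minimal sketch of the equivalence above, the following reuses one compiled object across several strings and compares it with the module-level function. The pattern and the sample strings are illustrative assumptions, not taken from the original text.

```python
import re

# Hypothetical pattern: a word, a hyphen, then digits.
prog = re.compile(r'\w+-\d+')

samples = ['abc-123', 'no digits here', 'x-9']

# Reuse the same compiled object for every string.
compiled_hits = [s for s in samples if prog.match(s)]

# The module-level function selects the same strings.
module_hits = [s for s in samples if re.match(r'\w+-\d+', s)]

print(compiled_hits)
print(module_hits)
```

Compiling once is mainly a convenience when the same pattern is used in many places; both forms select the same strings.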

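1. Match objects and their methods (Match.group([group1, ...]), Match.groups(), Match.groupdict(), (?P<name>...))

A match object is what match() and search() return when they succeed. Its two main methods are group() and groups(). (On failure match() and search() return None, and calling either method on None raises an exception.)

group() returns the entire match when called without arguments, and an individual subgroup when given its index.

groups() returns a tuple containing all matched subgroups. (If the pattern contains no subgroups, it returns an empty tuple.)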

>>> ob = re.compile(r'(\w+)-(\d+)')   # the parentheses split the pattern into two subgroups
>>> m = ob.match('abc-123')
>>> m.group()          # entire match
'abc-123'
>>> m.group(1)         # subgroup 1
'abc'
>>> m.group(2)         # subgroup 2
'123'
>>> m.groups()         # all subgroups
('abc', '123')

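The (?P<name>...) notation lets a group be referenced by a name rather than only by its number. The groupdict() method then returns a dictionary whose keys are the given names and whose values are the matched substrings.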

>>> ob = re.compile(r'(?P<first>\w+)-(?P<second>\d+)')
>>> m = ob.match('abc-123')
>>> m.groupdict()
{'first': 'abc', 'second': '123'}
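A short sketch of how named groups can be read back, reusing the pattern from the example above. The variable names are illustrative only.

```python
import re

ob = re.compile(r'(?P<first>\w+)-(?P<second>\d+)')
m = ob.match('abc-123')

by_name = m.group('first')           # look a group up by its name
by_index = m.group(1)                # the same group is still available by number
both = m.group('first', 'second')    # several groups at once come back as a tuple

print(by_name, by_index, both)
```

Names tend to make longer patterns easier to maintain, because inserting a new group does not shift the references to the existing ones.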

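2. Matching strings (re.match(pattern, string, flags=0), re.search(pattern, string, flags=0))

match() tries to match the pattern at the very start of the string. If it succeeds it returns a match object; otherwise it returns None.

search() scans the string for the first position at which the pattern matches. If it succeeds it returns a match object; otherwise it returns None.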

>>> m = re.search('tif', 'beautiful')
>>> m.group()          # match succeeded
'tif'
>>> m.groups()         # the pattern has no subgroups, so the tuple is empty
()
>>> m = re.match('tif', 'beautiful')
>>> m.group()          # m is None, and None has no group() method
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    m.group()
AttributeError: 'NoneType' object has no attribute 'group'
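The AttributeError above can be avoided by checking the result before using it. The following is a minimal sketch; first_match is a hypothetical helper introduced only for illustration.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    if m is None:          # search() found nothing, so there is no match object
        return None
    return m.group()

found = first_match('tif', 'beautiful')
missing = first_match('xyz', 'beautiful')

print(found, missing)
```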

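3. Finding every occurrence (re.findall(pattern, string, flags=0), re.finditer(pattern, string, flags=0))

findall() finds all non-overlapping occurrences of a regular expression pattern in a string. It is similar to search(), except that findall() returns a list: if matching succeeds, the list contains every matched part; if nothing matches, the list is empty.

finditer() is similar to findall() (it also covers every successful match), but it returns an iterator that yields match objects.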

>>> s = 'This and that and the'
>>> re.findall(r'(th\w+)', s, re.I)         # findall returns a list
['This', 'that', 'the']
>>> it = re.finditer(r'(th\w+)', s, re.I)   # returns an iterator; advance it with next()
>>> g = next(it)
>>> g.groups()
('This',)
>>> g = next(it)
>>> g.group(1)
'that'
>>> g = next(it)
>>> g.group(1)
'the'
>>> [g.group(1) for g in re.finditer(r'(th\w+)', s, re.I)]   # list comprehension
['This', 'that', 'the']
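Two details may be worth a sketch here: how the shape of the findall() result depends on the groups in the pattern, and how finditer() gives access to positions through its match objects. The string is the one used above; the two-group pattern is an illustrative assumption.

```python
import re

s = 'This and that and the'

# No group in the pattern: the list holds the whole matches.
no_group = re.findall(r'th\w+', s, re.I)

# Two groups in the pattern: the list holds one tuple per match.
two_groups = re.findall(r'(th)(\w+)', s, re.I)

# finditer() yields match objects, so the position of each hit is available.
spans = [m.span() for m in re.finditer(r'th\w+', s, re.I)]

print(no_group)
print(two_groups)
print(spans)
```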

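4. Search and replace (re.sub(pattern, repl, string, count=0, flags=0), re.subn(pattern, repl, string, count=0, flags=0))

These functions replace every part of a string that matches the regular expression. sub() and subn() are almost identical: sub() returns the string after replacement, while subn() returns a tuple: (string after replacement, number of replacements).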

>>> re.sub('hello', 'HELLO', 'hello the hello and world\n')   # replace every hello with HELLO
'HELLO the HELLO and world\n'
>>> re.subn('hello', 'HELLO', 'hello the hello and world\n')
('HELLO the HELLO and world\n', 2)
>>> re.sub('hello', 'world', 'hello the hello and world\n', count=1)   # replace only the first hello by passing count
'world the hello and world\n'
>>> re.subn('[ed]', 'world', 'hello the hello and world\n')   # replace each e or d with world: 5 replacements
('hworldllo thworld hworldllo anworld worlworld\n', 5)
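The repl argument in the signatures above does not have to be a string; it can also be a function that receives each match object and returns the replacement text. A minimal sketch follows, in which double is a hypothetical helper and the input string is made up for illustration.

```python
import re

def double(match):
    """Hypothetical helper: replace a matched number with twice its value."""
    return str(int(match.group()) * 2)

text = 'a1 b22 c333'

doubled = re.sub(r'\d+', double, text)      # returns only the new string
doubled_n = re.subn(r'\d+', double, text)   # returns (new string, number of replacements)

print(doubled)
print(doubled_n)
```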

5. Splitting strings (re.split(pattern, string, maxsplit=0, flags=0)): works like the string split() method, except that the delimiter is a regular expression
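Since no example is given for re.split(), here is a small sketch. The input strings and delimiter patterns are illustrative assumptions.

```python
import re

line = 'a, b;c  d'

# Split on any run of commas, semicolons or whitespace.
parts = re.split(r'[,;\s]+', line)

# maxsplit limits the number of splits; the rest of the string is kept whole.
limited = re.split(r'[,;\s]+', line, maxsplit=1)

# If the delimiter pattern contains a group, the delimiters are kept in the result.
with_seps = re.split(r'([,;])', 'a,b;c')

print(parts)
print(limited)
print(with_seps)
```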

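6. Extension notation (either the flags argument of the functions above, or the equivalent inline notation written in parentheses inside the regular expression; the two have the same effect, so use either one)

re.I / re.IGNORECASE, (?i): case-insensitive matching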

>>> re.findall(r'(?i)yes', 'yes Yes YES!!')     # (?i): case-insensitive, set inside the regular expression
['yes', 'Yes', 'YES']
>>> re.findall(r'yes', 'yes Yes YES!!', re.I)   # re.I: case-insensitive, set from Python code; likewise below
['yes', 'Yes', 'YES']

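re.M / re.MULTILINE, (?m): makes ^ and $ match at the start and end of every line, not only at the start and end of the whole string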

>>> re.findall(r'(?im)(^th[\w]+)', """
This line is the first
another line
that line is the end""")
['This', 'that']

re.S / re.DOTALL, (?s): makes . match the newline character \n as well
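A minimal sketch of the effect of re.S, using a made-up two-line string:

```python
import re

text = 'first line\nsecond line'

# Without the flag, . stops at the newline, so the pattern cannot span both lines.
without_flag = re.search(r'first.*second', text)

# With re.S (or the inline (?s)), . also matches \n.
with_flag = re.search(r'first.*second', text, re.S)
inline_flag = re.search(r'(?s)first.*second', text)

print(without_flag)
print(with_flag.group())
print(inline_flag.group())
```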

re.X / re.VERBOSE, (?x): ignores whitespace inside the regular expression and allows # comments, which makes patterns easier to read

>>> re.search(r'''(?x)
\((\d{3})\)   # area code
[ ]           # space
(\d{3})       # prefix
-             # hyphen
(\d{4})       # final digits
''', '(800) 555-1212').groups()
('800', '555', '1212')

(?:...) groups part of a regular expression without saving that group for later retrieval or use.

>>> re.findall(r'(?:\w+\.)*(\w+\.com)', 'baidu.com www.baidu.com code.baidu.com')   # the (?:\w+\.)* group is not saved, so www and code do not appear in the result
['baidu.com', 'baidu.com', 'baidu.com']

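(?=...) and (?!...) implement lookahead matching. The former is a positive lookahead assertion, the latter a negative one. Put simply: (?=...) matches only where the text that follows matches ..., and (?!...) matches only where the text that follows does not match .... In both cases the looked-ahead text is not consumed and is not part of the result.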

import re
result = re.findall(r'\w+(?= van Rossum)',
"""
    guido van Rossum
    tim peter
    Alex Martelli
    Just van Rossum
    Raymond Hettinger
""")
print(result)

['guido', 'Just']    # output: " van Rossum" is left out; only the word before it is kept
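The negative form has no example above, so here is a sketch on the same list of names. The pattern is an illustrative assumption: it captures the first word of each line only when that word is not followed by " van Rossum". The \b keeps the engine from backtracking to a shorter word such as "guid", which would otherwise satisfy the assertion.

```python
import re

names = """
    guido van Rossum
    tim peter
    Alex Martelli
    Just van Rossum
    Raymond Hettinger
"""

# (?m) lets ^ match at the start of every line; [ \t]* skips the indentation.
# (?! van Rossum) rejects a first word that is followed by " van Rossum".
result = re.findall(r'(?m)^[ \t]*(\w+)\b(?! van Rossum)', names)

print(result)
```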

Another way to call these functions: as methods of the regular expression object

Pattern.match(string[, pos[, endpos]])

Pattern.search(string[,pos[,endpos]])

Pattern.findall(string[, pos[, endpos]])

Pattern.finditer(string[, pos[, endpos]])

The difference is that the pos and endpos arguments can be used to restrict the part of the string that is searched.

import re
ob = re.compile('llo')
m1 = ob.match('hello world')
m2 = ob.match('hello world', 2)
print(m1, m2.group())
None llo            # match() starts at the beginning, so m1 is None; starting at index 2 (the third character), m2 matches
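The same arguments work with the other methods listed above. The following sketch, using a made-up one-character pattern, also shows that reported positions stay relative to the whole string, which is one difference from slicing the string first.

```python
import re

ob = re.compile('o')
text = 'hello world'

all_hits = [m.start() for m in ob.finditer(text)]        # search the whole string
from_five = [m.start() for m in ob.finditer(text, 5)]    # start searching at index 5
up_to_five = ob.findall(text, 0, 5)                      # search only text[0:5]

# With pos, the index refers to the original string; with a slice it does not.
with_pos = ob.search(text, 5).start()
with_slice = ob.search(text[5:]).start()

print(all_hits, from_five, up_to_five, with_pos, with_slice)
```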

If you are not familiar with the special symbols of regular expressions, see: Common Regular Expression Characters and Symbols
