Python Regular Expressions

阿新 • Published: 2017-11-12


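1. Basic concepts of regular expressions

Background

When we need to match strings that start with xxx, strings that end with xxx, and so on, each kind of match would otherwise need its own function or statement. A regular expression abstracts the matching method into a rule, and that rule is then used to match text or data.

Concept

A regular expression is a single string that describes, and matches, a whole set of strings conforming to some syntax rule.

It is a kind of logical formula for operating on strings.

Use cases

Processing text or data

How matching works

The expression and the target data are compared character by character. If every character can be matched, the match succeeds; otherwise it fails.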

2. Python regular expressions: the re module

Built-in string search methods

str1.find(str2)

str1.startswith(str2)

str1.endswith(str2)

See also: section 1.4 of "Python Basics 02: Python Built-in Basic Types"
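To make the contrast between the built-in string methods and a regular expression concrete, here is a minimal sketch. The sample text and variable names are made up for illustration; only `str.find`, `str.startswith`, `str.endswith`, `re.search` and `re.match` are real APIs.

```python
import re

text = "study hard, study well"   # hypothetical sample string

# Built-in string methods handle fixed substrings.
found_at = text.find("hard")      # index of the first occurrence, or -1
starts = text.startswith("study")
ends = text.endswith("well")

# The same three checks written as regular expressions.
m = re.search(r"hard", text)
re_found_at = m.start() if m else -1
re_starts = re.match(r"study", text) is not None
re_ends = re.search(r"well\Z", text) is not None

# One pattern can also express a combined rule:
# "starts with 'study' and ends with 'well', with anything in between".
both = re.match(r"study.*well\Z", text) is not None

print(found_at, starts, ends)
print(re_found_at, re_starts, re_ends, both)
```

For a single fixed substring the string methods are usually the simpler choice; a pattern starts to pay off once the rule combines several conditions.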

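Using the re module

Import the re module

import re

Create a pattern instance

pa = re.compile(pattern, flags)

Parameters

pattern (the regular expression)

Preferably a raw string.

If the regular expression contains parenthesized groups, the groups() method of the resulting match object returns the matched group strings as a tuple, one element per group. With a single group, such as r'(study)', the tuple always has exactly one element.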

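flags

re.A (re.ASCII)

Affects \w, \W, \b, \B, \d, \D, \s and \S: the compiled pattern matches only ASCII characters for these classes instead of the full Unicode range.

re.I (re.IGNORECASE)

Ignore case when matching.

re.M (re.MULTILINE)

By default, the metacharacter ^ matches only at the start of the string, and $ matches at the end of the string and just before a trailing newline, if there is one.

With this flag, ^ matches at the start of the string and at the start of every line, that is, at the position right after each newline.

Similarly, $ matches at the end of the string and at the end of every line, that is, at the position right before each newline.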

>>> p = re.compile(r'(^hello$)\s(^hello$)\s(^hello$)\s')
>>> m = p.search('hello\nhello\nhello\n')
>>> print(m)
None

>>> p = re.compile(r'(^hello$)\s(^hello$)\s(^hello$)\s', re.M)
>>> m = p.search('\nhello\nhello\nhello\n')
>>> m.groups()
('hello', 'hello', 'hello')

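re.S (re.DOTALL)

Makes the . metacharacter match any character at all, including a newline.

re.X (re.VERBOSE)

This flag lets you write more readable regular expressions and lay them out freely.

When it is set, whitespace inside the pattern is ignored, unless the whitespace is inside a character class [ ] or is preceded by a backslash \.

You can therefore indent the pattern to make it clearer, and you can put comments in it; the regex engine ignores them.

A comment starts with the '#' character. To match a literal '#', escape it with a backslash ('\#') or put it in a character class ([#]).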

charref = re.compile(r"""
&[#]                # Start of a numeric entity reference
(
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
)
;                   # Trailing semicolon
""", re.VERBOSE)

Without the re.VERBOSE flag, the same pattern would be written as:

charref = re.compile("&#(0[0-7]+"
                     "|[0-9]+"
                     "|x[0-9a-fA-F]+);")
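The ASCII, IGNORECASE and DOTALL flags described above have no example of their own, so here is a small sketch. The sample strings are invented for illustration; the flags and `re.findall` / `re.match` are the standard `re` API.

```python
import re

# re.ASCII: \d matches only the ASCII digits 0-9.
mixed_digits = "3\uff14"   # ASCII "3" followed by FULLWIDTH DIGIT FOUR
unicode_digits = re.findall(r"\d", mixed_digits)
ascii_digits = re.findall(r"\d", mixed_digits, re.ASCII)

# re.IGNORECASE: letter case is ignored.
any_case = re.findall(r"python", "Python PYTHON python", re.IGNORECASE)

# re.DOTALL: "." also matches the newline character.
two_lines = "first\nsecond"
without_dotall = re.match(r"first.second", two_lines)
with_dotall = re.match(r"first.second", two_lines, re.DOTALL)

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^hello.", "Hello\nHELLO\n", re.I | re.M | re.S)

print(unicode_digits, ascii_digits)
print(any_case)
print(without_dotall, with_dotall.group())
print(combined)
```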

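Matching with a pattern instance

match(): matches exactly at the given position of the string (the start by default), stops at the first match, and returns a match object, or None if there is no match there.

match(string[, pos[, endpos]]) --> match object or None.

Matches zero or more characters at the beginning of the string

search(): scans forward from the given position and stops at the first location where the pattern matches; returns a match object, or None.

search(string[, pos[, endpos]]) --> match object or None.

Scan through string looking for a match, and return a corresponding match object instance. Return None if no position in the string matches.

findall(): scans forward from the given position and keeps going after each match; returns a list of all matched strings.

Note: if the regular expression contains () groups, findall returns a list of the strings captured by the groups instead of the whole matches (a list of tuples when there is more than one group).

findall(string[, pos[, endpos]]) --> list.

Return a list of all non-overlapping matches of pattern in string.

finditer(): scans forward from the given position and keeps going after each match; returns an iterator over all the match objects.

finditer(string[, pos[, endpos]]) --> iterator.

Return an iterator over all non-overlapping matches for the RE pattern in string. For each match, the iterator returns a match object.

sub(): replaces the parts of the string matched by the pattern with repl, up to count times (all occurrences by default). repl can be a string or a function.

When repl is a function, it is called with each match object, and its return value is used as the replacement text for that match.

sub(repl, string[, count = 0]) --> newstring

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.

split(): splits the string at the parts matched by the pattern, at most maxsplit times (no limit by default), and returns the resulting list.

split(string[, maxsplit = 0]) --> list.

Split string by the occurrences of pattern.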
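The note above about findall() and () groups is easy to trip over, so the following sketch shows the three cases side by side. The sample text is hypothetical; the non-capturing group `(?:...)` is standard `re` syntax that the article itself does not cover, shown here only as one way to get whole matches back.

```python
import re

text = "www.baidu.com www.163.com"   # hypothetical sample text

# No group: findall returns the whole matches.
no_group = re.findall(r"www\.\w+\.com", text)

# One group: findall returns only what the group captured.
one_group = re.findall(r"www\.(\w+)\.com", text)

# Several groups: findall returns one tuple per match.
two_groups = re.findall(r"(www)\.(\w+)\.com", text)

# Two ways to keep the whole match while still grouping:
non_capturing = re.findall(r"www\.(?:baidu|163)\.com", text)
whole = [m.group() for m in re.finditer(r"www\.(\w+)\.com", text)]

print(no_group)
print(one_group)
print(two_groups)
print(non_capturing, whole)
```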

# Import the re module
>>> import re

# Create a pattern object
>>> pa = re.compile(r'(ddd)')

# match() at pos=5: that position holds an 's', so the result is None
>>> ma = pa.match('dddsssdddsssddd\ndddsssdddsssddd', 5)
>>> ma.groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'

# search() from pos=5 returns a match object
>>> ma = pa.search('dddsssdddsssddd\ndddsssdddsssddd', 5)
>>> ma.groups()
('ddd',)

# findall() from pos=5 returns a list of the matched strings
>>> ma = pa.findall('dddsssdddsssddd\ndddsssdddsssddd', 5)
>>> ma
['ddd', 'ddd', 'ddd', 'ddd', 'ddd']

# finditer() from pos=5 returns an iterator over the match objects
>>> for i in pa.finditer('dddsssdddsssddd\ndddsssdddsssddd', 5):
...     print(i)
...
<re.Match object; span=(6, 9), match='ddd'>
<re.Match object; span=(12, 15), match='ddd'>
<re.Match object; span=(16, 19), match='ddd'>
<re.Match object; span=(22, 25), match='ddd'>
<re.Match object; span=(28, 31), match='ddd'>

# sub() returns a new string with the replacements applied
>>> ma = pa.sub('aaa', 'dddsssdddsssddddddsssdddsssddd')
>>> print(type(ma), ma)
<class 'str'> aaasssaaasssaaaaaasssaaasssaaa
>>> ma = pa.sub('aaa', 'dddsssdddsssddddddsssdddsssddd', 2)
>>> print(type(ma), ma)
<class 'str'> aaasssaaasssddddddsssdddsssddd
>>> def upper_str(match):
...     return match.group().upper()
...
>>> ma = pa.sub(upper_str, 'dddsssdddsssddddddsssdddsssddd', 2)
>>> print(type(ma), ma)
<class 'str'> DDDsssDDDsssddddddsssdddsssddd

# split() returns a list of the pieces; because the pattern has a capturing
# group, the captured 'ddd' separators are included in the list as well
>>> ma = pa.split('dddsssdddsssddddddsssdddsssddd', 2)
>>> print(type(ma), ma)
<class 'list'> ['', 'ddd', 'sss', 'ddd', 'sssddddddsssdddsssddd']
>>> ma = pa.split('dddsssdddsssddddddsssdddsssddd')
>>> print(type(ma), ma)
<class 'list'> ['', 'ddd', 'sss', 'ddd', 'sss', 'ddd', '', 'ddd', 'sss', 'ddd', 'sss', 'ddd', '']
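The optional `pos` and `endpos` arguments used in the session above decide where matching may start and stop. The sketch below isolates their effect on a shorter, made-up string; the pattern deliberately has no capturing group.

```python
import re

pa = re.compile(r"ddd")
text = "dddsssddd"   # hypothetical sample: 'ddd' at 0-3 and 6-9

at_start = pa.match(text)           # match() is anchored at position 0
at_six = pa.match(text, 6)          # anchored at position 6 instead
at_three = pa.match(text, 3)        # position 3 holds 's', so None

from_one = pa.search(text, 1)       # search() scans forward from position 1
limited = pa.search(text, 1, 8)     # only text[1:8] is considered, so None

# The module-level functions take the pattern as their first argument
# and do not accept pos/endpos.
module_level = re.search(r"ddd", text)

print(at_start.span(), at_six.span(), at_three)
print(from_one.span(), limited, module_level.span())
```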

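Match object attributes

group() returns the string matched by the regular expression.

groups() returns a tuple of the strings captured by the () groups of the regular expression, in order (an empty tuple if the pattern has no groups).

>>> ma = re.match(r'[\w]{6,11}@(163|qq|huawei)(163|qq|huawei)\.com\1\2', '[email protected]')
>>> ma.group()
'[email protected]'
>>> ma.groups()
('163', 'qq')

start() returns the start position of the match.

end() returns the end position of the match.

span() returns a tuple (start, end) with the start and end positions of the match.

string is the source string that was matched against.

re is the pattern object that produced the match.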
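As a quick illustration of the attributes listed above, here is a minimal sketch on a made-up string; the pattern and variable names are assumptions chosen for the example.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")
text = "range: 10-25"          # hypothetical sample string
m = pattern.search(text)

whole = m.group()              # the entire match
first = m.group(1)             # text captured by group 1
captured = m.groups()          # all captured groups as a tuple
start, end = m.start(), m.end()
span = m.span()                # same as (m.start(), m.end())

print(whole, first, captured)
print(start, end, span)
print(m.string is text, m.re is pattern)
```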

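3. Basic regular expression syntax

Matching a single character

. matches any character except \n

[...] matches one character from a set, e.g. [a-z]

\d / \D matches a digit / a non-digit

\s / \S matches a whitespace character / a non-whitespace character

\w / \W matches a word character / a non-word character. In ASCII mode \w is [a-zA-Z0-9_]

\[ \] match the literal characters [ and ] in the string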
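The single-character classes above can be tried out one by one with `re.findall`. The sample string below is invented for the purpose; each result variable corresponds to one entry of the list.

```python
import re

sample = "a1 _b\t[c]"   # hypothetical sample string

dot = re.findall(r".", "a\nb")             # "." skips the newline
char_set = re.findall(r"[a-c]", sample)    # one character out of a-c
digits = re.findall(r"\d", sample)
spaces = re.findall(r"\s", sample)
word_chars = re.findall(r"\w", sample)     # letters, digits and "_"
bracketed = re.findall(r"\[\w\]", sample)  # escaped literal brackets

print(dot, char_set, digits)
print(spaces, word_chars, bracketed)
```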

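Matching multiple characters

* matches the preceding character zero or more times

+ matches the preceding character one or more times. E.g. r'[_a-zA-Z]+\w' matches identifier-like strings, but it requires at least two characters; r'[_a-zA-Z]\w*' also accepts a single-character identifier

? matches the preceding character zero times or once. E.g. r'[1-9]?[0-9]' matches a number of up to two digits. Note: matched against 09, the result is 0

{m} / {m,n} matches the preceding character exactly m times / between m and n times. E.g. r'\w{6,10}@qq\.com' matches a qq mailbox

*? / +? / ?? switch *, + and ? to non-greedy mode, in which the match is as short as possible.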

>>> re.findall(r'[0-9]k*', '1kkkk')
['1kkkk']
>>> re.findall(r'[0-9]k*?', '1kkkk')
['1']
>>> re.findall(r'[0-9]k?', '1kkkk')
['1k']
>>> re.findall(r'[0-9]k??', '1kkkk')
['1']
>>> re.findall(r'[0-9]k+', '1kkkk')
['1kkkk']
>>> re.findall(r'[0-9]k+?', '1kkkk')
['1k']
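The {m} and {m,n} quantifiers are not exercised in the session above, and the greedy versus non-greedy difference shows up most clearly on tag-like text. The following sketch uses made-up sample strings for both.

```python
import re

numbers = "12 345 6789"   # hypothetical sample string

exactly_three = re.findall(r"\d{3}", numbers)     # runs of exactly 3 digits
two_or_three = re.findall(r"\d{2,3}", numbers)    # greedy: prefers 3 digits
lazy_two_or_three = re.findall(r"\d{2,3}?", "6789")  # non-greedy: prefers 2

html = "<b>bold</b><i>italic</i>"   # hypothetical sample markup
greedy = re.findall(r"<.+>", html)   # runs to the last ">"
lazy = re.findall(r"<.+?>", html)    # stops at the first ">"

print(exactly_three, two_or_three, lazy_two_or_three)
print(greedy)
print(lazy)
```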

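Boundary matching

^ matches the start of the string

$ matches the end of the string

\A / \Z anchor the pattern to the very start / very end of the string

>>> re.findall(r'\A[0-9].*k\Z', '1kkkk')
['1kkkk']
>>> re.findall(r'\A[0-9].*k\Z', '1kkkz')
[]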
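The difference between ^ / $ and \A / \Z only becomes visible with multi-line text or a trailing newline. Here is a short sketch on an invented two-line string.

```python
import re

text = "one\ntwo\n"   # hypothetical two-line sample

caret_default = re.findall(r"^\w+", text)          # start of string only
caret_multiline = re.findall(r"^\w+", text, re.M)  # start of every line
slash_a = re.findall(r"\A\w+", text, re.M)         # \A ignores re.M

dollar = re.search(r"two$", text)    # "$" also matches before a final newline
slash_z = re.search(r"two\Z", text)  # "\Z" matches only at the very end

print(caret_default, caret_multiline, slash_a)
print(dollar is not None, slash_z is not None)
```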

Group matching

| matches either the expression on its left or the one on its right. E.g. matching 0 to 100: r'^[0-9]$|^[1-9][0-9]$|^100$'

>>> re.findall(r'^[0-9]$|^[1-9][0-9]$|^100$', '100')
['100']
>>> re.findall(r'^[0-9]$|^[1-9][0-9]$|^100$', '9')
['9']
>>> re.findall(r'^[0-9]$|^[1-9][0-9]$|^100$', '99')
['99']
>>> re.findall(r'^[0-9]$|^[1-9][0-9]$|^100$', '09')
[]

[...] a character set; matches a single character from the set.

(ab) the expression inside the parentheses forms a group.

Groups are numbered 1, 2, 3, ... from left to right. Groups are often used to choose among alternative words. E.g. matching 163, qq and huawei mailboxes: r'\w{6,11}@(163|qq|huawei)\.com'

>>> re.match(r'[\w]{6,11}@(163|qq|huawei)\.com', '[email protected]').group()
'[email protected]'
>>> re.match(r'[\w]{6,11}@(163|qq|huawei)\.com', '[email protected]').group()
'[email protected]'
>>> re.match(r'[\w]{6,11}@(163|qq|huawei)\.com', '[email protected]').group()
'[email protected]'

\<number> refers back to the string matched by the group with that number. It is similar to xargs -i in a shell pipeline.

Note: \1 corresponds to the string matched by the first (). If the pattern has only one group but uses \2, an error is raised. Backreferences can be used, for example, to match XML tags:

>>> re.match(r'<(\w+>).*</\1', '<book>test</book>').group()
'<book>test</book>'
>>> re.match(r'<(\w+>).*</\1', '<book>test</ebook>').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r'<(\w+>).*</\1', '<book>test</book1>').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
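Two points from the backreference notes can be checked directly: \1 must repeat exactly what group 1 captured, and referring to a group that does not exist is rejected when the pattern is compiled. The sketch below uses made-up sample text and does not depend on the exact wording of the error message.

```python
import re

# \1 must repeat the text captured by group 1: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# The same idea for a pair of tags, capturing only the tag name.
tag_ok = re.match(r"<(\w+)>.*</\1>", "<book>test</book>")
tag_bad = re.match(r"<(\w+)>.*</\1>", "<book>test</ebook>")

# Referring to group 2 when only one group exists fails at compile time.
error_message = None
try:
    re.compile(r"(a)\2")
except re.error as exc:
    error_message = str(exc)

print(doubled.group(), doubled.group(1))
print(tag_ok.group(1), tag_bad)
print(error_message)
```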

(?P<name>...) gives a group a name

(?P=name) refers back to a named group

>>> re.match(r'[\w]{6,11}@(?P<type1>163|qq|huawei)(?P<type2>163|qq|huawei)\.com(?P=type1)(?P=type2)', '[email protected]').group()
'[email protected]'
>>> re.match(r'[\w]{6,11}@(?P<type1>163|qq|huawei)(?P<type2>163|qq|huawei)\.com(?P=type1)(?P=type2)', '[email protected]').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
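Named groups can also be read back by name after a match, which the session above does not show. A minimal sketch follows; the address `student01@qq.com` and the group names `user`, `provider` and `tag` are invented for illustration.

```python
import re

pattern = re.compile(r"(?P<user>\w{6,11})@(?P<provider>163|qq|huawei)\.com")
m = pattern.match("student01@qq.com")   # hypothetical address

user = m.group("user")           # look a group up by name
provider = m.group("provider")
by_number = m.group(2)           # named groups still have numbers
as_dict = m.groupdict()          # all named groups as a dict

# (?P=name) used as a backreference inside the pattern itself.
tag = re.match(r"<(?P<tag>\w+)>.*</(?P=tag)>", "<book>test</book>")

print(user, provider, by_number)
print(as_dict)
print(tag.group("tag"))
```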
