Regular Expressions in Python: the re Module

阿新 • Published: 2017-07-15


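Sometimes we need to look up strings by pattern rather than by exact value, and that is where regular expressions come in. To use regular expressions in Python, import the re package: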

import re

1. First, the common regular expression syntax

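-- Single characters

.	any single character (other than a newline by default)
a|b	the character a or the character b
[afg]	one character that is a, f or g
[0-4]	one character in the range 0-4
[a-f]	one character in the range a-f
[^a]	one character that is not a
\s	one whitespace character (space, tab, newline and so on)
\S	one non-whitespace character
\d	[0-9], i.e. any digit from 0 to 9
\D	[^0-9], i.e. any character that is not a digit
\w	[0-9a-zA-Z_]
\W	[^0-9a-zA-Z_]
\b	matches a word boundary, i.e. the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb"
\B	matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never"

In Python 3, for str patterns, \d, \w and \s also match the corresponding Unicode characters unless the re.ASCII flag is set.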
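The entries in the table above can be tried out directly. The following is a minimal sketch in Python 3; the sample strings and variable names are made up for illustration.

```python
import re

text = "never verb 42"

any_char = re.findall(r"v.r", text)        # '.' stands for any one character
either = re.findall(r"n|b", text)          # the character n or the character b
digits = re.findall(r"\d", text)           # one digit per match
chunks = re.findall(r"\S+", text)          # runs of non-whitespace characters
not_a = re.findall(r"[^a]", "banana")      # every character that is not 'a'
words = re.findall(r"\w+", "snake_case!")  # \w includes the underscore

# \b needs a word boundary after "er"; \B needs the absence of one.
at_boundary = re.search(r"er\b", text).span()   # the "er" that ends "never"
inside_word = re.search(r"er\B", text).span()   # the "er" inside "verb"

print(any_char, either, digits, chunks, not_a, words)
print(at_boundary, inside_word)
```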

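-- Repetition

*	repeat 0 or more times
+	repeat 1 or more times
?	repeat 0 or 1 times
{m}	repeat exactly m times, e.g. [01]{2} matches the strings 00, 11, 01 and 10
{m,n}	repeat between m and n times, e.g. a{1,3} matches the strings a, aa and aaa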
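As a rough illustration of the quantifiers listed above, the sketch below applies each one to a small made-up string. It uses re.fullmatch (Python 3.4+) to check that a whole string fits a pattern; the variable names are only for this example.

```python
import re

sample = "a ab abbb"

star = re.findall(r"ab*", sample)      # 'a' followed by zero or more 'b'
plus = re.findall(r"ab+", sample)      # 'a' followed by one or more 'b'
optional = re.findall(r"ab?", sample)  # 'a' followed by at most one 'b'

# {m}: exactly two characters, each 0 or 1.
two_bits = [s for s in ("00", "01", "10", "11", "012", "2")
            if re.fullmatch(r"[01]{2}", s)]

# {m,n} is greedy: it takes as many repetitions as it can, up to n.
runs = re.findall(r"a{1,3}", "aaaaa")

print(star, plus, optional, two_bits, runs)
```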

-- Position

^	the start of the string
$	the end of the string
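The two anchors can be seen at work in the short sketch below. It also uses the re.M flag to show how the anchors behave per line; the sample text is invented for the example.

```python
import re

text = "first line\nsecond line"

# Without flags, ^ and $ refer to the start and end of the whole string.
start_word = re.findall(r"^\w+", text)
end_word = re.findall(r"\w+$", text)

# With re.M (MULTILINE), they also match at the start and end of every line.
start_words = re.findall(r"^\w+", text, re.M)
end_words = re.findall(r"\w+$", text, re.M)

print(start_word, end_word, start_words, end_words)
```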

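-- Controlling what is returned (groups)

To narrow a search result down to just the parts you need, wrap the corresponding part of the regular expression in parentheses. For example:

m = re.search(r"output_(\d{4}).*(\d{4})", "output_1986a.txt1233")

The pattern contains two (\d{4}) groups, so the match yields both 1986 and 1233, available as m.group(1) and m.group(2). The search() method looks through the whole string. The example below likewise matches two groups, i.e. the contents of the two pairs of parentheses, so calling match.group(3) raises an error because that group does not exist. If the groups are given names, groupdict() can be used as well: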

>>> match = re.search(r'(?P<first>\bt\w+)\W+(?P<second>\w+)', 'This is test for python group')
>>> print(match)
<re.Match object; span=(8, 16), match='test for'>
>>> print(match.group())
test for
>>> print(match.group(0))
test for
>>> print(match.group(1))
test
>>> print(match.group(2))
for
>>> print(match.groupdict())
{'first': 'test', 'second': 'for'}
>>> print(match.groupdict()['first'])
test
>>> print(match.groupdict()['second'])
for
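The numbered-group example from the start of this section can be run as follows. This is a small sketch in Python 3 that also shows what happens when a group that does not exist is requested; the variable names are only for illustration.

```python
import re

m = re.search(r"output_(\d{4}).*(\d{4})", "output_1986a.txt1233")

whole = m.group(0)      # the entire matched text
first = m.group(1)      # contents of the first pair of parentheses
second = m.group(2)     # contents of the second pair of parentheses
all_groups = m.groups() # every group as a tuple

# The pattern only has two groups, so asking for a third one fails.
try:
    m.group(3)
    missing_group_raises = False
except IndexError:
    missing_group_raises = True

print(whole, first, second, all_groups, missing_group_raises)
```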

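2. Commonly used methods in re

Python supports regular expressions through the re module. The usual workflow is to compile the string form of a regular expression into a Pattern object, use the Pattern object to process text and obtain a result, which is a Match object, and finally read the information you need from the Match object.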

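1) The compile method

>>> help(re.compile)
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.

As shown above, compile returns a Pattern object. The second argument, flags, is the matching mode. Several flags can be combined with bitwise OR "|" so that they take effect together, and flags can also be specified inside the pattern string itself. A Pattern object cannot be instantiated directly; it can only be obtained through compile. The matching modes are:

1). re.I (re.IGNORECASE): ignore case
2). re.M (re.MULTILINE): multi-line mode, changes the behavior of '^' and '$'
3). re.S (re.DOTALL): dot-matches-all mode, changes the behavior of '.'
4). re.L (re.LOCALE): makes the predefined classes \w \W \b \B depend on the current locale (in Python 3 this flag can only be used with bytes patterns)
5). re.U (re.UNICODE): makes the predefined classes \w \W \b \B \s \S \d \D follow the character properties defined by Unicode (already the default for str patterns in Python 3)
6). re.X (re.VERBOSE): verbose mode. In this mode the regular expression can span several lines, whitespace is ignored, and comments can be added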
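A few of the flags above, combined with "|" and written inline, might be used like this. It is only a sketch: the sample text is made up, and "(?i)" is the inline spelling of re.I.

```python
import re

text = "Hello\nhello world"

# re.I alone: case-insensitive matching.
ignore_case = re.findall(r"hello", text, re.I)

# re.I | re.M: both flags at once, so ^ also matches at each line start.
per_line = re.findall(r"^hello", text, re.I | re.M)

# The same flag given inside the pattern string instead of as an argument.
inline = re.findall(r"(?i)hello", text)

# '.' does not match a newline unless re.S is set.
without_dotall = re.search(r"Hello.hello", text)
with_dotall = re.search(r"Hello.hello", text, re.S)

print(ignore_case, per_line, inline, without_dotall, with_dotall.group())
```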

For example:

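import re

pattern = re.compile(r're')
pattern.match('This is re module of python')
re.match(r're', 'This is re module of python')
# The two calls above are equivalent
# (both return None here, because 're' is not at the start of the string)
# The two patterns below are equivalent
pattern1 = re.compile(r"""\d + # integer part
                          \.   # decimal point
                          \d * # fractional part""", re.X)
pattern2 = re.compile(r'\d+\.\d*')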

2) The match method

>>> help(re.match)
Help on function match in module re:

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.

The match method tries to match at the beginning of the string. If it matches, it returns a Match object; if it fails, it returns None. The flags argument is the same matching mode that can be given when compiling a pattern. group is a method of the Match object that returns what a given group matched. If the pattern uses groups to pick out parts of the string, group returns the string matched by each group.

>>> match = re.match(r'This', 'This is re module of python')
>>> print(match)
<re.Match object; span=(0, 4), match='This'>
>>> print(match.group())
This
>>> match = re.match(r'python', 'This is re module of python')
>>> print(match)
None
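Because match returns None on failure, calling group() on the result without checking it first raises an AttributeError. One common way to guard against that is sketched below; first_word is a hypothetical helper written for this example, not part of the re module.

```python
import re

def first_word(text):
    """Return the word at the very start of text, or None if there is none."""
    m = re.match(r"\w+", text)
    # m is either a Match object (truthy) or None.
    return m.group() if m else None

found = first_word("This is re module of python")
not_found = first_word("  leading spaces")  # the string starts with a space

print(found, not_found)
```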

3) The search method

>>> help(re.search)
Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.

The search() method scans the whole string, whereas match only looks at the beginning of the string. The named-group example in the section on groups above relies on this: search finds "test for" in the middle of 'This is test for python group', which match could not do. That pattern contains two groups, i.e. the contents of its two pairs of parentheses, so match.group(3) raises an error because no such group exists. Because the groups are named, their contents are also available through groupdict().
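The difference between search and match described above can be checked side by side. This is a minimal sketch that reuses the sample string from the match section; the variable names are illustrative.

```python
import re

text = "This is re module of python"

# match only tries the start of the string, so this fails.
from_start = re.match(r"python", text)

# search scans forward until it finds the first occurrence.
anywhere = re.search(r"python", text)
span = anywhere.span()

# Anchoring a search pattern with ^ makes it fail here just as match does.
anchored = re.search(r"^python", text)

print(from_start, anywhere.group(), span, anchored)
```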

4) The split method

>>> help(re.split)
Help on function split in module re:

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.

split cuts the given string at every substring that matches the pattern and returns a list. The maxsplit argument is the maximum number of splits; the default of 0 means no limit.

>>> results = re.split(r'\d+', 'fasdf12fasdf4fasf1fasdf123')
>>> type(results)
<class 'list'>
>>> print(results)
['fasdf', 'fasdf', 'fasf', 'fasdf', '']
>>> results = re.split(r'-', '2013-11-12')
>>> print(results)
['2013', '11', '12']
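The maxsplit argument mentioned above can be sketched as follows. The last two calls go slightly beyond the text: they show that a separator pattern can absorb surrounding whitespace, and that wrapping the pattern in a capturing group makes split keep the separators in the result.

```python
import re

# Stop after the first split; the remainder is returned untouched.
limited = re.split(r"-", "2013-11-12", maxsplit=1)

# The separator is a pattern, so it can swallow optional whitespace too.
fields = re.split(r"\s*,\s*", "a , b,c")

# With a capturing group, the matched separators appear in the list.
with_separators = re.split(r"(\d+)", "fasdf12fasdf4")

print(limited, fields, with_separators)
```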

5) The findall method

>>> help(re.findall)
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

findall returns a list containing every match. If the pattern has no groups, the list holds the matched strings; if it has exactly one group, the list holds what that group matched; if it has several groups, each match is stored in the list as a tuple of its groups.

>>> results = re.findall(r'\bt\w+\W+\w+', 'this is test for python findall')
>>> results
['this is', 'test for']
>>> results = re.findall(r'(\bt\w+)\W+(\w+)', 'this is test for python findall')
>>> results
[('this', 'is'), ('test', 'for')]
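The single-group case from the help text can be sketched with the same sample string. The last lines use re.finditer, which is not covered in this article; it is included as an assumption about what a reader might want when Match objects (for example, positions) are needed instead of plain strings.

```python
import re

text = "this is test for python findall"

# One group: findall returns what the group matched, not the whole match.
tails = re.findall(r"\bt(\w+)", text)

# No groups: findall returns the whole matches.
t_words = re.findall(r"\bt\w+", text)

# finditer yields Match objects, so positions are available as well.
positions = [(m.group(), m.start()) for m in re.finditer(r"\bt\w+", text)]

print(tails, t_words, positions)
```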

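6) The sub and subn methods

sub(pattern, repl, string, count=0, flags=0)

subn(pattern, repl, string, count=0, flags=0)

The sub method first uses the regular expression to find matches in string and then replaces each match with repl. count is the number of replacements to make; if it is not given, every match is replaced. The return value is the string after replacement. repl can be a string or a function. If it is a function, it must accept one argument, the Match object, and must return the string to use as the replacement. That Match object is the same kind of object returned by match and search. When the replacement has to be computed from the matched text, the second form (a function) tends to be clearer.

>>> print(re.sub(r'(\w+) (\w+)', r'\2 \1', 'i say, hello world!'))
say i, world hello!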
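The function form of repl described above has no example in this article, so here is a minimal sketch. The callback name double and the sample sentence are made up for illustration.

```python
import re

def double(match):
    """Replacement callback: takes a Match object, returns the new text."""
    return str(int(match.group()) * 2)

sentence = "3 apples and 10 pears"

# Every run of digits is passed to double() and replaced by its result.
doubled = re.sub(r"\d+", double, sentence)

# count limits how many matches are replaced, counting from the left.
first_only = re.sub(r"\d+", double, sentence, count=1)

print(doubled)
print(first_only)
```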

subn is essentially the same as sub, except that sub returns the replaced string while subn returns a tuple of two elements: the first is the replaced string and the second is the number of replacements that were made. If count is given, that number is at most count.

>>> print(re.subn(r'(\w+) (\w+)', r'\2 \1', 'i say, hello world!'))
('say i, world hello!', 2)
>>> print(re.subn(r'(\w+) (\w+)', r'\2 \1', 'i say, hello world!', count=1))
('say i, hello world!', 1)
