Python 3.5 Study Notes (Chapter 6)

阿新 • Published: 2018-07-05


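In this chapter:

Regular expressions in detail (the re module)

1. Finding text without regular expressions

Suppose we need to find phone numbers in a string and check whether they match a given pattern, such as 555-555-5555. The traditional approach, without regular expressions, looks like this: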

def isPhoneNumber(text):
    # A number such as 555-555-5555 is exactly 12 characters long
    if len(text) != 12:
        return False
    # The first three characters must be digits
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    # The next three characters must be digits
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    # The last four characters must be digits
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True

message = "Call me at 415-232-2354 tomorrow. 415-234-2545 is my office."
for i in range(len(message)):
    chunk = message[i:i+12]
    if isPhoneNumber(chunk):
        print('Phone number found:' + chunk)
print('Done')
>>>
Phone number found:415-232-2354
Phone number found:415-234-2545
Done

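Explanation: on each iteration of the for loop, a new 12-character slice of message is assigned to chunk and passed to isPhoneNumber() to check whether it fits the phone-number pattern. If it does, the slice is printed. By the end, the loop has scanned the whole string.

2. Finding text patterns with regular expressions

A regular expression, regex for short, is a way of describing a text pattern. For example, \d\d\d-\d\d\d-\d\d\d\d matches three digits, a hyphen, three digits, a hyphen, and four digits, which is the phone-number format used above.

Putting a number in braces after a pattern, such as {3}, means "match this pattern three times". So \d\d\d-\d\d\d-\d\d\d\d can also be written as \d{3}-\d{3}-\d{4}.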
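As a quick check of that equivalence, the sketch below runs both spellings of the pattern against the same text. The sample sentence and the variable names are made up for illustration; the re module it relies on is introduced in section 2.1.

```python
import re

# Two spellings of the same pattern: explicit repetition vs. {n} counts.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

sample = 'Call 415-555-1234 today'  # hypothetical sample text

long_match = long_form.search(sample).group()
short_match = short_form.search(sample).group()

print(long_match)
print(short_match)
```

Both lines print the same phone number, so the shorter form can be used wherever the longer one appears.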

2.1 Creating a regex object

All of Python's regular-expression functions live in the re module, which has to be imported before use.

import re

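Passing a string that represents a regular expression to re.compile() returns a Regex pattern object.

phone_number_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

2.2 Matching with a Regex object

A Regex object's search() method scans the string passed to it for the first place where the regular expression matches. It returns None if nothing is found and a Match object otherwise. A Match object has a group() method that returns the text that was actually matched.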

import re
# With no groups in the pattern, group() returns the whole matched text
phone_number_regex = re.compile(r'\d{3}-\d{3}-\d{4}')  # {3} means: match the preceding \d three times
mo = phone_number_regex.search('My number is 515-555-5555')
print('Phone number found:' + mo.group())   # mo is a Match object; call group() to get the matched text
>>>
Phone number found:515-555-5555

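2.3 Summary of regex matching

(1) Import the regex module with import re.

(2) Create a Regex object with re.compile() (use a raw string).

(3) Pass the string you want to search to the Regex object's search() method, which returns a Match object, or None if there is no match.

(4) Call the Match object's group() method to get the matched text as a string.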
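The four steps above can be sketched as one small helper. The function name find_phone_number and the constant PHONE_REGEX are hypothetical, chosen only for this illustration. The None check matters because calling group() on None raises an AttributeError.

```python
import re  # step 1: import the module

# Step 2: compile the pattern from a raw string.
PHONE_REGEX = re.compile(r'\d{3}-\d{3}-\d{4}')

def find_phone_number(text):
    """Return the first phone number found in text, or None."""
    mo = PHONE_REGEX.search(text)  # step 3: a Match object, or None
    if mo is None:
        return None
    return mo.group()              # step 4: the matched text

found = find_phone_number('My number is 515-555-5555')
missing = find_phone_number('No number here')

print(found)
print(missing)
```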

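3. Matching more patterns with regular expressions

3.1 Grouping with parentheses

Adding parentheses creates groups in the regular expression; you can then use the group() method to get the matched text of a single group.

In a regular expression, the first pair of parentheses is group 1 and the second pair is group 2. Passing 1 or 2 to group() returns the corresponding part of the matched text; passing 0 or no argument returns the entire matched text.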

import re
phone_number_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phone_number_regex.search('My number is 515-555-5555')
print('Phone number found:' + mo.group(1))
>>>
Phone number found:515

To get all the groups at once, use the groups() method:

import re
phone_number_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phone_number_regex.search('My number is 515-555-5555')
g_1, g_2 = mo.groups()
print('g_1:', g_1)
print('g_2:', g_2)
>>>
g_1: 515
g_2: 555-5555
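For comparison, here is a minimal sketch of the different ways to call group() on the same Match object. The variable names are illustrative only.

```python
import re

phone_number_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phone_number_regex.search('My number is 515-555-5555')

whole = mo.group(0)        # same as mo.group(): the entire match
area_code = mo.group(1)    # text matched by the first pair of parentheses
main_number = mo.group(2)  # text matched by the second pair
both = mo.group(1, 2)      # several indexes at once return a tuple

print(whole, area_code, main_number, both)
```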

Because parentheses are used for grouping by default, matching literal parentheses requires escaping them (both the opening and the closing one).

import re
phone_number_regex = re.compile(r'(\(\d{3}\)) (\d{3}-\d{4})')
mo = phone_number_regex.search('My number is (515) 555-5555')
print('Phone number found:' + mo.group(1))
>>>
Phone number found:(515)

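3.2 Matching one of several alternatives with the pipe

To match any one of several expressions, use the pipe character '|'. If more than one of the alternatives occurs in the searched string, the one that occurs first is returned as the Match object.

import re
hero_regex = re.compile(r'Batman|Tina Fey')
mo = hero_regex.search('Batman and Tina Fey')
print(mo.group())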
>>>
Batman
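To illustrate that it is the position in the searched text, not the order of the alternatives in the pattern, that decides the result, here is a small sketch with the two names swapped in the input. The second sentence is made up for the example.

```python
import re

hero_regex = re.compile(r'Batman|Tina Fey')

# The same pattern, two inputs with the names in different order.
first = hero_regex.search('Batman and Tina Fey').group()
second = hero_regex.search('Tina Fey and Batman').group()

print(first)
print(second)
```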

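A shared prefix can be combined with the pipe to match several expressions.

import re
bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')  # Bat is the prefix, combined with each alternative in the parentheses
mo1 = bat_regex.search('Batmobile lost a wheel')
print(mo1.group())    # the entire matched text
print(mo1.group(1))   # only the text matched inside the parenthesized group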
>>>
Batmobile
mobile

3.3 Optional matching with the question mark

The ? character marks the expression or group before it as optional: it matches zero times or one time, and no more.

import re
bat_Regex = re.compile(r'Bat(wo)?man')
mo_1 = bat_Regex.search('The Adventures of Batman')
mo_2 = bat_Regex.search('The Adventures of Batwoman')
mo_3 = bat_Regex.search('The Adventures of Batwowoman')
print(mo_1.group())
print(mo_2.group())
print(mo_3)
>>>
Batman
Batwoman
None

3.4 Matching zero or more with the star *

The star * differs from ?: the group before it may appear any number of times, including not at all.

import re
bat_Regex = re.compile(r'Bat(wo)*man')
mo_1 = bat_Regex.search('The Adventures of Batwowowoman')
print(mo_1.group())
>>>
Batwowowoman

3.5 Matching one or more with the plus sign + (at least one occurrence is required)

import re
bat_Regex = re.compile(r'Bat(wo)+man')
mo_1 = bat_Regex.search('The Adventures of Batman')
mo_2 = bat_Regex.search('The Adventures of Batwoman')
mo_3 = bat_Regex.search('The Adventures of Batwowoman')
print(mo_1)
print(mo_2.group())
print(mo_3.group())
>>>
None
Batwoman
Batwowoman

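3.6 Matching a specific number of repetitions with braces

To repeat a group a specific number of times, follow the group in the regular expression with a number in braces; the number is the repetition count. For example:

phone_number_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

Besides a single number, you can give a range by writing a minimum a and a maximum b inside the braces, as in {a,b}.

The regular expression (Ha){3,5} matches 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'. Either number may be omitted: leaving out the first means a minimum of zero, and leaving out the second means no upper limit.

import re
ha_regex_1 = re.compile(r'(ha){3}')
m_1 = ha_regex_1.search('hahahaha')
print(m_1.group())
ha_regex_2 = re.compile(r'(ha){3,5}')  # 3 to 5 repetitions; returns as much text as possible (greedy)
m_2 = ha_regex_2.search('hahahahahaha')
print(m_2.group())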
>>>
hahaha
hahahahaha
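The open-ended forms are not exercised by the example above, so here is a minimal sketch of them. The variable names and the test string are chosen only for this illustration.

```python
import re

at_least_three = re.compile(r'(ha){3,}')  # three or more repetitions
at_most_two = re.compile(r'(ha){,2}')     # zero to two repetitions

laugh = 'hahahahahaha'  # 'ha' repeated six times

long_match = at_least_three.search(laugh).group()  # greedy: takes all six
short_match = at_most_two.search(laugh).group()    # stops after two
too_short = at_least_three.search('haha')          # only two repetitions

print(long_match)
print(short_match)
print(too_short)
```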

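4. Greedy and non-greedy matching

When several matches are possible, Python's regular expressions by default match the longest string they can. To match the shortest possible string instead, add a question mark after the closing brace.

import re
ha_regex_1 = re.compile(r'(ha){3,5}')  # 3 to 5 repetitions; returns as much text as possible (greedy)
m_1 = ha_regex_1.search('hahahahahaha')
print(m_1.group())
ha_regex_2 = re.compile(r'(ha){3,5}?')  # 3 to 5 repetitions; returns as little text as possible (non-greedy)
m_2 = ha_regex_2.search('hahahahahaha')
print(m_2.group())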
>>>
hahahahaha
hahaha

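Note: the question mark has two meanings in regular expressions. It either marks a group as optional or declares a non-greedy match; take care to tell the two apart.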
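A short sketch may help separate the two roles of the question mark. The patterns and sample strings below are invented for the illustration.

```python
import re

digits = '1234567'

# Role 1: '?' after a quantifier makes that quantifier non-greedy.
greedy = re.compile(r'\d{3,5}').search(digits).group()
lazy = re.compile(r'\d{3,5}?').search(digits).group()

# Role 2: '?' after a character or group makes it optional.
optional = re.compile(r'colou?r').findall('color colour')

print(greedy)
print(lazy)
print(optional)
```

In the first pair the question mark follows a quantifier, so it changes how much text is taken. In the last pattern it follows a plain character, so it changes whether that character is required at all.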

5. The findall() method

search() returns only the first match found in the searched string, whereas findall() returns every match. findall() does not return Match objects; it returns a list of strings.

import re
phone_number_regex = re.compile(r'\d{3}-\d{3}-\d{4}')
phone_number = phone_number_regex.findall('Cell:555-555-5555 Work:888-888-8888')
print(phone_number)
>>>
['555-555-5555', '888-888-8888']

If the regular expression contains more than one group, findall() returns a list of tuples, with one string per group in each tuple.

import re
phone_number_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
phone_number = phone_number_regex.findall('Cell:555-555-5555 Work:888-888-8888')
print(phone_number)
>>>
[('555', '555-5555'), ('888', '888-8888')]
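Two related cases are worth a quick sketch: a pattern with exactly one group, where findall() returns a plain list of that group's text, and finditer(), which yields Match objects when positions are needed as well. The variable names are illustrative only.

```python
import re

text = 'Cell:555-555-5555 Work:888-888-8888'

# Exactly one group: findall() returns a list of that group's strings.
one_group = re.compile(r'(\d{3})-\d{3}-\d{4}')
area_codes = one_group.findall(text)

# finditer() yields Match objects, so start positions are available too.
no_group = re.compile(r'\d{3}-\d{3}-\d{4}')
positions = [(mo.group(), mo.start()) for mo in no_group.finditer(text)]

print(area_codes)
print(positions)
```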

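6. Character classes

Shorthand codes for common character classes

Shorthand class	Matches
\d	Any digit from 0 to 9
\D	Any character that is not a digit from 0 to 9
\w	Any letter, digit, or underscore (think of it as matching "word" characters)
\W	Any character that is not a letter, digit, or underscore
\s	A space, tab, or newline (think of it as matching "whitespace" characters)
\S	Any character that is not a space, tab, or newline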
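The shorthand classes in the table can be combined like any other pattern element. A minimal sketch, with made-up sample text and variable names:

```python
import re

# One or more digits, one whitespace character, one or more word characters.
count_regex = re.compile(r'\d+\s\w+')
found = count_regex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies')

# \D is the complement of \d: runs of non-digit characters.
non_digits = re.compile(r'\D+').findall('a1b22c')

print(found)
print(non_digits)
```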

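7. Building your own character classes

You can define your own character class with square brackets [ ]. For example, [aeiouAEIOU] matches any vowel, upper or lower case.

import re
vowelregex = re.compile(r'[aeiouAEIOU]')
str_1 = vowelregex.findall('hsoaiejdpoaJEP WGEPAEF JFOPAWJE[oeofjgjop jfgawpe')
print(str_1)

A hyphen can be used to give a range of letters or digits. For example, the character class [0-9a-zA-Z] matches every digit and every upper- and lower-case letter.

import re
alnum_regex = re.compile(r'[0-9a-zA-Z]')
str_1 = alnum_regex.findall(r'hlsijrg894w4t23\.wsew213^&*%^&$79832hiu we')  # raw string, so the backslash is kept literally
print(str_1)

Note that inside square brackets the usual regex symbols such as . * + ? ( ) stand for themselves and do not need escaping. The exceptions are the characters that have a meaning inside a class: the backslash, the closing bracket ], a caret ^ in first position, and a hyphen - between two characters.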
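The note about symbols inside square brackets can be checked with a short sketch. The pattern and sample string are invented for the illustration.

```python
import re

# Inside [...] these symbols are literal characters, not operators.
punct_regex = re.compile(r'[.*+?]')
found = punct_regex.findall('1+1=2. Really? Yes*')

print(found)
```

Outside a character class each of these four symbols would need a backslash to be matched literally.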

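Putting a caret ^ immediately after the opening bracket of a character class produces a negated class: the regular expression then matches any character that is not in the class.

import re
vowelregex = re.compile(r'[^aeiouAEIOU]')
str_1 = vowelregex.findall('hsoaiej WGEPAEF JFJE[oeoop jfgawpe')
print(str_1)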

8. The caret and the dollar sign

A caret ^ at the start of a regular expression means the match must occur at the beginning of the searched text.

import re
begin = re.compile(r'^hello')
answer_1 = begin.search('hello world')
answer_2 = begin.search('world hello')
print(answer_1.group())
print(answer_2)
>>>
hello
None

Likewise, a dollar sign $ at the end of a regular expression means the string must end with the pattern.

import re
begin = re.compile(r'hello$')
answer_1 = begin.search('hello world')
answer_2 = begin.search('world hello')
print(answer_1)
print(answer_2.group())
>>>
None
hello

The caret and the dollar sign can be used together to require that the whole string match the pattern, not just a part of it.

import re
begin = re.compile(r'^hello$')
answer_1 = begin.search('hello')
answer_2 = begin.search('world hello')
print(answer_1.group())
print(answer_2)
>>>
hello
None
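A common use of the two anchors together is validating that an entire string has a given shape. The sketch below assumes a hypothetical "digits only" check. It also shows one detail of Python's re module: $ matches just before a final newline as well, whereas the fullmatch() method of a pattern does not allow that.

```python
import re

whole_number = re.compile(r'^\d+$')

ok = whole_number.search('1234567890')
bad_middle = whole_number.search('12345xyz67890')
bad_space = whole_number.search('12 34567890')

# '$' also matches just before a trailing newline...
trailing_newline = whole_number.search('123\n')
# ...while fullmatch() requires the pattern to cover the entire string.
strict = re.compile(r'\d+').fullmatch('123\n')

print(ok.group(), bad_middle, bad_space)
print(trailing_newline is not None, strict)
```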

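9. The wildcard character

The period . is called the wildcard. It matches any character except a newline, but only one character at a time: r'.s' matches all of 'us', while in 'yours' it matches only the two characters 'rs', not the whole word.

import re
begin = re.compile(r'.')
answer_1 = begin.search('hello')
print(answer_1.group())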
>>>
h

9.1 Matching any string with dot-star (.*)

Dot-star stands for "any text".

import re
begin = re.compile(r'.*')
answer_1 = begin.search('hello')
print(answer_1.group())
>>>
hello

9.2 Matching newlines with the period

To make dot-star match every character, newlines included, pass re.DOTALL as the second argument to re.compile().

import re
begin = re.compile(r'.*', re.DOTALL)
answer_1 = begin.search('hello \nworld')
print(answer_1.group())
>>>
hello
world
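For contrast, here is a minimal sketch that runs the same pattern with and without re.DOTALL on the same input. The variable names are illustrative.

```python
import re

text = 'hello \nworld'

# Without the flag, '.' stops at the newline.
default_match = re.compile(r'.*').search(text).group()
# With re.DOTALL, '.' matches the newline too.
dotall_match = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(default_match))
print(repr(dotall_match))
```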

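10. Common regex symbols

?	Matches zero or one of the preceding group
*	Matches zero or more of the preceding group
+	Matches one or more of the preceding group
{n}	Matches exactly n of the preceding group
{n,}	Matches n or more of the preceding group
{,m}	Matches zero to m of the preceding group
{n,m}	Matches at least n and at most m of the preceding group
{n,m}? or *? or +?	Performs a non-greedy match of the preceding group
^spam	The string must begin with spam
spam$	The string must end with spam
.	Matches any character except a newline
\d, \w, \s	Match a digit, a word character, and a whitespace character, respectively
\D, \W, \S	Match any character that is not a digit, word character, or whitespace character, respectively
[abc]	Matches any one character inside the brackets
[^abc]	Matches any one character not inside the brackets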

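11. Case-insensitive matching

Regular expressions are normally case-sensitive. To make one case-insensitive, pass re.I (short for re.IGNORECASE) as the second argument to re.compile().

import re
begin = re.compile(r'[a-z]*', re.I)
answer_1 = begin.search('ahisdhfliNILSHILHAI')
print(answer_1.group())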
>>>
ahisdhfliNILSHILHAI

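12. Replacing strings with the sub() method

A Regex object's sub() method takes two arguments. The first is the replacement string that is substituted for each match. The second is the string to search with the regular expression. sub() returns the string with the replacements made.

import re
name_regex = re.compile(r'world')
answer_1 = name_regex.sub('小姐姐', 'hello world')
print(answer_1)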
>>>
hello 小姐姐
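A few further behaviours of sub() are sketched below: it replaces every match by default, the optional count argument limits the number of replacements, and \1, \2 in the replacement refer back to groups in the pattern. The sample strings and variable names are made up for the illustration.

```python
import re

name_regex = re.compile(r'world')
text = 'hello world, goodbye world'

all_replaced = name_regex.sub('there', text)         # every match is replaced
first_only = name_regex.sub('there', text, count=1)  # only the first match

# \1 and \2 in the replacement refer to the groups of the pattern.
phone_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
reformatted = phone_regex.sub(r'(\1) \2', 'Call 515-555-5555')

print(all_replaced)
print(first_only)
print(reformatted)
```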

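13. Managing complex regular expressions

A regular expression written as one long run is hard to read. It can instead be written as a multi-line string in triple quotes, with re.VERBOSE passed to re.compile() so that whitespace and comments inside the pattern are ignored.

import re
name_regex = re.compile(r'''
                        (\d{3})  # three digits
                        (.{3})   # any three characters except newlines
                        (\w{3})  # three word characters
                        ''', re.VERBOSE)
answer_1 = name_regex.search('555w*whello world')
print(answer_1.group())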
>>>
555w*whel

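14. Combining re.I, re.DOTALL, and re.VERBOSE

re.compile() accepts only one value as its second argument. To ignore case with re.I while also letting the period match newlines, or allowing comments in the pattern, combine the flags with the pipe character | (the bitwise OR operator).

regex = re.compile(r'[a-z]', re.I|re.DOTALL|re.VERBOSE)

import re
name_regex = re.compile(r'''
                        ([a-z]*)  # any number of letters, case-insensitive
                        (\n)      # one newline
                        ([0-9]*)  # any number of digits
                        ''', re.I|re.DOTALL|re.VERBOSE)
answer_1 = name_regex.search('abcdABCD\n8097809')
print(answer_1.group())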
>>>
abcdABCD
8097809
