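Python - String Parsing - Regular Expressions - re
Regular expressions
A regular expression is a special sequence of characters used to match, search, and replace text.
A pattern is built from ordinary characters + special characters + quantifiers; the ordinary characters pin down the boundaries of the match.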
Order of preference when modifying strings
String methods > regular expressions > for loops
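To make that ordering concrete, here is a minimal sketch that performs the same replacement three ways. The sample text and variable names are made up for illustration; the point is only that each step down the list needs more code for the same result.

```python
import re

text = "ugly is ugly"

# 1. String method: simplest when the target is a fixed substring.
by_method = text.replace("ugly", "beautiful")

# 2. Regular expression: worth it once the target is a pattern, not fixed text.
by_regex = re.sub(r"ugly", "beautiful", text)

# 3. Manual loop: the most code, kept as a last resort.
words = []
for word in text.split(" "):
    words.append("beautiful" if word == "ugly" else word)
by_loop = " ".join(words)

print(by_method, by_regex, by_loop, sep="\n")
```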
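Metacharacters: each one matches a single character
# The uppercase form of a metacharacter is generally the negation of the lowercase form
1. A digit 0-9: \d; negation: \D
    import re

    example_str = "Beautiful is better than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r"\d", example_str))  # every digit
    print(re.findall(r"\D", example_str))  # every non-digit character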
2. A letter, digit, or underscore: \w; negation: \W
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'\w', example_str))  # word characters
    print(re.findall(r'\W', example_str))  # everything else
3. A whitespace character (space, \t, \r, \n): \s; negation: \S
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'\s', example_str))  # whitespace characters
    print(re.findall(r'\S', example_str))  # non-whitespace characters
4. Any one character from a set: [], with ranges such as 0-9, a-z, A-Z; negation: [^]
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'[0-9]', example_str))   # any character in the set
    print(re.findall(r'[^0-9]', example_str))  # any character not in the set
5. Any character except \n: .
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r".", example_str))
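Quantifiers: specify how many times the preceding character occurs
1. Greedy and non-greedy
a. Matching is greedy by default: it consumes as much as possible and stops only when a character no longer satisfies the pattern (longest possible match)
b. Non-greedy matching: add ? after the quantifier to get the shortest possible match
c. Mixing up greedy and non-greedy matching is a major source of bugs
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'.*u', example_str))   # greedy: runs to the last "u" on a line
    print(re.findall(r'.*?u', example_str))  # non-greedy: stops at each "u"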
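The difference is easier to see on input with repeated delimiters. The sketch below uses a made-up HTML-like string (not from the notes above) and the same pattern with and without the trailing ?.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last "</b>" in the string, so everything is one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first "</b>" that lets the match succeed.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)
print(lazy)
```

A pattern written with the greedy form here would silently merge two elements into one, which is the kind of bug point c warns about.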
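2. Repeat a specified number of times: {n}, or {n,m} for a range (no space after the comma)
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'\d{3}', example_str))  # exactly three digits per match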
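The {n,m} form has no example above, so here is a small sketch on a shortened copy of the sample string. It also shows what happens, in Python's re module, when a space is left after the comma.

```python
import re

example_str = "ugly 78966828 $"

# {3,5}: between 3 and 5 digits. Greedy, so it takes 5 first, then the remaining 3.
ranged = re.findall(r"\d{3,5}", example_str)

# With a space after the comma the braces are treated as literal text,
# so this looks for a digit followed by the characters "{3, 5}" and finds nothing.
with_space = re.findall(r"\d{3, 5}", example_str)

print(ranged)
print(with_space)
```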
3. Zero or more times: *
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    # * also allows zero characters, so empty matches appear in the result
    print(re.findall(r'.*', example_str))
4. One or more times: +
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'\d+', example_str))  # each run of consecutive digits
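5. Zero or one time: ?  Typical use: deduplication, i.e. treating variants that differ by one optional character as the same match
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'7896?', example_str))  # "789" followed by an optional "6"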
Boundary matching
1. Match at the start of the string: ^
2. Match at the end of the string: $
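Boundary matching has no example in these notes, so the following is a minimal sketch. The sample strings and the "4 to 8 digits" rule are assumptions chosen only to show ^ and $ used together as a whole-string check.

```python
import re

example_str = "Beautiful is better than ugly"

# ^ anchors the match at the start of the string.
starts = re.findall(r"^Beautiful", example_str)
not_start = re.findall(r"^ugly", example_str)  # "ugly" is not at the start

# $ anchors the match at the end of the string.
ends = re.findall(r"ugly$", example_str)

# Using both forces the whole string to fit the pattern: here, 4 to 8 digits only.
is_valid = re.findall(r"^\d{4,8}$", "78966828")
too_long = re.findall(r"^\d{4,8}$", "789668281")

print(starts, not_start, ends, is_valid, too_long)
```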
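Alternation (OR) between regular expressions: |
A match succeeds if it satisfies the expression on either the left or the right of |
    import re

    example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
    print(re.findall(r'\d+|\w+', example_str))  # the left alternative is tried first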
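Groups
() treats the expression inside the parentheses as a single unit, and findall() returns the text matched inside the (). A pattern may contain several groups; they combine as an AND relation, so every group must match.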
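A short sketch of how groups change what findall() returns, using a shortened copy of the sample string. The last line uses re.search(), which the notes do not cover, purely to show a quantifier applied to a group as one unit.

```python
import re

example_str = "Beautiful is better than ugly 78966828 ^Explicit is better than implicit"

# Without a group, findall() returns the whole match.
whole = re.findall(r"is \w+ than", example_str)

# With one group, findall() returns only the text captured inside ().
inner = re.findall(r"is (\w+) than", example_str)

# With several groups, each result is a tuple with one item per group.
pairs = re.findall(r"(\w+) is (\w+)", example_str)

# A quantifier after a group repeats the whole group, not just one character.
repeated = re.search(r"(ab){2}", "ababab").group(0)

print(whole, inner, pairs, repeated, sep="\n")
```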
Python - the regular expression module - re
1. Find every substring that matches a pattern: findall()
    import re

    name = "Hello Python 3.7, 123456789"
    total = re.findall(r"\d+", name)  # list of every run of digits
    print(total)
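2. Replace the substrings that match a pattern: sub()
    import re

    name = "Hello Python 3.7, 123456789"

    def replace(value):
        # value is a match object; return the matched digit plus one, as a string
        return str(int(value.group()) + 1)

    result_str = re.sub(r"\d", replace, name, count=0)  # count=0 replaces every match
    print(result_str)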
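sub() also accepts a plain string as the replacement, which covers many cases without a callback. A minimal sketch, reusing the same name string; the "#" mask and the swap are arbitrary choices for illustration.

```python
import re

name = "Hello Python 3.7, 123456789"

# A plain string replacement: every run of digits becomes "#".
masked = re.sub(r"\d+", "#", name)

# count limits how many matches are replaced (0, the default, means all).
first_only = re.sub(r"\d+", "#", name, count=1)

# \1 and \2 in the replacement refer back to the groups of the match.
swapped = re.sub(r"(\d+)\.(\d+)", r"\2.\1", name)

print(masked, first_only, swapped, sep="\n")
```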
Match a single Chinese character: [\u4E00-\u9FA5]
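A quick sketch of that character class in use. The mixed sample text is made up; adding + after the class collects consecutive Chinese characters into one match.

```python
import re

text = "Python 正则 re 模块 3.7"

# One match per character in the \u4E00-\u9FA5 range.
single = re.findall(r"[\u4E00-\u9FA5]", text)

# With +, each run of consecutive characters in the range is one match.
runs = re.findall(r"[\u4E00-\u9FA5]+", text)

print(single)
print(runs)
```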