Python Basics Day 13 [Modules: re, subprocess (unfinished)]
阿新 • Published: 2017-06-27
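re (continued):
Quantifiers in re are greedy by default.
Greedy mode: whenever a match is possible, the quantifier matches the longest string it can.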
import re

s = 'askldlaksdabccccccccasdabcccalsdacbcccacbcccabccc'
res = re.findall('abc+', s)
print(res)
res = re.findall('abc+?', s)  # append ? to the quantifier to turn off greedy matching
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['abcccccccc', 'abccc', 'abccc']
['abc', 'abc', 'abc']

Process finished with exit code 0
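The difference matters most when the pattern contains a wildcard. The following is a minimal sketch, assuming a made-up HTML-like string (`html` is not from the original post), that shows how `.*` runs to the last possible delimiter while `.*?` stops at the first one.

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # hypothetical sample text

# Greedy: .* extends to the last '>' in the string, so there is one long match.
greedy = re.findall(r'<.*>', html)

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.findall(r'<.*?>', html)

print(greedy)
print(lazy)
```

This is why the crawler pattern later in this post uses `.*?` between tags: the greedy form would swallow everything up to the last closing tag on the page.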
Commonly used functions in the re module:
re.split(): similar to the string split method, but more powerful because the separator is a regular expression.
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
res = re.split(r'\d', s)  # raw string, so \d reaches the regex engine unchanged
print(res)
res = re.split(r'(\d+)', s)  # wrap the pattern in () to keep the separators in the result
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['askldlaksdab', 'ccccc.cccas', 'dabc', 'cc.alsdacbcccac.cccab', 'ccc']
['askldlaksdab', '8', 'ccccc.cccas', '8', 'dabc', '8', 'cc.alsdacbcccac.cccab', '8', 'ccc']

Process finished with exit code 0
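One thing str.split cannot do is split on several different separators at once. A small sketch, assuming an invented input line (`line` is hypothetical), that also shows the `maxsplit` argument:

```python
import re

line = 'alice, bob;carol  dave'  # hypothetical input with mixed separators

# Split on a comma, semicolon, or whitespace character, plus any spaces after it.
names = re.split(r'[,;\s]\s*', line)

# maxsplit limits the number of splits; the remainder stays in the last item.
first_two = re.split(r'[,;\s]\s*', line, maxsplit=2)

print(names)
print(first_two)
```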
re.sub(): a replace operation, similar to str.replace, except that the text to replace is described by a pattern.
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
res = re.sub('abc+', '123', s)  # only 'abc' in '...dabc8...' matches here
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
askldlaksdab8ccccc.cccas8d1238cc.alsdacbcccac.cccab8ccc

Process finished with exit code 0
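re.sub() accepts more than a fixed replacement string. The sketch below, which uses a shortened copy of the string above, illustrates three standard options: the `count` argument, a group reference in the replacement, and a replacement function.

```python
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc'  # shortened copy of the string above

# count=1 replaces only the first match.
once = re.sub(r'\d', '#', s, count=1)

# \1 in the replacement refers to the text captured by the first group.
wrapped = re.sub(r'(\d)', r'[\1]', s)

# The replacement may also be a function that receives the match object.
doubled = re.sub(r'\d', lambda m: str(int(m.group()) * 2), s)

print(once)
print(wrapped)
print(doubled)
```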
re.compile(): compiles a pattern into a pattern object that can be reused.
import re

s = 'askldlaksdab8ccccc.cccas8dabc8cc.alsdacbcccac.cccab8ccc'
obj = re.compile(r'\d+')  # compile the rule once into a pattern object
res = obj.findall(s)  # call the object's method to apply the rule
print(res)

Output:
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
['8', '8', '8', '8']

Process finished with exit code 0
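A compiled pattern object offers the same operations as the module-level functions, so one compiled rule can serve several calls. A brief sketch, assuming an invented input string (`s` below is hypothetical):

```python
import re

digits = re.compile(r'\d+')

s = 'ab8ccccc.cccas12dabc345'  # hypothetical input

first = digits.search(s)  # first match anywhere in the string, or None
print(first.group(), first.span())

print(digits.findall(s))   # every match as a list of strings
print(digits.sub('#', s))  # replace every match
print(digits.split(s))     # split on every match
```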
A small crawler as a regex exercise (scraping 校花網, xiaohuar.com)
import requests, re, json

url = 'http://www.xiaohuar.com/2014.html'  # top-120 ranking page

def req():
    req_str = requests.get(url)
    # print('encoding', req_str.encoding)
    return req_str.text

def run():
    html = req()
    # requests decoded the body as Latin-1; re-encode and decode as gbk to recover the text
    html = html.encode('Latin-1').decode('gbk')
    # print(html)
    # match the rank number and the name/school; re.S lets . match newlines
    obj = re.compile('<div class="top-title">(.*?)</div>.*?<div class="title">.*?target="_blank">(.*?)</a></span></div>', re.S)
    res = obj.findall(html)
    return res

dic = {}
res = run()
for x in res:
    dic[x[0]] = x[1]
data = json.dumps(dic)  # serialize
# 'w' instead of 'a': appending on a second run would leave two JSON documents in the file and break json.load
with open('xiaohua.json', 'w', encoding='utf-8') as f:
    f.write(data)
with open('xiaohua.json', 'r', encoding='utf-8') as f:
    data = json.load(f)  # deserialize
print(data)
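The matching step can be tried without any network access. The following is a sketch only: `sample` is an invented fragment that imitates the tag structure the pattern expects, not real markup from the site, and the names in it are placeholders.

```python
import json
import re

# Hypothetical HTML fragment shaped like the structure the pattern targets.
sample = '''
<div class="top-title">1</div>
<div class="title"><span><a href="/p-1.html" target="_blank">Name A, School A</a></span></div>
<div class="top-title">2</div>
<div class="title"><span><a href="/p-2.html" target="_blank">Name B, School B</a></span></div>
'''

pattern = re.compile(
    r'<div class="top-title">(.*?)</div>.*?'
    r'<div class="title">.*?target="_blank">(.*?)</a></span></div>',
    re.S,  # let . match newlines, since the two divs sit on different lines
)

pairs = pattern.findall(sample)  # with two groups, findall returns a list of tuples
ranking = dict(pairs)            # rank -> name/school

text = json.dumps(ranking, ensure_ascii=False)  # serialize
restored = json.loads(text)                     # deserialize

print(pairs)
print(restored)
```

Because the pattern has two groups, findall returns 2-tuples, which is why the loop in the crawler reads `x[0]` and `x[1]`.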
subprocess:
The subprocess module lets a process create a new child process, connect to the child's stdin/stdout/stderr through pipes, and obtain the child's return code.
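import subprocess

# dir is a Windows shell built-in, so shell=True is required
s = subprocess.Popen('dir', shell=True, stdout=subprocess.PIPE)
print(s.stdout.read().decode('gbk'))  # a Chinese-locale Windows console emits gbk bytes

Output (from a Chinese-locale Windows console, translated):
D:\Python\Python36-32\python.exe E:/Python/DAY-15/3213.py
 Volume in drive E has no label.
 Volume Serial Number is 383D-453A

 Directory of E:\Python\DAY-15

2017/06/27  19:52    <DIR>          .
2017/06/27  19:52    <DIR>          ..
2017/06/27  19:52               338 3213.py
2017/06/27  19:47               778 tmp.py
2017/06/27  19:25             9,146 xiaohua.json
               3 File(s)         10,262 bytes
               2 Dir(s)  117,877,260,288 bytes free

Process finished with exit code 0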
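The example above depends on Windows and its `dir` command. As a platform-neutral sketch of the same idea, the block below uses subprocess.run (available since Python 3.5) to start the current interpreter as the child; the one-line child program is an assumption chosen only so the example runs anywhere.

```python
import subprocess
import sys

# Launch the current interpreter as a child process.
# Passing the command as a list avoids going through the shell.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    stdout=subprocess.PIPE,  # capture the child's stdout through a pipe
    stderr=subprocess.PIPE,  # capture stderr as well
)

print(result.returncode)               # the child's return code; 0 means success
print(result.stdout.decode().strip())  # captured output arrives as bytes
```

subprocess.run waits for the child to finish and returns a CompletedProcess object, which covers the return-code part of the description above; Popen remains the lower-level choice when the parent needs to interact with the child while it is still running.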