Scraping Maoyan Movies with Python requests
阿新 • Published: 2017-10-12
1. URL: http://maoyan.com/board/4?
2. Code:
    import json
    import re
    from multiprocessing import Pool

    import requests
    from requests.exceptions import RequestException


    def get_one_page_html(url):
        """Fetch one page; return its HTML, or None on any failure."""
        try:
            response = requests.get(url)
            if response.status_code == 200:
                return response.text
            return None
        except RequestException:
            return None


    def parse_one_page(html):
        # re.S makes "." match newlines too, so the pattern can span a whole <dd> block.
        pattern = re.compile(
            r'<dd>.*?board-index.*?>(\d+)</i>.*?alt.*?src="(.*?)".*?name"><a'
            r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
            r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
            re.S,
        )
        items = re.findall(pattern, html)
        # Each match is a tuple such as:
        # ('1', 'http://p1.meituan.net/movie/[email protected]_220h_1e_1c', '霸王別姬',
        #  '\n 主演:張國榮,張豐毅,鞏俐\n ', '上映時間:1993-01-01(中國香港)', '9.', '6')
        for item in items:
            yield {
                'index': item[0],
                'image': item[1],
                'title': item[2],
                'actor': item[3].strip()[3:],  # drop the 3-character "主演:" prefix
                'time': item[4].strip()[5:],   # drop the 5-character "上映時間:" prefix
                'score': item[5] + item[6],    # '9.' + '6' -> '9.6'
            }


    def write_to_file(content):
        # content is a dict: serialize it to a JSON string and append it, one record per line.
        # (Tip: Alt+Enter is the IDE quick-fix shortcut for adding the missing json import.)
        # The with block closes the file, so no explicit f.close() is needed.
        with open('result.txt', 'a', encoding='utf-8') as f:
            f.write(json.dumps(content, ensure_ascii=False) + '\n')


    def main(offset):
        url = 'http://maoyan.com/board/4?offset=' + str(offset)
        html = get_one_page_html(url)
        if html is None:  # request failed, nothing to parse
            return
        for item in parse_one_page(html):
            print(item)
            # Without encoding='utf-8' and ensure_ascii=False in write_to_file,
            # result.txt would contain \uXXXX escapes instead of Chinese text.
            write_to_file(item)


    if __name__ == '__main__':
        # Sequential version:
        # for i in range(10):
        #     main(i * 10)

        # Parallel version: one task per page (offsets 0, 10, ..., 90).
        pool = Pool()
        pool.map(main, [i * 10 for i in range(10)])
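To see what the regular expression captures without hitting the site, it can be run against a small hand-written snippet. This is only a sketch: `SAMPLE_HTML`, the `example.com` URL, the actor names, and the helper `parse_sample` are made up for illustration, and the real page markup carries more attributes than shown here.

```python
import re

# Hypothetical, simplified markup shaped like what the pattern expects.
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <img alt="Example Movie" src="http://example.com/poster.jpg">
  <p class="name"><a href="/films/1">Example Movie</a></p>
  <p class="star">
    主演:Actor A,Actor B
  </p>
  <p class="releasetime">上映時間:1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">6</i></p>
</dd>
"""

# Same pattern as in the scraper; re.S lets "." cross line breaks.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?alt.*?src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,
)


def parse_sample(html):
    """Return one dict per <dd> block found in html."""
    results = []
    for m in PATTERN.findall(html):
        results.append({
            'index': m[0],
            'image': m[1],
            'title': m[2],
            'actor': m[3].strip()[3:],  # drop the 3-character "主演:" prefix
            'time': m[4].strip()[5:],   # drop the 5-character "上映時間:" prefix
            'score': m[5] + m[6],       # integer part + fraction part
        })
    return results


movies = parse_sample(SAMPLE_HTML)
print(movies)
```

Each `(...)` group in the pattern becomes one element of the tuple that `findall` returns, which is why the dict is built from `m[0]` through `m[6]`.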
3. Result:
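Notes:
1. Study the regular expression carefully; it does all the extraction.
2. The parsed output is formatted by a generator that yields one dict per movie.
3. When writing to the file, pass encoding='utf-8' and ensure_ascii=False so the text is stored as readable Chinese rather than \uXXXX escapes.
4. A process pool (multiprocessing.Pool) fetches the pages in parallel, so the whole crawl finishes in seconds.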
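Note 3 can be checked in isolation. The sketch below compares the default `json.dumps` output with `ensure_ascii=False`, then writes and reads back a one-record-per-line file. The record, the file name `result_demo.txt`, and the helpers `append_record` and `read_records` are illustrative assumptions, not part of the original script.

```python
import json
import os
import tempfile

# Illustrative record; the real script writes one such dict per movie.
record = {'index': '1', 'title': '霸王別姬', 'score': '9.6'}

escaped = json.dumps(record)                       # default ensure_ascii=True: \uXXXX escapes
readable = json.dumps(record, ensure_ascii=False)  # characters are kept as-is


def append_record(path, rec):
    """Append rec as one JSON line, stored as UTF-8 text."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(rec, ensure_ascii=False) + '\n')


def read_records(path):
    """Read the file back, one dict per non-empty line."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the demo leaves result.txt untouched.
path = os.path.join(tempfile.mkdtemp(), 'result_demo.txt')
append_record(path, record)
append_record(path, record)
loaded = read_records(path)

print(escaped)
print(readable)
print(loaded)
```

Both forms decode to the same dict, so `ensure_ascii=False` only changes how the file looks to a human reader, not what a JSON parser gets out of it.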