Scraping a Novel (My First Web Crawler)

import re
import requests

# Index page of the novel
index_url = "http://www.jingcaiyuedu.com/book/317834.html"
response = requests.get(index_url)
# Disabled retry loop, kept from the first draft:
# while response.status_code != 200:
#     response = requests.get(index_url)
#     print(response)
response.encoding = "utf-8"
html = response.text

# Book title, taken from the og:novel:book_name meta tag
title = re.findall(r'<meta property="og:novel:book_name" content="(.*?)"/>', html)[0]
# The <dl id="list"> block holds the chapter links
dl = re.findall(r'<dl id="list">.*?</dl>', html, re.S)[0]
# List of (relative URL, chapter title) pairs
chapter_info_list = re.findall(r'href="(.*?)">(.*?)<', dl)

# "with" closes the file even if a request fails halfway through
with open("%s.txt" % title, "w", encoding="utf-8") as fb:
    for chapter_info in chapter_info_list:
        chapter_url, chapter_title = chapter_info

        chapter_url = "http://www.jingcaiyuedu.com%s" % chapter_url
        chapter_response = requests.get(chapter_url)
        chapter_response.encoding = "utf-8"
        chapter_html = chapter_response.text
        # The chapter text sits between the a1() and a2() script tags
        chapter_content = re.findall(
            r'<script>a1\(\);</script>(.*?)<script>a2\(\);</script>',
            chapter_html, re.S)[0]
        # Strip the HTML line breaks and indentation entities
        chapter_content = chapter_content.replace("<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;", "")
        chapter_content = chapter_content.replace("&nbsp;", "")
        # The character removed here was lost in the listing; a plain space is assumed
        chapter_content = chapter_content.replace(" ", "")
        fb.write(chapter_title)
        fb.write(chapter_content)
        fb.write("\n")
        print(chapter_url)

# print(chapter_info_list)
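To see what the two index-page regular expressions pull out without touching the network, here is a small sketch that runs them on a made-up HTML fragment. The fragment, the title "Example Novel", and the chapter names are invented for illustration; only the patterns come from the script.

```python
import re

# Made-up fragment shaped like the index page the script expects.
sample_html = '''
<meta property="og:novel:book_name" content="Example Novel"/>
<dl id="list">
<dd><a href="/book/317834/0.html">Chapter 1</a></dd>
<dd><a href="/book/317834/1.html">Chapter 2</a></dd>
</dl>
'''

# Same patterns as in the script above.
title = re.findall(r'<meta property="og:novel:book_name" content="(.*?)"/>', sample_html)[0]
# re.S lets "." match newlines, so the whole multi-line <dl> block is captured.
dl = re.findall(r'<dl id="list">.*?</dl>', sample_html, re.S)[0]
# Two capture groups, so findall returns (relative URL, chapter title) tuples.
chapter_info_list = re.findall(r'href="(.*?)">(.*?)<', dl)

print(title)
print(chapter_info_list)
```

Because the pattern has two groups, each entry can be unpacked directly into chapter_url and chapter_title, which is what the for loop relies on.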

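This is my first time writing a crawler, and Python really is impressively powerful. The problem I ran into is that the remote host forcibly closes the connection: I can usually download only the first few chapters before it gets cut off. Solving that is the goal for the next stage.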
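One possible way to deal with the forced disconnects is to retry a failed request after a growing pause. I have not verified why this site closes the connection (rate limiting is only a guess), so treat the following as a sketch under that assumption. The helper name retry and its parameters are hypothetical, not part of requests.

```python
import time


def retry(fetch, attempts=5, first_wait=1.0, factor=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing longer after each failure.

    OSError is caught because the exceptions raised by requests
    (RequestException and its subclasses) derive from it, as do the
    built-in connection errors such as ConnectionResetError.
    """
    wait = first_wait
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(wait)
            wait *= factor  # wait longer before the next try


# Hypothetical use inside the chapter loop (needs `import requests`):
# chapter_response = retry(lambda: requests.get(chapter_url, timeout=10))
```

Adding a short time.sleep between chapters, even when nothing fails, may also help if the server is throttling rapid requests.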
