
Crawler in Practice 2: Scraping Pretty Pictures from Toutiao


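The complete code has been tested and runs successfully. Its purpose is to scrape the images that Toutiao returns for the search keyword 街拍 (street snaps).

There are a few points I do not understand.

1. The parameter list shown in the request headers differs somewhat from params in the code (the code uses cur_tab: 3 and from: gallery), and I do not know where the parameters in the code came from. The request headers show: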

offset: 40
format: json
keyword: 街拍
autoload: true
count: 20
cur_tab: 1
from: search_tab
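
The parameters listed above can be turned into a request URL with urlencode. The following is a minimal sketch, assuming the same search_content endpoint that the full code uses. It only builds and inspects the URL and sends no request, so it says nothing about whether the server still accepts these values.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Parameters as observed in the browser's request headers (copied from the list above).
observed_params = {
    'offset': 40,
    'format': 'json',
    'keyword': '街拍',
    'autoload': 'true',
    'count': 20,
    'cur_tab': 1,
    'from': 'search_tab',
}

# Assumption: the endpoint is the one used in the full code listing.
BASE_URL = 'https://www.toutiao.com/search_content/?'

# urlencode percent-encodes non-ASCII values such as the keyword as UTF-8.
url = BASE_URL + urlencode(observed_params)
print(url)

# Decode the query string again to check that nothing was lost.
decoded = parse_qs(urlsplit(url).query)
print(decoded['keyword'], decoded['cur_tab'], decoded['from'])
```

Swapping cur_tab and from for the values in the code (3 and gallery) and comparing the two URLs in a browser is one way to see what each parameter set returns.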

2. I do not understand line 51 of the code, the replace call in save_image:

new_image_url = local_image_url.replace('list', 'large')

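Note: on checking, the address //p3.pstatp.com/list/pgc-image/153077042661778985015b6 returns a reduced-size version of the image. Changing list to large in the address returns the full-size image.

The complete code is as follows: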
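
The rewrite described in the note can be tried on its own, without downloading anything. This is a minimal sketch: to_large_url is a hypothetical helper that is not part of the original code, and it replaces only the first /list/ path segment, which is slightly stricter than the plain replace('list', 'large') in the full listing.

```python
def to_large_url(thumb_url, scheme='http:'):
    """Hypothetical helper: turn a thumbnail URL into the full-size URL."""
    # Replace only the first '/list/' path segment so that a later
    # occurrence of 'list' elsewhere in the URL is left untouched.
    large = thumb_url.replace('/list/', '/large/', 1)
    # The API returns protocol-relative URLs ('//host/...'); add a scheme.
    if large.startswith('//'):
        large = scheme + large
    return large


thumb = '//p3.pstatp.com/list/pgc-image/153077042661778985015b6'
print(to_large_url(thumb))
# -> http://p3.pstatp.com/large/pgc-image/153077042661778985015b6
```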

import os
import requests
from urllib.parse import urlencode
from hashlib import md5
from multiprocessing.pool import Pool

GROUP_START = 1
GROUP_END = 5


def get_page(offset):
    # Query parameters for the search endpoint; offset selects the result page.
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': '街拍',
        'autoload': 'true',
        'count': '20',
        'cur_tab': '3',
        'from': 'gallery',
    }
    url = 'https://www.toutiao.com/search_content/?' + urlencode(params)
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()  # parse the response body as JSON
    except requests.ConnectionError:
        pass
    return None  # connection failed or the status was not 200


def get_images(json):
    # Yield one dict per image, carrying the image URL and the article title.
    data = json.get('data')
    if data:
        for item in data:
            image_list = item.get('image_list')
            title = item.get('title')
            if image_list:
                for image in image_list:
                    yield {
                        'image': image.get('url'),
                        'title': title
                    }


def save_image(item):
    # One directory per article title; exist_ok avoids a race between workers.
    os.makedirs(item.get('title'), exist_ok=True)
    try:
        local_image_url = item.get('image')
        # Line 51 of the original numbered listing: swap the 'list'
        # (thumbnail) path segment for 'large' to get the full-size image.
        new_image_url = local_image_url.replace('list', 'large')
        response = requests.get('http:' + new_image_url)
        if response.status_code == 200:
            # Name the file by the MD5 of its content so duplicates are skipped.
            file_path = '{0}/{1}.{2}'.format(item.get('title'), md5(response.content).hexdigest(), 'jpg')
            if not os.path.exists(file_path):
                with open(file_path, 'wb') as f:
                    f.write(response.content)
            else:
                print('Already Downloaded', file_path)
    except requests.ConnectionError:
        print('Failed to save image')


def main(offset):
    json = get_page(offset)
    if json is None:  # skip pages that could not be fetched
        return
    for item in get_images(json):
        print(item)
        save_image(item)


if __name__ == '__main__':
    pool = Pool()
    # Offsets 20, 40, ..., 100: one page of 20 results per worker task.
    groups = [x * 20 for x in range(GROUP_START, GROUP_END + 1)]
    pool.map(main, groups)
    pool.close()
    pool.join()
