Scraping images from Qiushibaike (working as of 2016/10/23)
阿新 • Published: 2019-02-03
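All you need to do is tell apart the folder that holds user avatars from the folder that holds the posted images.
Avatar: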
<div class="article block untagged mb15" id='qiushi_tag_117810314'>
<div class="author clearfix">
<a href="/users/22028925/" target="_blank" rel="nofollow">
<img src="http://pic.qiushibaike.com/system/avtnew/2202/22028925/medium/2016100101212195.JPEG" alt="紅顏一笑醉心絃~"/>
</a>
<a href="/users/22028925/" target="_blank" title="紅顏一笑醉心絃~">
<h2>紅顏一笑醉心絃~</h2>
</a>
<div class="articleGender manIcon">99</div>
</div>
The actual image:
<div class="thumb">
<a href="/article/117810314" target="_blank">
<img src="http://pic.qiushibaike.com/system/pictures/11781/117810314/medium/app117810314.jpg" alt="隔著螢幕都聽到它沉重的喘氣聲" />
</a>
</div>
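Both snippets above come down to an `<img>` tag whose `src` is the URL we want. As a rough illustration of pulling those URLs out, here is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`, fed only the two `<img>` tags copied from the snippets; the class name `ImgSrcCollector` is hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <img ... /> also arrive here via the default handle_startendtag
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


# The avatar tag and the post-image tag, copied from the snippets above
SNIPPET = (
    '<img src="http://pic.qiushibaike.com/system/avtnew/2202/22028925/medium/2016100101212195.JPEG" alt="紅顏一笑醉心絃~"/>'
    '<div class="thumb"><a href="/article/117810314" target="_blank">'
    '<img src="http://pic.qiushibaike.com/system/pictures/11781/117810314/medium/app117810314.jpg" alt="隔著螢幕都聽到它沉重的喘氣聲" />'
    '</a></div>'
)

parser = ImgSrcCollector()
parser.feed(SNIPPET)
for src in parser.srcs:
    print(src)
```

Both URLs come out in document order, avatar first, so some extra filtering is still needed to keep only the post image.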
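One URL path contains avtnew and the other contains pictures, so a regular expression is enough to separate them (mine is rather clumsy).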
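A minimal sketch of that split, assuming the URL layout seen in the two snippets (avatars under `/system/avtnew/`, post images under `/system/pictures/`) holds for every post; the function name `classify` is hypothetical.

```python
import re

# Directory segments taken from the two sample URLs; assumed to be stable
AVATAR_RE = re.compile(r"^https?://pic\.qiushibaike\.com/system/avtnew/")
PICTURE_RE = re.compile(
    r"^https?://pic\.qiushibaike\.com/system/pictures/.*\.jpe?g$", re.IGNORECASE
)


def classify(src):
    """Return 'avatar', 'picture', or 'other' for an <img> src URL."""
    if AVATAR_RE.match(src):
        return "avatar"
    if PICTURE_RE.match(src):
        return "picture"
    return "other"


urls = [
    "http://pic.qiushibaike.com/system/avtnew/2202/22028925/medium/2016100101212195.JPEG",
    "http://pic.qiushibaike.com/system/pictures/11781/117810314/medium/app117810314.jpg",
]
for u in urls:
    print(classify(u), u)
```

Anchoring the patterns with `^` and matching on the directory segment, rather than on the file extension alone, keeps `.JPEG` avatars from slipping into the picture set.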
import os
import re
from urllib.request import Request, urlopen, urlretrieve

from bs4 import BeautifulSoup

# Send a browser User-Agent so the listing page is served like a normal visit
H = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = "http://www.qiushibaike.com/imgrank/page/5/?s=4922922"

req = Request(url=url, headers=H)
html = urlopen(req)
src = BeautifulSoup(html, "html.parser")

# Keep only post images (/system/pictures/), which skips avatars (/system/avtnew/)
a = src.find_all("img", {"src": re.compile(r"http://pic\.qiushibaike\.com/system/pictures/.*\.jpg")})

# Create the output folder; os.path.join works on any OS, unlike a hard-coded "\\"
save_dir = os.path.join(os.getcwd(), "pic")
os.makedirs(save_dir, exist_ok=True)

# Download each image as 1.jpg, 2.jpg, ...
for x, img in enumerate(a, start=1):
    urlretrieve(img["src"], os.path.join(save_dir, "%d.jpg" % x))
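The counter in the script restarts at 1 on every run, so scraping a different page overwrites the files saved earlier. One alternative, sketched here under the assumption that each image URL ends in a unique file name such as `app117810314.jpg`, is to reuse that last path segment as the local name; `filename_from_url` and `local_path` are hypothetical helpers.

```python
import os
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, default="image.jpg"):
    """Use the last segment of the URL path as the local file name."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default  # fall back when the path ends with "/"


def local_path(url, folder):
    # os.path.join picks the right separator for the current OS
    return os.path.join(folder, filename_from_url(url))


src = "http://pic.qiushibaike.com/system/pictures/11781/117810314/medium/app117810314.jpg"
print(filename_from_url(src))  # app117810314.jpg
```

In the download loop, `local_path(img["src"], save_dir)` would then replace the numbered name.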