Scraping Images from 校花網 (xiaohuar.com) with Requests
阿新 • Published: 2018-12-11
import re
import requests

url = 'http://www.xiaohuar.com/list-1-%s.html'

for i in range(4):
    temp = url % i  # URL of list page i
    response = requests.get(temp)
    html = response.text
    # img_urls = re.findall(r"/d/file/\d+/\w+\.jpg", html)  # extract the image URLs
    # img_urls1 = re.findall(r"https://\w+.*?/\w+/\w+/\w+/\w+/\d+/\w+.*\.jpg", html)  # extract the image URLs
    # img_names = re.findall(r'<img \w+.*="\d+".*? alt="(.*?)"', html)  # extract the image names
    # Capture the name (alt text) and the relative image path in one pass
    img = re.findall(r'<img \w+.*="\d+".*? alt="(.*?)".*"(/d/file/\d+/\w+\.jpg)"', html)
    for img_tupian in img:
        img_tupian_urls = img_tupian[-1]  # relative image URL
        img_name = img_tupian[0]  # image name
        img_response = requests.get("http://www.xiaohuar.com%s" % img_tupian_urls)
        xiaohua = img_response.content  # raw image bytes
        # File name taken from the last segment of the image URL
        name = ("http://www.xiaohuar.com%s" % img_tupian_urls).split('/')[-1]
        print(name)  # the original printed an undefined variable, houzui
        with open(img_name + name, 'wb') as f:
            f.write(xiaohua)
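To see what the combined pattern in the script captures, here is a small offline sketch. The sample `<img>` tag is made up to fit what the pattern expects; it is an assumption about the page markup, not copied from the site. The `extract_images` helper and the use of `urljoin` are also illustrative additions, not part of the original script.

```python
import re
from urllib.parse import urljoin

BASE = "http://www.xiaohuar.com"

# Hypothetical markup, written to fit the pattern; the real page may differ.
SAMPLE_HTML = '<img width="210" alt="example" src="/d/file/20181211/abc123.jpg" />'

# Same pattern as in the script: group 1 is the alt text, group 2 the image path.
PATTERN = re.compile(r'<img \w+.*="\d+".*? alt="(.*?)".*"(/d/file/\d+/\w+\.jpg)"')


def extract_images(html):
    """Return a list of (name, absolute_url) tuples found in html."""
    return [(name, urljoin(BASE, path)) for name, path in PATTERN.findall(html)]


matches = extract_images(SAMPLE_HTML)
print(matches)
```

Because the pattern relies on greedy `.*`, it can over-match when several `<img>` tags share one line, so it is worth checking it against a saved copy of a real page before relying on it.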
The scraping results are still not great. When I have time, I'll work out how to get rid of the garbled characters.
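One possible cause of the garbled characters is a charset mismatch: when the server sends a `text/*` Content-Type without a charset, Requests decodes `response.text` as ISO-8859-1. The sketch below reproduces that effect offline and shows a fix. It assumes the page is GBK-encoded, which has not been verified here, and the `decode_page` and `safe_filename` helpers are hypothetical names introduced for illustration.

```python
import re


def decode_page(raw, encoding="gbk"):
    """Decode raw response bytes with an explicit charset.

    The default 'gbk' is an assumption about the site, not a verified fact.
    """
    return raw.decode(encoding, errors="replace")


def safe_filename(name, suffix):
    """Replace characters that are awkward in file names, then append suffix."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    return (cleaned or "image") + "_" + suffix


raw = "校花".encode("gbk")            # bytes as a GBK server would send them
garbled = raw.decode("iso-8859-1")    # what a wrong charset guess produces
fixed = decode_page(raw)              # decoded with the assumed charset

print(repr(garbled), repr(fixed))
print(safe_filename("a/b c", "x.jpg"))
```

With a live response, the equivalent step would be setting `response.encoding = response.apparent_encoding` (or an explicit charset) before reading `response.text`.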