Scraping images from 校花網 (xiaohuar.com) with Requests

In memory of the quarrels we had.

import re

import requests

url = 'http://www.xiaohuar.com/list-1-%s.html'

for i in range(4):
    # Fetch list pages 0 through 3
    temp = url % i
    response = requests.get(temp)
    html = response.text
    # Earlier attempts, kept for reference:
    # img_urls = re.findall(r"/d/file/\d+/\w+\.jpg", html)  # extract image URLs
    # img_urls1 = re.findall(r"https://\w+.*?/\w+/\w+/\w+/\w+/\d+/\w+.*\.jpg", html)  # extract image URLs
    # img_names = re.findall(r'<img \w+.*="\d+".*? alt="(.*?)"', html)  # extract image names
    # Capture the alt text (name) and the relative image path in one pass
    img = re.findall(r'<img \w+.*="\d+".*? alt="(.*?)".*"(/d/file/\d+/\w+\.jpg)"', html)
    for img_tupian in img:
        img_tupian_urls = img_tupian[-1]  # relative image URL
        img_name = img_tupian[0]  # name taken from the alt attribute
        img_response = requests.get("http://www.xiaohuar.com%s" % img_tupian_urls)
        xiaohua = img_response.content  # raw image bytes
        # Last path segment of the URL, e.g. "xxxx.jpg"
        name = img_tupian_urls.split('/')[-1]
        print(name)  # the original printed an undefined variable, houzui
        with open(img_name + name, 'wb') as f:
            f.write(xiaohua)
The scraped results are still not quite right; when I have time I will work out how to get rid of the garbled characters.
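
One possible cause of the garbled file names is that the page text is decoded with the wrong character encoding before the alt text is extracted. The post does not confirm this, so treat what follows as a minimal sketch under two assumptions: that the page is served as GBK, and that the names also contain characters that are not valid in file names. `decode_page` and `safe_filename` are hypothetical helper names, not part of the original script.

```python
import re


def decode_page(raw: bytes, encoding: str = "gbk") -> str:
    """Decode raw page bytes. The GBK default is an assumption about the site."""
    return raw.decode(encoding, errors="replace")


def safe_filename(alt_text: str, url_path: str) -> str:
    """Build a file name from the alt text plus the last URL path segment."""
    # Replace characters that are invalid in file names on common systems.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", alt_text).strip("_")
    return f"{cleaned}_{url_path.split('/')[-1]}"


# Decoding GBK bytes with the matching codec recovers the original text,
# while decoding them as Latin-1 produces mojibake.
raw = "校花".encode("gbk")
print(decode_page(raw))
print(raw.decode("latin-1") == "校花")

# Example file name built from a messy alt text and a relative image path.
print(safe_filename("a/b: c", "/d/file/2018/abc.jpg"))
```

In the scraping loop, this would mean calling `decode_page(response.content)` instead of reading `response.text`, or alternatively setting `response.encoding = response.apparent_encoding` before reading `response.text`, and then opening `safe_filename(img_name, img_tupian_urls)` for writing.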