1. 程式人生 > >Python爬蟲(三)爬淘寶MM圖片

Python爬蟲(三)爬淘寶MM圖片

name os.path app dir util mozilla user mac baseurl

直接上代碼:

# python2
# -*- coding: utf-8 -*-

import urllib2
import re
import string
import os
import shutil

def crawl_taobaoMM(baseUrl, start, end):
    imgDir = mm_img
    isImgDirExist = os.path.exists(imgDir)
    if not isImgDirExist:
        os.makedirs(imgDir)
    else:
        shutil.rmtree(imgDir)

    fileName 
= mm.txt picNumber = 0 with open(fileName, a) as f: for i in range(start, end + 1): url = baseUrl + ?page= + str(i) userAgent = Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
headers = {user-agent: userAgent} req = urllib2.Request(url, headers=headers) response = urllib2.urlopen(req).read().decode(gbk) # 圖片url、姓名、年齡、城市、職業 serchPattern = r<div class="personal-info">.*?<img src="//(.*?)".*?<a class="lady-name".*?>(.*?)
r</a>.*?<strong>(.*?)</strong>.*?<span>(.*?)</span>.*?<em>(.*?)</em> searchObj = re.compile(serchPattern, re.S) results = searchObj.findall(response) print + str(i) + 頁... for result in results: message = %s %s %s %s %s\n % (result[0], result[1], result[2], result[3], result[4]) print picNumber print message f.write(message.encode(utf-8)) pic = urllib2.urlopen(https:// + result[0]).read() picName = imgDir + / + string.zfill(picNumber, 5) + .jpg with open(picName, wb) as pf: pf.write(pic) picNumber += 1 crawl_taobaoMM(https://mm.taobao.com/json/request_top_list.htm, 1, 10)

爬下來的圖片:

技術分享

參考資料:

Python爬蟲實戰四之抓取淘寶MM照片

Python爬蟲(三)爬淘寶MM圖片