How to use proxy IPs and user agents in Scrapy middleware
阿新 · Published 2018-12-06
1. Code in middlewares.py
```python
# middlewares.py
import random

# In older Scrapy versions this class lived under scrapy.contrib.downloadermiddleware
from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware

# IPPOOL is defined in settings.py (see step 3)
from .settings import IPPOOL


class IPPOOlS(HttpProxyMiddleware):
    def __init__(self, ip=''):
        self.ip = ip

    # Pick a random proxy IP for each outgoing request
    def process_request(self, request, spider):
        thisip = random.choice(IPPOOL)
        print("Current proxy IP: " + thisip["ipaddr"])
        request.meta["proxy"] = "http://" + thisip["ipaddr"]
```
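Outside of Scrapy, the proxy-selection logic in `process_request` can be sketched with a stand-in request object (the `SimpleNamespace` stand-in and the shortened pool are just for illustration, not part of Scrapy):

```python
import random
from types import SimpleNamespace

# A small stand-in for the IPPOOL defined in settings.py
IPPOOL = [
    {"ipaddr": "117.191.11.77:8080"},
    {"ipaddr": "211.159.140.133:8080"},
]

def process_request(request):
    # Same logic as the middleware: pick a random proxy and attach it
    thisip = random.choice(IPPOOL)
    request.meta["proxy"] = "http://" + thisip["ipaddr"]

# Mimic a Scrapy Request, which carries a meta dict
request = SimpleNamespace(meta={})
process_request(request)
print(request.meta["proxy"])  # e.g. http://117.191.11.77:8080
```

Every request that passes through the middleware gets one of the pool's proxies attached under `request.meta["proxy"]`, which Scrapy's downloader then uses for the connection.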
2. Create a uamind.py file (in the same directory as middlewares.py)
```python
# -*- coding: utf-8 -*-
# uamind.py
import random

# Import UPPOOL from settings.py
from .settings import UPPOOL
# UserAgentMiddleware from Scrapy (older versions exposed it as
# scrapy.contrib.downloadermiddleware.useragent)
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware


class Uamid(UserAgentMiddleware):
    # Be sure to accept user_agent in __init__, otherwise errors are likely
    def __init__(self, user_agent=''):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Pick a random user agent first
        thisua = random.choice(UPPOOL)
        print("Current User-Agent: " + thisua)
        request.headers.setdefault('User-Agent', thisua)
```
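Note that the middleware uses `headers.setdefault` rather than direct assignment, so a request that already carries an explicit User-Agent keeps it. Scrapy's `Headers` object behaves like a dict for this purpose; a plain-dict sketch (the shortened pool and `set_ua` helper are illustrative only):

```python
import random

UPPOOL = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
]

def set_ua(headers):
    # setdefault only fills the header when it is not already present
    headers.setdefault('User-Agent', random.choice(UPPOOL))
    return headers

fresh = set_ua({})                          # gets a random UA from the pool
pinned = set_ua({'User-Agent': 'my-bot'})   # keeps its explicit UA
print(fresh['User-Agent'] in UPPOOL)  # True
print(pinned['User-Agent'])           # my-bot
```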
3. Code in settings.py
```python
# ========================================
# Proxy IP pool and user-agent pool
# ========================================

# Disable local cookies
COOKIES_ENABLED = False

# Proxy IP pool
IPPOOL = [
    {"ipaddr": "117.191.11.77:8080"},
    {"ipaddr": "211.159.140.133:8080"},
    {"ipaddr": "211.159.140.111:8080"},
    {"ipaddr": "112.175.32.88:8080"},
]

# User-agent pool
UPPOOL = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393",
]

# Enable the middlewares
DOWNLOADER_MIDDLEWARES = {
    'company_cotacts.middlewares.IPPOOlS': 2,
    'company_cotacts.uamind.Uamid': 1,
}
```
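In `DOWNLOADER_MIDDLEWARES`, a lower number means the middleware sits closer to the engine, so its `process_request` runs earlier. With the values above, `Uamid` (priority 1) sets the User-Agent before `IPPOOlS` (priority 2) attaches the proxy. A quick sketch of the ordering Scrapy applies:

```python
DOWNLOADER_MIDDLEWARES = {
    'company_cotacts.middlewares.IPPOOlS': 2,
    'company_cotacts.uamind.Uamid': 1,
}

# Scrapy calls process_request in ascending priority order
order = sorted(DOWNLOADER_MIDDLEWARES, key=DOWNLOADER_MIDDLEWARES.get)
print(order)
# ['company_cotacts.uamind.Uamid', 'company_cotacts.middlewares.IPPOOlS']
```

The exact numbers matter only relative to each other and to Scrapy's built-in middlewares, which use their own documented priority values.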
For user agents, there is also a simpler approach:
1. Install the fake_useragent package in your Scrapy environment (pip install fake_useragent)
2. Import the UserAgent class from fake_useragent:

```python
from fake_useragent import UserAgent
```
3. In settings.py, replace the default USER_AGENT line as follows:

```python
# USER_AGENT = 'company_cotacts (+http://www.yourdomain.com)'
ua = UserAgent().random
USER_AGENT = ua
```

Keep in mind that settings.py is evaluated once at startup, so this picks a single random user agent for the entire crawl, rather than rotating one per request as the Uamid middleware above does.