
Using Proxies in Web Crawlers


1. The urllib module

The following demo code sets up a proxy:

from urllib.error import URLError
from urllib import request

user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87'
# Request headers; only a browser-like User-Agent is set here
headers = {'User-Agent': user_agent}
proxy = '127.0.0.1:9734'
# Keys are the protocol of the target URL; values are proxy URLs, scheme included
proxy_handler = request.ProxyHandler({
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
})
opener = request.build_opener(proxy_handler)
try:
    req = request.Request('http://httpbin.org/get', headers=headers)
    resp = opener.open(req)
    print(resp.read().decode('utf-8'))
except Exception as e:
    print(e)

The output is as follows:

{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Connection": "close",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
  },
  "origin": "111.19.38.245",
  "url": "http://httpbin.org/get"
}

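In the code above, request.ProxyHandler is first used to configure the proxy. Its argument is a dictionary whose keys are protocol names and whose values are the proxy addresses. Note that each proxy value must be prefixed with a scheme (http or https). When the request URL is HTTP, the http proxy is used; when it is HTTPS, the https proxy is used.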

Once the ProxyHandler has been created, pass it to request.build_opener to create an opener. At that point the proxy is configured, and the opener can be used to fetch the target URL.
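The two steps described above (create a ProxyHandler, then build an opener from it) can be wrapped in a small helper. This is only a sketch: the function name build_proxy_opener and its scheme parameter are made up for illustration, and the proxy address is the same local placeholder used in the demo.

```python
from urllib import request


def build_proxy_opener(proxy, scheme="http"):
    """Return an opener that routes HTTP and HTTPS requests through `proxy`.

    `proxy` is a "host:port" string; `scheme` is the protocol used to talk
    to the proxy itself. Both names are illustrative, not part of urllib.
    """
    proxy_url = f"{scheme}://{proxy}"
    handler = request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return request.build_opener(handler)


# Placeholder proxy address; nothing is contacted until opener.open() is called.
opener = build_proxy_opener("127.0.0.1:9734")

# Optional: make every later request.urlopen() call use this opener.
# request.install_opener(opener)
```

Calling request.install_opener is optional. Without it, only requests sent through opener.open use the proxy, which is usually easier to reason about in a larger crawler.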

 
2. Proxy authentication
If the proxy requires authentication, it can be configured as follows.
proxy = 'username:password@127.0.0.1:9734'  # proxy with authentication

proxy_handler = request.ProxyHandler({
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
})
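If the username or password contains characters such as @ or :, a plain user:pass@host:port string becomes ambiguous. One possible workaround, sketched below, is to percent-encode the credentials before building the proxy URL. The helper name build_auth_proxy_url and the credentials are placeholders. As far as I know, urllib decodes the credentials again before sending them to the proxy, but treat that as an assumption and verify it against your own proxy.

```python
from urllib import request
from urllib.parse import quote


def build_auth_proxy_url(username, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials (illustrative helper)."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Placeholder credentials, for illustration only.
proxy_url = build_auth_proxy_url("username", "p@ss:word", "127.0.0.1", 9734)
proxy_handler = request.ProxyHandler({"http": proxy_url, "https": proxy_url})
print(proxy_url)  # http://username:p%40ss%3Aword@127.0.0.1:9734
```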
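3. SOCKS5 proxies
If the proxy is a SOCKS5 proxy, use the following approach (the socks module comes from the third-party PySocks package).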
from urllib.error import URLError
from urllib import request
import socket

import socks  # third-party: PySocks

user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87'
# Request headers; only a browser-like User-Agent is set here
headers = {'User-Agent': user_agent}
# Route every new socket in this process through the SOCKS5 proxy
socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 9742)
socket.socket = socks.socksocket
try:
    req = request.Request('http://httpbin.org/get', headers=headers)
    resp = request.urlopen(req)
    print(resp.read().decode('utf-8'))
except Exception as e:
    print(e)
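Assigning to socket.socket changes it for the whole process, so every later connection also goes through the SOCKS5 proxy. If only some requests should be proxied, one option is to limit the patch to a with block. The context manager below is a hypothetical helper, not part of PySocks or the standard library, and the usage lines are comments because PySocks is not imported here.

```python
import socket
from contextlib import contextmanager


@contextmanager
def patched_socket(socket_cls):
    """Temporarily replace socket.socket with `socket_cls`, then restore it."""
    original = socket.socket
    socket.socket = socket_cls
    try:
        yield
    finally:
        # Always restore the original class, even if the request fails.
        socket.socket = original


# Usage with PySocks (third-party, shown as comments only):
#
#     import socks
#     socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9742)
#     with patched_socket(socks.socksocket):
#         resp = request.urlopen(req)
```

This swap is not thread-safe: other threads that open connections inside the with block would also be routed through the proxy.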

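4. Setting a proxy with requests

With requests, setting a proxy is simpler: just pass the proxy dictionary to the proxies parameter. SOCKS5 proxies and proxy authentication are handled the same way as above. The code is as follows: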

import requests

user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87'
# Request headers; only a browser-like User-Agent is set here
headers = {'User-Agent': user_agent}
proxy = '127.0.0.1:9734'
# Keys are the protocol of the target URL; values are proxy URLs, scheme included
proxy_handler = {
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
}
try:
    resp = requests.get('http://httpbin.org/get', proxies=proxy_handler, headers=headers)
    print(resp.text)
except Exception as e:
    print(e)
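Because the proxies argument is just a dictionary, the plain, authenticated, and SOCKS5 cases can all be produced by one small function. The sketch below is an assumption-laden illustration: make_proxies is a hypothetical helper, the credentials are placeholders, and the socks5:// scheme only works in requests when the optional SOCKS support is installed (pip install "requests[socks]").

```python
def make_proxies(proxy, scheme="http", username=None, password=None):
    """Build a dict suitable for the `proxies` argument of requests.

    `proxy` is "host:port". All parameter names are illustrative.
    """
    auth = f"{username}:{password}@" if username and password else ""
    proxy_url = f"{scheme}://{auth}{proxy}"
    return {"http": proxy_url, "https": proxy_url}


plain = make_proxies("127.0.0.1:9734")
authed = make_proxies("127.0.0.1:9734", username="username", password="password")
# Requires the optional SOCKS extra for requests.
socks5 = make_proxies("127.0.0.1:9742", scheme="socks5")
```

Any of these dictionaries can then be passed as requests.get(url, proxies=plain, headers=headers).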
