Python Web Scraping: Modifying the Request Headers (Study Notes)

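HTTP request headers are a set of attributes and configuration information passed to the web server every time we send it a request. The following seven are commonly used: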

1.Host

2.Connection

3.Accept

4.User-Agent

5.Referer

6.Accept-Encoding

7.Accept-Language

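We can visit https://www.whatismybrowser.com/detect/what-http-headers-is-my-browser-sending to see the request headers our own browser sends (the site also reports other information about the browser). We can also use the requests module to set our own request headers, and use the same site to check the headers sent by our scraper code: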

import requests
from bs4 import BeautifulSoup

session = requests.Session()
# Headers that make the request look like it comes from a desktop browser
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit 537.36 (KHTML, like Gecko) Chrome",
           "Accept": "text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,*/*;q=0.8"}

url = "https://www.whatismybrowser.com/detect/what-http-headers-is-my-browser-sending"
req = session.get(url, headers=headers)

# Name the parser explicitly; otherwise BeautifulSoup picks one itself and emits a warning
bsObj = BeautifulSoup(req.text, "html.parser")
print(bsObj)  # the whole parsed page
# The table that lists the headers the server received
print(bsObj.find("table", {"class": "table table-striped"}).get_text())

Output:


ACCEPT
text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,*/*;q=0.8

ACCEPT_ENCODING
gzip, deflate

CONNECTION
keep-alive

HOST
www.whatismybrowser.com

USER_AGENT
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit 537.36 (KHTML, like Gecko) Chrome
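
The check above relies on an external site and on its page layout (the table with class "table table-striped"), either of which may change. As a rough sketch, assuming we only want to see which headers actually leave our machine, the same test can be run against a throwaway local server built from the standard library. The names EchoHeadersHandler, echo_server and headers_seen_by_server are made up for this illustration and are not from the book, and urllib is used here in place of requests so that the sketch has no third-party dependencies.

```python
import contextlib
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the request headers as JSON."""

    def do_GET(self):
        # Header names are case-insensitive in HTTP, so lowercase them for easy lookup.
        seen = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(seen).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


@contextlib.contextmanager
def echo_server():
    """Run the echo server on a free local port and yield its URL."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield "http://127.0.0.1:%d/" % server.server_address[1]
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def headers_seen_by_server(headers):
    """Send one GET with the given headers and return what the server received."""
    # An empty ProxyHandler keeps the request on the local machine
    # even if proxy environment variables are set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with echo_server() as url:
        request = urllib.request.Request(url, headers=headers)
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    custom = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit 537.36 (KHTML, like Gecko) Chrome",
        "Accept": "text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,*/*;q=0.8",
    }
    for name, value in sorted(headers_seen_by_server(custom).items()):
        print(name, value, sep=": ")
```

While echo_server() is active, the URL it yields should also work with session.get(url, headers=headers) from the requests example, with the JSON body read through req.json().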

Source: Web Scraping with Python by Ryan Mitchell (Chinese edition: 《Python網路資料採集》)