[Learning Python] The Requests Library

阿新 • Published: 2018-12-10


Requests (https://www.python-requests.org/) is a third-party Python library that excels at handling complex HTTP requests, cookies, headers (both request and response headers), and more.

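Submitting a Basic Form

Most web forms consist of a few HTML fields, a submit button, and a "result" page (the value of the form's action attribute) that the browser is sent to once the form has been submitted.

A very simple form is available at http://www.pythonscraping.com/pages/files/form.html.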


The form's source is shown below. You can also view it with Chrome's developer tools (F12).

<form method="post" action="processing.php">
First name: <input type="text" name="firstname"><br>
Last name: <input type="text" name="lastname"><br>
<input type="submit" value="Submit" id="submit">
</form>

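A few points are worth noting:

The two input fields are named firstname and lastname. A field's name determines the name of the variable that is sent to the server when the form is submitted. To imitate a form submission, the variable names you send must match the field names one to one.
The real work happens in processing.php (absolute URL: http://www.pythonscraping.com/pages/files/processing.php). Every POST request from the form goes to that page, not to the page that contains the form. The HTML form exists only to help visitors send a well-formed request to a page they do not otherwise see.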
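The two points above can be checked mechanically. The following is a minimal sketch using only the standard library; the FormInspector class is a hypothetical helper written for this illustration, and the form source is pasted in as a string rather than downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Copy of the form source shown above.
FORM_HTML = """
<form method="post" action="processing.php">
First name: <input type="text" name="firstname"><br>
Last name: <input type="text" name="lastname"><br>
<input type="submit" value="Submit" id="submit">
</form>
"""


class FormInspector(HTMLParser):
    """Hypothetical helper: records the form's action and its named inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            # The submit button has no name attribute, so it is skipped.
            self.field_names.append(attrs["name"])


inspector = FormInspector()
inspector.feed(FORM_HTML)

# Resolve the relative action against the URL of the page holding the form.
form_page = "http://www.pythonscraping.com/pages/files/form.html"
post_url = urljoin(form_page, inspector.action)

print(inspector.field_names)
print(post_url)
```

The field names become the keys of the data you send, and the resolved URL is the address the POST request should go to.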

Submitting this simple form therefore takes only four lines of code.

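import requests

# The keys must match the name attributes of the form's input fields.
params = {'firstname': 'Ivy', 'lastname': 'Wong'}
# POST to the page named in the form's action attribute, not to the form page.
r = requests.post("http://www.pythonscraping.com/pages/files/processing.php", data=params)
print(r.text)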

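After the form is submitted, the script should print the source of the result page, including the line shown in the screenshot.

[Figure: screenshot of the returned line]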
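To see what the data argument actually puts on the wire, you can build the request without sending it. This is a sketch rather than part of the original walkthrough; it relies on requests.Request and its prepare() method, which construct the request locally.

```python
import requests

params = {'firstname': 'Ivy', 'lastname': 'Wong'}

request = requests.Request(
    "POST",
    "http://www.pythonscraping.com/pages/files/processing.php",
    data=params,
)
prepared = request.prepare()  # builds the request locally; nothing is sent

print(prepared.method)
print(prepared.headers["Content-Type"])
print(prepared.body)
```

The body is the dictionary encoded as name=value pairs joined by "&", which is the same format a browser uses for a plain form post.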

Submitting Files and Images

A file upload form is available at http://www.pythonscraping.com/files/form2.html. Its source looks like this:

<form action="../pages/files/processing2.php" method="post" enctype="multipart/form-data">
  Submit a jpg, png, or gif: <input type="file" name="uploadFile"><br>
  <input type="submit" value="Upload File">
</form>

Note that the input tag has type="file". Apart from that, it is handled in much the same way as the text form.

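import requests

# The key must match the name attribute of the file input.
# Open the file in binary mode and close it once the upload is done.
with open('../files/Python-logo.png', 'rb') as image:
    files = {'uploadFile': image}
    r = requests.post("http://www.pythonscraping.com/pages/files/processing2.php", files=files)
print(r.text)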
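The files argument makes Requests encode the body as multipart/form-data, matching the form's enctype attribute. The sketch below inspects that encoding without sending anything. The in-memory bytes, the filename, and the image/png content type are stand-ins chosen for illustration, not values from the original example.

```python
import io

import requests

# Stand-in bytes instead of a real image, so no file on disk is needed.
fake_image = io.BytesIO(b"not really a PNG")

request = requests.Request(
    "POST",
    "http://www.pythonscraping.com/pages/files/processing2.php",
    # (filename, file object, content type); filename and type are assumptions.
    files={"uploadFile": ("Python-logo.png", fake_image, "image/png")},
)
prepared = request.prepare()  # encodes the body locally; nothing is sent

print(prepared.headers["Content-Type"])
print(prepared.body.decode("latin-1"))
```

The printed body shows one part named uploadFile, with the filename and content type in its part headers, followed by the file's bytes.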

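Handling Logins and Cookies

Most modern websites use cookies to keep track of who is logged in. Once a site has verified your login credentials, it stores them in a cookie in your browser, which usually contains a server-generated token, an expiry time, and state-tracking information. The site then treats this cookie as proof of authentication, and your browser presents it to the server with every page you visit.

Ryan Mitchell created a simple login form at http://www.pythonscraping.com/pages/cookies/login.html.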


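The username can be anything, but the password must be "password".

The form is processed by the welcome page (http://www.pythonscraping.com/pages/cookies/welcome.php), which links to a profile page: http://www.pythonscraping.com/pages/cookies/profile.php.

On the profile page, the site checks the browser's cookies to see whether they record that you have logged in.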

import requests

params = {'username': 'Ryan', 'password': 'password'}
# Log in; the response carries the cookies set by the welcome page.
r = requests.post("http://www.pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to:")
print(r.cookies.get_dict())
print("-----------------------")
print("Going to profile page...")
# Pass the login cookies along explicitly with the next request.
r = requests.get("http://www.pythonscraping.com/pages/cookies/profile.php", cookies=r.cookies)
print(r.text)

Some sites are more complicated and frequently modify cookies without warning. In that case, use a Session object.

import requests

session = requests.Session()

params = {'username': 'Ryan', 'password': 'password'}
r = session.post("http://www.pythonscraping.com/pages/cookies/welcome.php", data=params)
print("Cookie is set to:")
print(r.cookies.get_dict())
print("-----------------------")
print("Going to profile page...")
# The session sends its stored cookies automatically; no cookies argument is needed.
r = session.get("http://www.pythonscraping.com/pages/cookies/profile.php")
print(r.text)

The session object keeps track of session information such as cookies and headers, and even of information about protocols running on top of HTTP, such as HTTPAdapter.
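The way a session carries cookies forward can be observed without contacting the server. In this sketch the cookie is set by hand; the name loggedin and value 1 are hypothetical stand-ins for whatever the login response would really set. Session.prepare_request merges the session's state into the request and returns it unsent.

```python
import requests

session = requests.Session()

# Hypothetical cookie standing in for what the login response would set.
session.cookies.set("loggedin", "1", domain="www.pythonscraping.com", path="/")

request = requests.Request(
    "GET", "http://www.pythonscraping.com/pages/cookies/profile.php"
)
# Merge the session's cookies and headers into the request; nothing is sent.
prepared = session.prepare_request(request)

print(prepared.headers.get("Cookie"))
```

Any later request made through the same session to that host would carry the cookie in the same way, which is why the session version of the login script does not need a cookies argument.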

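Modifying Request Headers

HTTP request headers are a set of attributes and configuration values sent with every request you make to a web server. HTTP defines a dozen or more obscure header types, most of which are rarely used. Only the following seven fields are used by most browsers when initiating any network request. (The values in the table are from my own browser.)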

Header	Value
Host	hpd.baidu.com
Connection	keep-alive
Accept	image/webp,image/apng,image/*,*/*;q=0.8
User-Agent	Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Referer	https://www.baidu.com/
Accept-Encoding	gzip, deflate, br
Accept-Language	zh-CN,zh;q=0.9

A typical Python scraper using the urllib standard library, by contrast, sends the following headers:

Header	Value
Accept-Encoding	identity
User-Agent	Python-urllib/3.4
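You can confirm urllib's default User-Agent locally. This is a small sketch using urllib.request.build_opener; the replacement agent string is a made-up placeholder, and the version number printed depends on the Python you run it with.

```python
import urllib.request

opener = urllib.request.build_opener()

# addheaders holds the headers the opener adds to every request by default.
default_headers = dict(opener.addheaders)
print(default_headers)

# Replace the default before using this opener (placeholder agent string).
opener.addheaders = [("User-Agent", "Mozilla/5.0 (hypothetical example agent)")]
print(dict(opener.addheaders))
```

Only the User-Agent appears in this list; other headers, such as Accept-Encoding, are added later when the request is actually sent.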

The site http://www.whatismybrowser.com/ lets you check which browser properties a server can see. The code below scrapes this site to verify the headers our script sends:

import requests
from bs4 import BeautifulSoup

session = requests.Session()
# Browser-like headers sent in place of the library defaults.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
}
url = "https://www.whatismybrowser.com/developers/what-http-headers-is-my-browser-sending"
req = session.get(url, headers=headers)

# The "lxml" parser requires the lxml package to be installed.
bsObj = BeautifulSoup(req.text, "lxml")
# Call get_text() to print the text content of the headers table.
print(bsObj.find("table", {"class": "table-striped"}).get_text())

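This differs slightly from Ryan's code: I had to tell BeautifulSoup to parse with lxml, possibly because my headers are different from Ryan's.

Usually the one parameter that really matters is User-Agent. When dealing with a particularly wary website, however, pay attention to the headers that are commonly sent but rarely checked.

Request headers can also make a site change the layout of its content. For example, when you browse from a mobile device you often see a simplified version of the site, without ads, Flash, or other distractions.

Ryan gives the following mobile User-Agent.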

User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53
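One way to send this User-Agent with every request is to set it on a session. The sketch below assumes you want it applied session-wide. It builds the request without sending it, and the target URL is simply the form page used earlier.

```python
import requests

MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) "
    "AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 "
    "Mobile/11D257 Safari/9537.53"
)

session = requests.Session()
default_ua = session.headers["User-Agent"]  # the library's own default
print(default_ua)

# Headers set on the session are merged into every request it makes.
session.headers.update({"User-Agent": MOBILE_UA})

request = requests.Request(
    "GET", "http://www.pythonscraping.com/pages/files/form.html"
)
prepared = session.prepare_request(request)  # nothing is sent
print(prepared.headers["User-Agent"])
```

Passing headers=... to a single get or post call, as in the whatismybrowser example, overrides the session value for that request only.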

[References]

[1] Ryan Mitchell, Web Scraping with Python (Chinese edition: 《Python網絡數據采集》)
