
A Summary of How to Use requests in Python

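requests is a very practical Python HTTP client library. It is used all the time for writing web crawlers and for testing the data that servers return, and it covers most of what everyday HTTP work requires.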

Everything in this article comes from the official documentation: http://docs.python-requests.org/en/master/

The usual way to install it is $ pip install requests. For other installation methods, see the official documentation.

 

HTTP - requests

import requests

GET requests

r = requests.get('http://httpbin.org/get')

 

Passing parameters

>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2

 

Note that any dictionary key whose value is None will not be added to the URL's query string.
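You can check which URL requests will build without sending anything by preparing the request yourself with requests.Request(...).prepare(). The sketch below does only that, so it needs no network access. It shows the None-valued key being dropped from the query string:

```python
import requests

# Build the request but do not send it, so no network access is needed
payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

# key3 is missing from the query string because its value is None
print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2
```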

 

A parameter value can also be a list:

>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3

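r.text returns the body decoded with r.encoding, which requests infers from the response headers. To decode with a different codec, set it before reading r.text, e.g. r.encoding = 'gbk'.

r.content returns the body as raw bytes.

r.json() parses the body as JSON. It raises an exception if the body is not valid JSON.

r.status_code is the HTTP status code, as an integer.

r.raw returns the raw socket response (an urllib3 HTTPResponse). To use it, pass stream=True when making the request.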
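The difference between r.content, r.text and r.encoding can be seen offline. This sketch builds a Response object by hand and gives it an in-memory buffer as its raw body. That is an assumption made for illustration only; normally requests builds the Response for you:

```python
import io

import requests

# Hypothetical stand-in for a real response: a hand-built Response whose raw
# body is an in-memory buffer holding the UTF-8 bytes of "你好"
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO('你好'.encode('utf-8'))
r.encoding = 'utf-8'

print(r.content)          # b'\xe4\xbd\xa0\xe5\xa5\xbd' (raw bytes)
print(r.text)             # 你好 (bytes decoded with r.encoding)

r.encoding = 'gbk'        # same bytes, different codec
print(r.text == '你好')    # False: decoding with the wrong codec garbles the text

# r.json() raises if the body is not JSON; the error is a ValueError subclass
r.encoding = 'utf-8'
try:
    r.json()
except ValueError as exc:
    print('not JSON:', type(exc).__name__)
```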

 

>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

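To save the response to a file, iterate over r.iter_content():

# filename is the output path; r is a response fetched with stream=True
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)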
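Here is a runnable version of the same loop. It is a sketch with two stand-ins: the "download" is a hand-built Response backed by a 1000-byte in-memory buffer, and the output goes to a temporary file. Each chunk holds at most chunk_size bytes, so the whole body never has to sit in memory at once:

```python
import io
import os
import tempfile

import requests

# Hypothetical stand-in for a streamed download: a Response whose raw body is
# an in-memory buffer of 1000 bytes, so the sketch runs without a network
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(b'x' * 1000)

filename = os.path.join(tempfile.mkdtemp(), 'download.bin')   # hypothetical path
chunk_sizes = []
with open(filename, 'wb') as fd:
    # Each chunk holds at most chunk_size bytes, so memory use stays bounded
    for chunk in r.iter_content(chunk_size=128):
        chunk_sizes.append(len(chunk))
        fd.write(chunk)

print(max(chunk_sizes))            # 128
print(os.path.getsize(filename))   # 1000
```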

 

Passing headers

>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
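To see the headers a request would carry, you can prepare it without sending it. This sketch does that. It also shows that header names are case-insensitive, so 'User-Agent' finds the 'user-agent' entry set above:

```python
import requests

url = 'https://api.github.com/some/endpoint'   # placeholder; nothing is sent
headers = {'user-agent': 'my-app/0.0.1'}

# Prepare (but do not send) the request to inspect its headers
prepared = requests.Request('GET', url, headers=headers).prepare()

# Header names are case-insensitive
print(prepared.headers['User-Agent'])  # my-app/0.0.1
```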

 

Passing cookies

 

>>> url = 'http://httpbin.org/cookies'

>>> r = requests.get(url, cookies=dict(cookies_are='working'))
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
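The cookies argument ends up as a Cookie request header. This offline sketch prepares the request instead of sending it, so you can inspect that header directly:

```python
import requests

# Prepare (but do not send) the request to see the Cookie header it would carry
prepared = requests.Request('GET', 'http://httpbin.org/cookies',
                            cookies=dict(cookies_are='working')).prepare()
print(prepared.headers['Cookie'])  # cookies_are=working
```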

 

 

POST requests

Sending a form

r = requests.post('http://httpbin.org/post', data={'key': 'value'})

 

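Typically, you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data argument. The dictionary is form-encoded automatically when the request is made: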

 

 

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}
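To see what "form-encoded" means on the wire, you can prepare the same POST offline and inspect the body and Content-Type header that requests generates. This is a sketch; nothing is sent:

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
# Prepare (but do not send) the POST to inspect the encoded body
prepared = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()

print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # key1=value1&key2=value2
```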

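Often, though, the data you want to send is not form-encoded. If you pass a string instead of a dict, the data is posted as-is:

>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))

Or let requests encode the dict as JSON for you (this also sets the Content-Type header to application/json):

>>> r = requests.post(url, json=payload)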
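The two forms above differ in one header. This offline sketch prepares both requests without sending them. With a string body, requests sets no Content-Type; with json=, it sets application/json and encodes the body to bytes:

```python
import json

import requests

url = 'https://api.github.com/some/endpoint'   # placeholder; nothing is sent
payload = {'some': 'data'}

as_string = requests.Request('POST', url, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()

print(as_string.headers.get('Content-Type'))  # None: a str body gets no Content-Type
print(as_json.headers['Content-Type'])        # application/json
print(as_json.body)                           # b'{"some": "data"}'
```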

 

 

Uploading files

>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)

You can set the filename, content type and extra headers of each file explicitly:

>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

You can also send a string that the server will receive as a file:

>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
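Passing files switches the body to multipart/form-data. The sketch below prepares the string-as-file example offline, so no report.csv needs to exist on disk. It then checks the multipart Content-Type and the filename recorded in the body:

```python
import requests

# A string sent as a file: the tuple is (filename, file content)
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.startswith('multipart/form-data; boundary='))  # True
print(b'filename="report.csv"' in prepared.body)                  # True
print(b'another,row,to,send' in prepared.body)                    # True
```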

 

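Responses

r.status_code is the HTTP status code of the response.

r.headers holds the response headers, in a dict whose keys are case-insensitive.

r.cookies holds the cookies the server sent back.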
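Status codes are easier to act on with requests.codes and raise_for_status(). This sketch builds a 404 Response by hand so it runs offline; the URL and reason are hypothetical. raise_for_status() raises HTTPError for 4xx and 5xx codes, and r.ok is False for them:

```python
import requests

# Hypothetical response built by hand so the example runs offline
r = requests.Response()
r.status_code = 404
r.reason = 'Not Found'
r.url = 'http://httpbin.org/status/404'
r.headers['Content-Type'] = 'text/plain'

print(r.status_code == requests.codes.ok)  # False
print(r.headers['content-type'])           # text/plain (lookup ignores case)
print(r.ok)                                # False

try:
    r.raise_for_status()   # raises HTTPError for 4xx and 5xx status codes
except requests.exceptions.HTTPError as exc:
    print('HTTP error:', exc)
```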

 

 

Redirects

 

By default Requests will perform location redirection for all verbs except HEAD.

 

>>> r = requests.get('http://httpbin.org/cookies/set?k2=v2&k1=v1')

>>> r.url
'http://httpbin.org/cookies'

>>> r.status_code
200

>>> r.history
[<Response [302]>]

 

If you're using HEAD, you can enable redirection as well:

 

>>> r = requests.head('http://httpbin.org/cookies/set?k2=v2&k1=v1', allow_redirects=True)
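Redirect handling can be checked without touching httpbin.org. This sketch assumes you can start a tiny local server with http.server on 127.0.0.1, where /start answers 302 and points to /final. It then shows four cases: GET follows the redirect and records it in history, allow_redirects=False stops at the 302, HEAD does not follow by default, and HEAD follows when asked:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /start answers 302 -> /final, /final answers 200."""

    def do_GET(self):
        if self.path == '/start':
            self.send_response(302)
            self.send_header('Location', '/final')
            self.send_header('Content-Length', '0')
            self.end_headers()
            return
        body = b'done'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        if self.command != 'HEAD':   # HEAD responses carry no body
            self.wfile.write(body)

    do_HEAD = do_GET

    def log_message(self, *args):   # keep the console quiet
        pass


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

with requests.Session() as s:
    s.trust_env = False   # ignore proxy settings from the environment
    followed = s.get(base + '/start')                          # redirect followed
    not_followed = s.get(base + '/start', allow_redirects=False)
    head_default = s.head(base + '/start')                     # HEAD: not followed
    head_follow = s.head(base + '/start', allow_redirects=True)

server.shutdown()
server.server_close()

print(followed.status_code, followed.url)   # 200 http://127.0.0.1:<port>/final
print(followed.history)                     # [<Response [302]>]
print(not_followed.status_code)             # 302
print(head_default.status_code, head_follow.status_code)   # 302 200
```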

 

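You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. With a value this small, the call below will almost certainly raise requests.exceptions.Timeout (or one of its subclasses):

>>> requests.get('http://github.com', timeout=0.001)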

 

 

Advanced features

Source: <http://docs.python-requests.org/en/master/user/advanced/#advanced>

 

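Sessions: a Session object persists cookies automatically and can hold default request parameters, which are then sent with every later request made through it: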

 

s = requests.Session()

s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

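A session can also provide default data for requests. Method-level parameters are merged with session-level ones, and when the same key appears in both, the method-level value wins. To drop a session-level parameter for a single request, pass a dict containing that key with the value None.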

 

s = requests.Session()
s.auth = ('user', 'pass')  # HTTP Basic authentication
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

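Parameters passed to a request method are used for that request only. They are not persisted in the session.

For example, these cookies are sent with this one request only: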

r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})
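All three session rules above can be checked offline with Session.prepare_request(), which applies the session's defaults to a request without sending it. This sketch makes one assumption: the session cookie is set by hand, standing in for a cookie a server would have set. It shows header merging, a None value dropping a session header for one request, and per-request cookies not being stored in the session:

```python
import requests

s = requests.Session()
s.auth = ('user', 'pass')                 # HTTP Basic auth for every request
s.headers.update({'x-test': 'true'})      # default header for every request

url = 'http://httpbin.org/headers'        # nothing is sent; requests are only prepared

# 1. Method-level headers are merged with the session's headers
merged = s.prepare_request(requests.Request('GET', url, headers={'x-test2': 'true'}))
print(merged.headers['x-test'], merged.headers['x-test2'])   # true true

# 2. A value of None drops the session-level key for this one request
dropped = s.prepare_request(requests.Request('GET', url, headers={'x-test': None}))
print('x-test' in dropped.headers, 'x-test' in s.headers)    # False True

# 3. Per-request cookies are sent but not stored in the session's cookie jar
s.cookies.set('sessioncookie', '123456789')   # stands in for a server-set cookie
once = s.prepare_request(requests.Request('GET', url, cookies={'from-my': 'browser'}))
print(once.headers['Cookie'])    # contains both cookies
print('from-my' in s.cookies)    # False
```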

 

A Session can also be used as a context manager, so it is closed automatically when the block exits:

with requests.Session() as s:
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

 

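A Response object contains not only everything about the response, but also the request that produced it, in r.request:

r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
r.headers          # headers the server sent back
r.request.headers  # headers that were sent with the request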

 

 

SSL certificate verification

Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host's SSL certificate, use the verify argument:

 

 

>>> requests.get('https://kennethreitz.com', verify=True)
requests.exceptions.SSLError: hostname 'kennethreitz.com' doesn't match either of '*.herokuapp.com', 'herokuapp.com'

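SSL is not set up for this domain (the certificate it presents belongs to herokuapp.com), so verification fails. GitHub does have SSL set up: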

>>> requests.get('https://github.com', verify=True)
<Response [200]>

For private certificates, you can pass verify the path to a CA_BUNDLE file instead. You can also set the REQUESTS_CA_BUNDLE environment variable.

 

>>> requests.get('https://github.com', verify='/path/to/certfile')
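How the REQUESTS_CA_BUNDLE variable and an explicit verify argument interact can be checked offline with Session.merge_environment_settings(url, proxies, stream, verify, cert). That method computes the settings a request would use. This is a sketch: the certificate path is the placeholder from the text above and does not have to exist for this check:

```python
import os

import requests

# Placeholder path from the text above; only the resolved setting is inspected
os.environ['REQUESTS_CA_BUNDLE'] = '/path/to/certfile'

s = requests.Session()   # trust_env is True by default, so environment variables apply
url = 'https://github.com'

from_env = s.merge_environment_settings(url, {}, None, None, None)
print(from_env['verify'])   # /path/to/certfile

# An explicit verify argument takes precedence over the environment variable
explicit = s.merge_environment_settings(url, {}, None, False, None)
print(explicit['verify'])   # False

del os.environ['REQUESTS_CA_BUNDLE']   # undo the change to the environment
```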

 

If you set verify to False, Requests skips SSL certificate verification (urllib3 then issues an InsecureRequestWarning):

 

>>> requests.get('https://kennethreitz.com', verify=False)
<Response [200]>

By default, verify is set to True. The verify option applies only to host certificates.

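You can also specify a local certificate to use as a client-side certificate. It can be a single file (containing both the private key and the certificate) or a tuple of the two files' paths:

>>> requests.get('https://kennethreitz.com', cert=('/path/client.cert', '/path/client.key'))
<Response [200]>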

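Body content workflow

By default, the response body is downloaded as soon as you make a request. You can override this with the stream parameter, which defers downloading the body until you access the Response.content attribute: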

 

tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)

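At this point only the response headers have been downloaded and the connection remains open. That lets us decide whether to retrieve the content at all:

if int(r.headers['content-length']) < TOO_LONG:   # TOO_LONG: a size limit you choose
    content = r.content
    ...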

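With stream=True, the connection is not released until you have read all of the data or called Response.close.

You can use contextlib.closing to make sure the connection is closed automatically: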

 

 

import requests
from contextlib import closing

tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
file = r'D:\Documents\WorkSpace\Python\Test\Python34Test\test.tar.gz'

with closing(requests.get(tarball_url, stream=True)) as r:
    with open(file, 'wb') as f:
        for data in r.iter_content(1024):
            f.write(data)
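Recent versions of requests also let the Response itself act as a context manager, so contextlib.closing is not strictly needed. This sketch combines that with the TOO_LONG check from above. Two things are assumptions for illustration: a hypothetical helper builds Responses backed by in-memory buffers in place of requests.get(url, stream=True), and TOO_LONG is set to an arbitrary 1024 bytes:

```python
import io

import requests


def fake_streamed_response(body: bytes) -> requests.Response:
    """Hypothetical helper: a Response backed by an in-memory buffer, standing in
    for requests.get(url, stream=True) so the sketch runs offline."""
    r = requests.Response()
    r.status_code = 200
    r.raw = io.BytesIO(body)
    r.headers['Content-Length'] = str(len(body))
    return r


TOO_LONG = 1024   # hypothetical size limit, in bytes


def download_if_small(r: requests.Response):
    # Leaving the with-block calls r.close()
    with r:
        if int(r.headers['content-length']) < TOO_LONG:
            return r.content   # small: read the whole body
        return None            # large: never read it


small = fake_streamed_response(b'a' * 100)
big = fake_streamed_response(b'a' * 5000)

print(len(download_if_small(small)))   # 100
print(download_if_small(big))          # None
print(big.raw.closed)                  # True: the unread body was closed on exit
```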

 

Keep-Alive

 


 

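Keep-alive is automatic within a session: any request you make through the same Session reuses an appropriate pooled connection.

Note: a connection is released back to the pool for reuse only after all of its body data has been read. So either leave stream set to False, or read the content attribute of the Response object.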

 

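Streaming uploads

Requests supports streaming uploads, which let you send large streams or files without reading them into memory first. To stream an upload, provide a file-like object as the request body. Open the file in binary mode so that Requests can set the Content-Length header to the file's size in bytes:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)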
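You can confirm that the file is streamed rather than read into memory by preparing the upload offline. This sketch uses a hypothetical 10,000-byte temporary file in place of 'massive-body'. The prepared body is the file object itself, and Content-Length comes from the file's size:

```python
import os
import tempfile

import requests

# Create a hypothetical 10,000-byte file to stand in for 'massive-body'
path = os.path.join(tempfile.mkdtemp(), 'massive-body')
with open(path, 'wb') as f:
    f.write(b'x' * 10_000)

with open(path, 'rb') as f:
    # Prepare (but do not send) the upload to inspect what would go on the wire
    prepared = requests.Request('POST', 'http://some.url/streamed', data=f).prepare()
    print(prepared.headers['Content-Length'])   # 10000, taken from the file size
    print(prepared.body is f)                   # True: the file object is the body
```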

 

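Chunk-encoded requests

Requests also supports chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, provide a generator (or any iterator without a length) as the request body. Note that the generator should yield bytes:

def gen():
    yield b'hi'
    yield b'there'

requests.post('http://some.url/chunked', data=gen())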
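Because a generator has no length known up front, requests cannot send Content-Length and uses Transfer-Encoding: chunked instead. This offline sketch prepares the request above without sending it, so you can see that choice:

```python
import requests


def gen():
    yield b'hi'
    yield b'there'


# Prepare (but do not send) the request to see how requests frames the body
prepared = requests.Request('POST', 'http://some.url/chunked', data=gen()).prepare()
print(prepared.headers['Transfer-Encoding'])   # chunked
print('Content-Length' in prepared.headers)    # False: the length is unknown up front
```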

For chunked encoded responses, it's best to iterate over the data using Response.iter_content(). In an ideal situation you'll have set stream=True on the request, in which case you can iterate chunk-by-chunk by calling iter_content with a chunk size parameter of None. If you want to set a maximum size of the chunk, you can set a chunk size parameter to any integer.

POST Multiple Multipart-Encoded Files

 


 

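You can send multiple files in one request. For example, suppose you want to upload image files to an HTML form with a multiple-file field named images:

<input type="file" name="images" multiple="true" required="true"/>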

 

To do that, just set files to a list of tuples of (form_field_name, file_info):

 

>>> url = 'http://httpbin.org/post'
>>> multiple_files = [
        ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
        ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
  ...
  'files': {'images': 'data:image/png;base64,iVBORw ....'}
  'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
  ...
}
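The list-of-tuples form produces one multipart part per file, all under the same field name. This offline sketch prepares the upload with in-memory stand-ins for foo.png and bar.png; their contents are hypothetical. With real files, it is worth opening them in a with block so they are closed after the request:

```python
import io

import requests

# In-memory stand-ins for foo.png and bar.png (hypothetical contents)
multiple_files = [
    ('images', ('foo.png', io.BytesIO(b'fake-png-1'), 'image/png')),
    ('images', ('bar.png', io.BytesIO(b'fake-png-2'), 'image/png')),
]
prepared = requests.Request('POST', 'http://httpbin.org/post', files=multiple_files).prepare()

print(prepared.body.count(b'name="images"'))            # 2: one part per file
print(prepared.body.count(b'Content-Type: image/png'))  # 2
```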

Custom Authentication

Requests allows you to specify your own authentication mechanism.

Any callable which is passed as the auth argument to a request method will have the opportunity to modify the request before it is dispatched.

Authentication implementations are subclasses of requests.auth.AuthBase, and are easy to define. Requests provides two common authentication scheme implementations in requests.auth: HTTPBasicAuth and HTTPDigestAuth.

Let's pretend that we have a web service that will only respond if the X-Pizza header is set to a password value. Unlikely, but just go with it.

from requests.auth import AuthBase

class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""
    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r

Then, we can make a request using our Pizza Auth:

>>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
<Response [200]>
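The auth callable runs while the request is being prepared, so its effect can be checked offline. This sketch follows the same pattern as PizzaAuth but with a hypothetical bearer-token scheme, TokenAuth; the token value is made up:

```python
import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Hypothetical bearer-token scheme, written the same way as PizzaAuth."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # r is the PreparedRequest about to be sent
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r


prepared = requests.Request('GET', 'http://pizzabin.org/admin',
                            auth=TokenAuth('abc123')).prepare()
print(prepared.headers['Authorization'])   # Bearer abc123
```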

 


 

Streaming requests

 

import json
r = requests.get('http://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:  # skip empty keep-alive lines
        print(json.loads(line))

 

Proxies

 

If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)

 

To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax:

proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
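Two helpers inside requests.utils show how a proxies dict is interpreted: select_proxy picks an entry by the target URL's scheme, and get_auth_from_url splits out the user:password part used for the Proxy-Authorization header. These are internal helpers rather than documented user-facing API, so treat this sketch as illustration only; their behaviour may change between versions:

```python
from requests.utils import get_auth_from_url, select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# The proxy is chosen by the scheme of the target URL
print(select_proxy('http://example.org', proxies))    # http://10.10.1.10:3128
print(select_proxy('https://example.org', proxies))   # http://10.10.1.10:1080

# Credentials embedded in a proxy URL are split out as (user, password)
print(get_auth_from_url('http://user:pass@10.10.1.10:3128/'))   # ('user', 'pass')
```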

 

Timeouts

 

 

If you specify a single value for the timeout, like this:

 

r = requests.get('https://github.com', timeout=5)

 

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

 

r = requests.get('https://github.com', timeout=(3.05, 27))

 

If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.

 

r = requests.get('https://github.com', timeout=None)
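The tuple form, (connect timeout, read timeout), can be tried against a small local server. Starting that server with http.server on 127.0.0.1 in a background thread is an assumption of this sketch. The /slow path deliberately waits one second before answering, so a 0.2-second read timeout expires first and raises ReadTimeout. The /fast path answers well within a 27-second read timeout:

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /slow waits 1 s before answering, /fast does not."""

    def do_GET(self):
        if self.path == '/slow':
            time.sleep(1)
        try:
            self.send_response(200)
            self.send_header('Content-Length', '2')
            self.end_headers()
            self.wfile.write(b'ok')
        except OSError:
            pass   # the client already gave up waiting

    def log_message(self, *args):   # keep the console quiet
        pass


server = HTTPServer(('127.0.0.1', 0), SlowHandler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

with requests.Session() as s:
    s.trust_env = False   # ignore proxy settings from the environment
    try:
        # Connecting is instant, but the 0.2 s read timeout expires first
        s.get(base + '/slow', timeout=(3.05, 0.2))
        timed_out = False
    except requests.exceptions.ReadTimeout:
        timed_out = True
    fast = s.get(base + '/fast', timeout=(3.05, 27))

server.shutdown()
server.server_close()

print(timed_out)          # True
print(fast.status_code)   # 200
```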

 


For more, see: https://www.cnblogs.com/lilinwei340/p/6417689.html
