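[Python3 Web Scraping Study Notes] Using the Basic Libraries 5: Using requests

阿新 • Published: 2018-12-10

II. Using requests

1. Basic usage

1.1 A first example

The urlopen() method in the urllib library actually requests a web page with GET; the corresponding method in requests is get().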

import requests

r = requests.get('https://www.baidu.com/')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text)
print(r.cookies)

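Output: [image]

Here get() does the same job as urlopen() and returns a Response object. The code then prints the type of the Response, the status code, the type of the response body, the body itself, and the cookies. The output shows that the return type is requests.models.Response, the response body is a str, and the cookies are a RequestsCookieJar.

1.2 GET requests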

import requests

r = requests.get('http://httpbin.org/get')
print(r.text)

The output is as follows:

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "221.216.169.171",
  "url": "http://httpbin.org/get"
}

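As you can see, the GET request succeeded, and the result contains the request headers, the URL, the client IP, and other information. For a GET request, extra information can be attached through the params argument.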

import requests

data = {
    'name': 'germey',
    'age': 22
}
r = requests.get("http://httpbin.org/get", params=data)
print(r.text)

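To see how params ends up in the URL without sending anything over the network, the request can be built and prepared locally. This is a minimal sketch, assuming the same data dictionary as above; it uses requests.Request and its prepare() method, which the original post does not cover.

```python
import requests

# Same query parameters as in the example above
data = {'name': 'germey', 'age': 22}

# Build the request object without sending it
req = requests.Request('GET', 'http://httpbin.org/get', params=data)

# prepare() encodes the params into the query string
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

The printed URL is what requests.get() would actually request when given the same params.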

Also, the response body is a str, but here it happens to be in JSON format. To parse it directly into a dictionary, call the json() method. For example:

import requests

r = requests.get("http://httpbin.org/get")
print(type(r.text))
print(r.json())
print(type(r.json()))

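The output is as follows: [image]

Calling json() converts a JSON-formatted response string into a dictionary. If the response is not valid JSON, parsing fails and a json.decoder.JSONDecodeError is raised (recent versions of requests raise their own requests.exceptions.JSONDecodeError instead).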
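The failure case can be guarded with a try/except. The sketch below is only an illustration: it works on plain strings with the standard json module so that it runs without a network connection, and parse_json_or_none is a hypothetical helper name, not part of requests. Because json.JSONDecodeError is a subclass of ValueError, catching ValueError is enough here, and the same pattern should carry over to wrapping r.json().

```python
import json


def parse_json_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON.

    Hypothetical helper for illustration; not part of requests.
    """
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# A JSON body like the one httpbin returns parses into a dict
ok = parse_json_or_none('{"args": {}, "url": "http://httpbin.org/get"}')
print(type(ok))

# An HTML body is not JSON, so the helper returns None
bad = parse_json_or_none('<html><body>not json</body></html>')
print(bad)
```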

Scraping a web page

import re

import requests

# Browser-like User-Agent; without it Zhihu refuses the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
r = requests.get("https://www.zhihu.com/explore", headers=headers)
# re.S lets '.' match newlines, so one match can span several lines of HTML
pattern = re.compile(r'explore-feed.*?question_link.*?>(.*?)</a>', re.S)
titles = pattern.findall(r.text)
print(titles)

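This adds headers containing a User-Agent field, that is, the browser identification string. Without these headers, Zhihu refuses the request. The code then uses a basic regular expression to match all the question titles. The output is as follows: [image]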
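To make the regular expression easier to follow, here is the same pattern applied to a small inline string instead of a live page. The HTML fragment is a made-up sample that only borrows the explore-feed and question_link class names from the pattern; it is not real Zhihu markup, and the page's actual structure may differ.

```python
import re

# Hypothetical HTML fragment for illustration (not real Zhihu markup)
sample_html = '''
<div class="explore-feed feed-item">
  <h2><a class="question_link" href="/question/1">First question title</a></h2>
</div>
<div class="explore-feed feed-item">
  <h2><a class="question_link" href="/question/2">
Second question title
</a></h2>
</div>
'''

# Lazy .*? skips ahead to the next question_link anchor; re.S lets '.'
# match newlines so a title split across lines is still captured
pattern = re.compile(r'explore-feed.*?question_link.*?>(.*?)</a>', re.S)

# Strip the whitespace captured around multi-line titles
titles = [t.strip() for t in pattern.findall(sample_html)]
print(titles)  # ['First question title', 'Second question title']
```

Each match starts at an explore-feed block and captures only the text between the anchor's opening tag and its closing </a>.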
