Python crawler, urllib module: the POST method

阿新 • Published: 2018-12-06

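This program uses 'http://httpbin.org/post' as the example.

Format:

Import urllib.request

Import urllib.parse

URL-encode the form data, then convert it to UTF-8 bytes: data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')

Open the page to crawl: response = urllib.request.urlopen('URL', data=data)

Read the page content: html = response.read()

Print:

1. Without decode

print(html) # the body is printed as a bytes literal on a single line, with line breaks shown as \n, which is hard to read

2. With decode

print(html.decode()) # the body is printed with its line breaks and indentation intact, like well-formatted code, which is much easier to read

Check the result of the request:

a. response.status # returns 200: request succeeded; 404: page not found, request failed (for a 404, urlopen raises urllib.error.HTTPError, and the code is available as its .code attribute)

b. response.getcode() # returns the same status code as response.status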
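The status check above only sees codes for responses that urlopen returns normally. For a 404 or any other 4xx/5xx answer, urlopen raises urllib.error.HTTPError instead, so the code has to be read from the exception. The following is a minimal sketch of one way to handle both cases; build_post_request and post_status are hypothetical helper names introduced here for illustration, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build (but do not send) a POST request carrying URL-encoded form fields."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data makes the Request default to the POST method.
    return urllib.request.Request(url, data=data)


def post_status(url, fields, timeout=10):
    """Send the request and return the HTTP status code, including error codes."""
    request = build_post_request(url, fields)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive here instead of as a normal response.
        return err.code


# Inspecting the request needs no network access.
request = build_post_request("http://httpbin.org/post", {"word": "hello"})
print(request.get_method())  # POST
print(request.data)          # b'word=hello'
```

Calling post_status('http://httpbin.org/post', {'word': 'hello'}) sends the request and therefore needs network access; it is left out of the sketch so that the example runs offline.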

1. The program without decode:

import urllib.request
import urllib.parse

# URL-encode the form fields, then convert the resulting str to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# Passing data makes urlopen send a POST request instead of a GET
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
html = response.read()

print(html)  # raw bytes
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

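Output: the response body appears on a single line as a bytes literal (b'...') in which line breaks are shown as \n escapes, followed by the two separator lines and the status code printed twice.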

2. The program with decode:

import urllib.request
import urllib.parse

# URL-encode the form fields, then convert the resulting str to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# Passing data makes urlopen send a POST request instead of a GET
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
html = response.read()

print(html.decode())  # bytes decoded to str (UTF-8 by default)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

Output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "word": "hello"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Content-Length": "10", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.4"
  }, 
  "json": null, 
  "origin": "106.14.17.222", 
  "url": "http://httpbin.org/post"
}

------------------------------------------------------------------
------------------------------------------------------------------
200
200
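Since the body returned by httpbin is JSON, the decoded text can also be parsed rather than only printed. The sketch below assumes the body is UTF-8 encoded JSON and uses a hard-coded, shortened copy of the output above so that it runs without network access; the names raw, text, and parsed are illustrative only.

```python
import json

# A shortened copy of the response body shown above, as response.read() would return it.
raw = b'{\n  "args": {}, \n  "form": {\n    "word": "hello"\n  }, \n  "url": "http://httpbin.org/post"\n}\n'

print(raw)                     # bytes literal on one line, line breaks shown as \n

text = raw.decode("utf-8")     # bytes -> str
print(text)                    # multi-line, readable

parsed = json.loads(text)      # str -> dict
print(parsed["form"]["word"])  # hello
```

The "form" key is where httpbin echoes the fields sent in the POST body, which makes it a convenient way to confirm that the data arrived as intended.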

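Why is the bytes conversion needed?

Because without it, the request fails:

import urllib.request
import urllib.parse

data = urllib.parse.urlencode({'word': 'hello'})  # a str, not converted with bytes()
response = urllib.request.urlopen('http://httpbin.org/post', data=data)  # raises TypeError
html = response.read()

Error message: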

Traceback (most recent call last):
  File "/usercode/file.py", line 15, in <module>
    response = urllib.request.urlopen('http://httpbin.org/post', data = data )
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 453, in open
    req = meth(req)
  File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

As this shows, the POST method requires the request body to be passed as bytes rather than str.

class bytes([source[, encoding[, errors]]])

Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray – it has the same non-mutating methods and the same indexing and slicing behavior.

Accordingly, constructor arguments are interpreted as for bytearray().
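The conversion can be checked on its own, without sending anything. The following is a small sketch of what urlencode returns and what bytes() turns it into; the variable name query is introduced here for illustration.

```python
import urllib.parse

query = urllib.parse.urlencode({'word': 'hello'})
print(type(query).__name__, query)  # str word=hello

data = bytes(query, encoding='utf-8')
print(type(data).__name__, data)    # bytes b'word=hello'

# str.encode() is an equivalent way to get the same bytes.
print(data == query.encode('utf-8'))  # True

print(len(data))  # 10, the Content-Length reported in the output above
print(data[0])    # 119, the integer value of 'w'
```

Indexing a bytes object yields an integer in the range 0 to 255, which is the "immutable sequence of integers" that the description above refers to.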
