Using Python's built-in urllib

阿新 • Published: 2018-12-23

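1. Building a complete request with urllib.request

"""Build a complete request with request.Request"""
from urllib import request


# Wrap the URL in a Request object
req = request.Request("https://python.org")
# Send the request and keep the response
res = request.urlopen(req)
# Print the response body
print(res.read().decode("utf-8"))


"""
class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False,
                 method=None):
    Parameters:
    url: the URL to request
    data: bytes to send as the request body, the same as the data argument of urlopen
    headers: request headers; pass a dict here or add them later with add_header()
    origin_req_host: host name or IP address of the origin request
    unverifiable: whether the request is unverifiable, meaning the user had no option to approve it (for example an image fetched automatically while a page loads); defaults to False
    method: HTTP method to use, such as GET, POST or DELETE

"""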




from urllib import request, parse


# URL to request
url = "http://httpbin.org/post"
# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
    "Host": "httpbin.org"
}
# Named form rather than dict, so the built-in dict type is not shadowed
form = {"name": "Germey"}
# URL-encode the dict into a form string, then convert it to bytes
data = bytes(parse.urlencode(form), encoding="utf8")
# Pass the arguments by keyword
req = request.Request(url=url, data=data, headers=headers, method="POST")
# Send the request and keep the response
res = request.urlopen(req)
# Print the response body
print(res.read().decode("utf-8"))
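
The Request parameters described above can be checked without sending anything over the network. The following is a minimal sketch: it builds a Request, adds a header afterwards with add_header(), and inspects the resulting object. The User-Agent value "my-crawler/0.1" is a made-up placeholder, and the URL and form field are reused from the example above only for illustration.

```python
from urllib import parse, request

# Placeholder values for illustration only.
url = "http://httpbin.org/post"
form = {"name": "Germey"}

# urlencode() returns a str; Request expects bytes for data.
data = parse.urlencode(form).encode("utf-8")

req = request.Request(url, data=data, method="POST")
# Headers can also be added after construction.
req.add_header("User-Agent", "my-crawler/0.1")

print(req.get_method())              # POST
print(req.full_url)                  # http://httpbin.org/post
print(req.data)                      # b'name=Germey'
# Request stores header names capitalized, so look them up as "User-agent".
print(req.get_header("User-agent"))  # my-crawler/0.1

# Without an explicit method, the default depends on whether data is given.
print(request.Request(url).get_method())             # GET
print(request.Request(url, data=data).get_method())  # POST
```

Nothing is sent until the Request is passed to urlopen(), so this is a convenient way to confirm what will go over the wire.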


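2. Analyzing a GET request made with request.urlopen

"""Analyzing a GET request with urlopen"""
from urllib import request
from http.client import HTTPResponse  # imported so the methods and attributes of the response type can be inspected


res = request.urlopen("https://www.python.org")
print(type(res))        # type of the returned object: http.client.HTTPResponse
print(res.status)       # HTTP status code of the response
print(res.getheaders())     # all response headers
print(res.getheader("Server"))      # value of the Server response header, e.g. nginx

# def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
#             *, cafile=None, capath=None, cadefault=False, context=None):
"""
Docstring from the source:
Open the URL url, which can be either a string or a Request object.

    *data* must be an object specifying additional data to be sent to
    the server, or None if no such data is needed. See Request for
    details.

    The urllib.request module uses HTTP/1.1 and includes a "Connection:close"
    header in its HTTP requests.

    The optional *timeout* parameter specifies a timeout in seconds for
    blocking operations like the connection attempt (if not specified, the
    global default timeout setting will be used). This only works for HTTP,
    HTTPS and FTP connections.

    If *context* is specified, it must be an ssl.SSLContext instance describing
    the various SSL options. See HTTPSConnection for more details.

    The optional *cafile* and *capath* parameters specify a set of trusted CA
    certificates for HTTPS requests. cafile should point to a single file
    containing a bundle of CA certificates, whereas capath should point to a
    directory of hashed certificate files. More information can be found in
    ssl.SSLContext.load_verify_locations().

    The *cadefault* parameter is ignored.

    This function always returns an object which can work as a context
    manager and has methods such as

    * geturl() - return the URL of the resource retrieved, commonly used to
      determine if a redirect was followed

    * info() - return the meta-information of the page, such as headers, in the
      form of an email.message_from_string() instance (see Quick Reference to
      HTTP Headers)

    * getcode() - return the HTTP status code of the response. Raises URLError
      on errors.

    For HTTP and HTTPS URLs, this function returns an http.client.HTTPResponse
    object slightly modified. In addition to the three new methods above, the
    msg attribute contains the same information as the reason attribute
    (the reason phrase returned by the server) instead of the response
    headers as it is specified in the documentation for HTTPResponse.

    For FTP, file, and data URLs and requests explicitly handled by the legacy
    URLopener and FancyURLopener classes, this function returns a
    urllib.response.addinfourl object.
"""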

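
The urlopen docstring says that the returned object works as a context manager and that non-HTTP schemes yield a urllib.response.addinfourl object. A minimal sketch of both points follows. It assumes a data: URL with made-up content, chosen only so that the example runs without network access.

```python
from urllib import request

# A data: URL carries its payload inline, so no network access is needed.
url = "data:text/plain;charset=utf-8,hello%20urllib"

# The with statement closes the response when the block ends.
with request.urlopen(url) as res:
    body = res.read()
    final_url = res.geturl()
    content_type = res.info().get_content_type()
    kind = type(res).__name__

print(body)              # b'hello urllib'
print(final_url == url)  # True
print(content_type)      # text/plain
print(kind)              # addinfourl
```

For an http or https URL the same with block applies; only the type of the returned object changes to http.client.HTTPResponse.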

3. Analyzing a POST request made with request.urlopen

"""Analyzing a POST request with urlopen"""
from urllib import parse
from urllib import request
import json


# URL-encode the data and convert it to UTF-8 bytes
data = bytes(parse.urlencode({"word": "hello"}), encoding="utf8")
# parse.urlencode({"word": "hello"}) returns the string 'word=hello'
print(data)       # b'word=hello'    bytes, unlike the JSON string below
print(type(data))    # <class 'bytes'>
res = request.urlopen("http://httpbin.org/post", data=data)
print(res)          # <http.client.HTTPResponse object at 0x00000184DB1C3E10>   the response object
print(type(res))    # <class 'http.client.HTTPResponse'>   its type
print(res.read())   # the body contains "form":{"word":"hello"}, which shows the data was submitted as a form
arg = json.dumps({"word": "hello"})
print(arg)          # '{"word": "hello"}'   json.dumps returns the dict serialized as a string
print(type(arg))    # <class 'str'>
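
The difference between a form-encoded body and a JSON body can be seen without contacting a server. The sketch below encodes the same dict both ways and builds, but does not send, a Request carrying the JSON version. The payload fields and the Content-Type header are illustrative assumptions, not part of the original example.

```python
import json
from urllib import parse, request

payload = {"word": "hello world", "lang": "zh"}

# Form encoding: key=value pairs joined by &, with spaces written as +.
form_body = parse.urlencode(payload).encode("utf-8")
print(form_body)     # b'word=hello+world&lang=zh'
# parse_qs reverses the encoding; every value comes back as a list.
print(parse.parse_qs(form_body.decode("utf-8")))
# {'word': ['hello world'], 'lang': ['zh']}

# JSON encoding: the Content-Type header tells the server how to read the body.
json_body = json.dumps(payload).encode("utf-8")
req = request.Request(
    "http://httpbin.org/post",
    data=json_body,
    headers={"Content-Type": "application/json"},
)
print(req.get_method())       # POST, because data is not None
print(json.loads(req.data))   # {'word': 'hello world', 'lang': 'zh'}
```

In both cases data must be bytes, which is why each encoded string is passed through encode() before it is handed to Request or urlopen.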


4. Exception handling for request.urlopen

"""Exception handling in urllib"""
from urllib import request, error


try:
    res = request.urlopen("https://home.cnblogs.com/u/Guishuzhe/1")
except error.HTTPError as e:
    # Catch the subclass first to get the detailed cause
    print(e.reason, e.code, e.headers)
except error.URLError as e:
    # Then catch the parent class for errors the subclass does not cover
    print(e.reason)
else:
    print("Request Successfully")



import socket
from urllib import request
from urllib import error


try:
    # Set the timeout to 0.2 seconds
    res = request.urlopen("http://httpbin.org/get", timeout=0.2)
# Catch the timeout and print a friendly message
except error.URLError as e:
    print(type(e.reason))
    # In the source, class URLError(OSError) stores the cause in self.reason; e.reason reads that attribute
    # The built-in isinstance() checks whether the cause is of a given type
    # Here the cause is the connection timeout error socket.timeout
    if isinstance(e.reason, socket.timeout):
        print("Timed out")
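
The ordering of the except clauses above follows from the class hierarchy: HTTPError is a subclass of URLError, which in turn is a subclass of OSError. The sketch below checks that hierarchy and runs hand-built exceptions through a small helper. The helper name describe and the example URL are hypothetical, and the exceptions are constructed manually so that nothing depends on a real network failure.

```python
import io
import socket
from email.message import Message
from urllib import error


def describe(exc):
    """Return a short message for an exception raised by urlopen."""
    # HTTPError is a subclass of URLError, so it must be tested first.
    if isinstance(exc, error.HTTPError):
        return f"HTTP {exc.code}: {exc.reason}"
    if isinstance(exc, error.URLError):
        if isinstance(exc.reason, socket.timeout):
            return "timed out"
        return f"URL error: {exc.reason}"
    return f"unexpected: {exc!r}"


# Exceptions built by hand, standing in for real network failures.
not_found = error.HTTPError(
    "http://example.com/missing", 404, "Not Found", Message(), io.BytesIO(b"")
)
timed_out = error.URLError(socket.timeout("timed out"))
refused = error.URLError(ConnectionRefusedError(111, "Connection refused"))

print(issubclass(error.HTTPError, error.URLError))  # True
print(issubclass(error.URLError, OSError))          # True
print(describe(not_found))   # HTTP 404: Not Found
print(describe(timed_out))   # timed out
print(describe(refused))

# HTTPError wraps a file-like body, so close it when done.
not_found.close()
```

In real code the same helper would be called inside an except error.URLError as e: block around urlopen().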


5. Advanced urllib usage: configuring Handlers

"""Advanced urllib usage: Handler tools"""
from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener
from urllib.error import URLError


username = "username"
password = "password"
url = "http://127.0.0.1:8000/"
# Create a password manager
p = HTTPPasswordMgrWithDefaultRealm()
# Register the credentials for this URL; realm=None makes them the default for any realm
p.add_password(None, url, username, password)
# Create the Basic auth handler (a subclass of AbstractBasicAuthHandler) from the password manager
auth_handler = HTTPBasicAuthHandler(p)
# build_opener() accepts any number of handlers, skips the default handlers they replace, and returns an opener object
opener = build_opener(auth_handler)

try:
    # Send the request
    res = opener.open(url)
    # Read the response body
    html = res.read().decode("utf8")
    print(html)
except URLError as e:
    # Print the error reason
    print(e.reason)


"""
HTTPDefaultErrorHandler: handles HTTP error responses; errors are raised as HTTPError exceptions
HTTPRedirectHandler: handles redirects
HTTPCookieProcessor: handles cookies
ProxyHandler: sets proxies; with no argument it reads the proxy settings from the environment
HTTPPasswordMgr: manages passwords; it keeps a table of user names and passwords
HTTPBasicAuthHandler: manages authentication; use it when opening a link requires authentication
"""

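
How the password manager and build_opener() fit together can be inspected without a running server. The sketch below reuses the placeholder credentials and the local URL from the example above. The extra path "admin/page", the realm name "Some Realm" and the host example.com are hypothetical values added for illustration.

```python
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

# Placeholder credentials and URL, as in the example above.
url = "http://127.0.0.1:8000/"
manager = HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, url, "username", "password")

# Credentials registered for a URL also apply to the paths below it.
print(manager.find_user_password(None, url + "admin/page"))
# ('username', 'password')
# Because they were registered with realm=None, any realm falls back to them.
print(manager.find_user_password("Some Realm", url))
# ('username', 'password')
# A different host has no credentials.
print(manager.find_user_password(None, "http://example.com/"))
# (None, None)

auth_handler = HTTPBasicAuthHandler(manager)
opener = build_opener(auth_handler)

# build_opener() keeps the default handlers and adds the custom one.
names = [type(h).__name__ for h in opener.handlers]
print("HTTPBasicAuthHandler" in names)   # True
print("HTTPRedirectHandler" in names)    # True
```

The credentials are only sent when a server answers with a 401 challenge, so building the opener does not expose them.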

6. Handling cookies

"""Handling cookies"""
from http import cookiejar
from urllib import request


# Save cookies to a file
filename = "cookies.txt"
cookie = cookiejar.LWPCookieJar(filename)    # this storage format is recommended
# cookie = cookiejar.MozillaCookieJar(filename)
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
res = opener.open("http://www.baidu.com")
# Also save cookies marked to be discarded and cookies that have expired
cookie.save(ignore_discard=True, ignore_expires=True)



# Load cookies from the file
cookie = cookiejar.LWPCookieJar()       # create an LWPCookieJar instance
# Load the file into the cookie jar, keeping discarded and expired cookies as well
cookie.load("cookies.txt", ignore_discard=True, ignore_expires=True)
# Wrap the loaded cookies in a handler
handler = request.HTTPCookieProcessor(cookie)
# Build an opener object
opener = request.build_opener(handler)
# Open the URL with the opener's open() method
res = opener.open("http://www.baidu.com")
print(res.read().decode("utf-8"))
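
The effect of ignore_discard can be demonstrated offline. The sketch below stores a hand-built session cookie in an LWPCookieJar, saves it to a temporary file, and loads it back with and without the flag. The helper make_session_cookie, the cookie name and value, and the domain example.com are hypothetical; in the example above the cookies come from the server response instead.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Build a session cookie by hand (no expiry, discard=True)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
jar = LWPCookieJar(path)
jar.set_cookie(make_session_cookie("session", "abc123", "example.com"))

# A plain save() skips session cookies, so the file holds no cookie.
jar.save()
plain = LWPCookieJar()
plain.load(path)
print(len(plain))      # 0

# ignore_discard=True keeps session cookies on both save and load.
jar.save(ignore_discard=True)
kept = LWPCookieJar()
kept.load(path, ignore_discard=True)
print([(c.name, c.value) for c in kept])   # [('session', 'abc123')]
```

This is why the example above passes ignore_discard=True: without it, cookies that carry no expiry date would not survive the round trip through the file.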


7. Proxy settings

"""Proxy settings in urllib"""
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener


# Map each scheme to a proxy URL (scheme, IP and port); _parse_proxy() parses the proxy URL
proxy_handler = ProxyHandler({
    "http": "http://124.231.16.75:9000",
    "https": "https://113.105.201.193:3128"
})
# Build an opener that uses the proxy settings
opener = build_opener(proxy_handler)
try:
    # Open Baidu through the proxy with the opener's open() method
    res = opener.open("https://www.baidu.com")
    print(res.read().decode("utf-8"))
except URLError as e:
    print(e.reason)
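
The way ProxyHandler plugs into an opener can be examined without a working proxy. The sketch below assumes a hypothetical local proxy at http://127.0.0.1:8888 and sends no request. It shows that a ProxyHandler with an explicit mapping replaces the default one, and that an empty mapping leaves the opener with no proxy handling at all.

```python
from urllib.request import ProxyHandler, build_opener, getproxies

# Hypothetical local proxy address, used only for illustration.
proxy_handler = ProxyHandler({"http": "http://127.0.0.1:8888"})
print(proxy_handler.proxies)   # {'http': 'http://127.0.0.1:8888'}

# With no argument, ProxyHandler falls back to getproxies(), which reads
# settings such as the http_proxy and https_proxy environment variables.
print(getproxies())

# A custom ProxyHandler replaces the default one inside the opener.
opener = build_opener(proxy_handler)
with_proxy = [h for h in opener.handlers if isinstance(h, ProxyHandler)]
print(len(with_proxy))                 # 1
print(with_proxy[0] is proxy_handler)  # True

# An empty mapping disables proxies, even if the environment defines some:
# the handler has nothing to intercept, so the opener does not register it.
direct_opener = build_opener(ProxyHandler({}))
without_proxy = [h for h in direct_opener.handlers if isinstance(h, ProxyHandler)]
print(len(without_proxy))              # 0
```

Passing ProxyHandler({}) is therefore a simple way to force direct connections on a machine where proxy environment variables are set.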
