Python Web Scraping: The requests Module

阿新 • Published: 2017-09-01


1. Login examples

a. Scrape news titles, links, and images from Autohome (汽車之家) and save the images locally

import uuid

import requests
from bs4 import BeautifulSoup

response = requests.get('http://www.autohome.com.cn/news/')
response.encoding = 'gbk'
soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML into an object tree
tag = soup.find(id='auto-channel-lazyload-article')
li_list = tag.find_all('li')

for li in li_list:
    a = li.find('a')
    if not a:
        continue
    print(a.attrs.get('href'))  # article link
    h3 = a.find('h3')
    img = a.find('img')
    if not (h3 and img):
        continue  # skip entries without a title or an image
    txt = h3.text
    print(txt)  # article title
    img_url = img.attrs.get('src')
    print(img_url)

    # Download the image and save it under a random file name.
    img_response = requests.get(url=img_url)
    file_name = str(uuid.uuid4()) + '.jpg'
    with open(file_name, 'wb') as f:
        f.write(img_response.content)

This example uses the BeautifulSoup module to locate tags.
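One case the loop above does not handle is an `src` attribute that is not an absolute URL, which `requests.get` would reject. A minimal sketch, assuming the page may return protocol-relative (`//host/path`) or relative image paths; the helper name `absolute_url` and the sample paths are hypothetical, made up for illustration:

```python
from urllib.parse import urljoin

PAGE_URL = 'http://www.autohome.com.cn/news/'


def absolute_url(src, base=PAGE_URL):
    """Resolve an img src against the page URL (hypothetical helper)."""
    return urljoin(base, src)


# The src values below are invented samples, not taken from the real page.
print(absolute_url('//www2.autoimg.cn/pic/a.jpg'))  # http://www2.autoimg.cn/pic/a.jpg
print(absolute_url('/static/b.jpg'))                # http://www.autohome.com.cn/static/b.jpg
print(absolute_url('http://example.com/c.jpg'))     # http://example.com/c.jpg
```

Passing `absolute_url(img_url)` to `requests.get` instead of the raw attribute value would be one way to make the download step more tolerant.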

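b. Upvoting on Chouti (抽屜): both the initial page fetch and the login return a gpsd cookie. The upvote must use the gpsd from the initial page fetch, not the one returned by the login.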

import requests

# Fetch a page first to obtain the initial cookies (including gpsd).
r1 = requests.get('http://dig.chouti.com/')
r1_cookies = r1.cookies.get_dict()

# Log in, sending the cookies from the first request.
post_dict = {
    "phone": "8615131255089",
    "password": "woshiniba",
    "oneMonth": "1"
}

r2 = requests.post(
    url="http://dig.chouti.com/login",
    data=post_dict,
    cookies=r1_cookies
)

r2_cookies = r2.cookies.get_dict()

# Access another page (the upvote), sending the gpsd from the first request.
r3 = requests.post(
    url="http://dig.chouti.com/link/vote?linksId=13921091",
    cookies={'gpsd': r1_cookies['gpsd']}
)
print(r3.text)

The upvote relies on the gpsd cookie issued by the Chouti page itself.

c. Logging in to GitHub while carrying cookies

import requests
from bs4 import BeautifulSoup

r1 = requests.get('https://github.com/login')
s1 = BeautifulSoup(r1.text, 'html.parser')

# Extract the CSRF token from the login form.
token = s1.find(name='input', attrs={'name': "authenticity_token"}).get('value')
r1_cookie_dict = r1.cookies.get_dict()

# POST the username, password, and token to the server.
r2 = requests.post(
    'https://github.com/session',
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': '[email protected]',
        'password': 'alex3714'
    },
    cookies=r1_cookie_dict
)

# Cookies returned after login.
r2_cookie_dict = r2.cookies.get_dict()

# Merge the pre-login and post-login cookies; post-login values win on conflict.
cookie_dict = {}
cookie_dict.update(r1_cookie_dict)
cookie_dict.update(r2_cookie_dict)

r3 = requests.get(
    url='https://github.com/settings/emails',
    cookies=cookie_dict
)

print(r3.text)
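The token-extraction step above can be tried offline. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages; the class name `HiddenInputParser` and the sample HTML are assumptions made up for illustration and do not reproduce GitHub's real markup:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and 'name' in attrs:
            self.fields[attrs['name']] = attrs.get('value', '')


# Made-up stand-in for a login page (not GitHub's actual HTML).
sample_html = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
'''

parser = HiddenInputParser()
parser.feed(sample_html)
token = parser.fields['authenticity_token']
print(token)  # abc123
```

Collecting every hidden field, rather than only the token, makes it easy to resend whatever hidden values a form expects along with the credentials.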


2. requests parameters

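- method:           HTTP method
- url:              request URL
- params:           parameters passed in the URL query string (typically with GET)
- data:             data sent in the request body (form-encoded)
- json:             data sent in the request body (serialized as JSON)
- headers:          request headers
- cookies:          cookies
- files:            files to upload
- auth:             basic authentication (adds the Base64-encoded username and password to the headers)
- timeout:          timeout for the request and the response
- allow_redirects:  whether to follow redirects
- proxies:          proxies
- verify:           whether to verify the server certificate (False skips verification)
- cert:             client certificate file
- stream:           stream the response instead of reading it all at once (for large downloads)
- session:          requests.Session, which keeps the client's cookies and other state across requests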
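The difference between `params`, `data`, and `json` comes down to where the values end up and how they are encoded. The sketch below uses only the standard library to mimic that encoding; it approximates what requests sends rather than reproducing its internals, and the URL and field values are placeholders:

```python
import json
from urllib.parse import urlencode

url = 'http://example.com/search'      # placeholder URL
fields = {'q': 'python', 'page': 2}    # placeholder values

# params=fields -> appended to the URL as a query string
url_with_params = url + '?' + urlencode(fields)

# data=fields -> form-encoded request body
# (sent with Content-Type: application/x-www-form-urlencoded)
form_body = urlencode(fields)

# json=fields -> JSON text in the request body
# (sent with Content-Type: application/json)
json_body = json.dumps(fields)

print(url_with_params)  # http://example.com/search?q=python&page=2
print(form_body)        # q=python&page=2
print(json_body)        # {"q": "python", "page": 2}
```

Note that `data` flattens every value to a string, while `json` preserves types such as the integer `2`.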

a. files: sending files

import requests

requests.post(
    url='xxx',
    files={
        'name1': open('a.txt', 'rb'),               # form field name mapped to a file object
        'name2': ('bbb.txt', open('b.txt', 'rb'))   # uploaded to the server under the name bbb.txt
    }
)


b. auth: authentication

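# When you open a router's admin page at 192.168.0.1, a small dialog pops up asking for a username and password. Clicking "Log in" there does not submit a form. It is a basic-authentication prompt: the username and password are Base64-encoded (not actually encrypted) and sent in a request header.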

import requests
from requests.auth import HTTPBasicAuth

# 'username' and 'password' are placeholders for the router's credentials.
ret = requests.get(
    url='http://192.168.0.1',
    auth=HTTPBasicAuth('username', 'password')
)
print(ret.text)
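The header that basic authentication sends can also be built by hand, which shows why it offers no secrecy on its own. A minimal sketch using only the standard library; the helper name `basic_auth_header` and the credentials are placeholders:

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


header = basic_auth_header('user', 'pass')  # placeholder credentials
print(header)  # Basic dXNlcjpwYXNz

# Base64 is reversible, so anyone who sees the header can read the credentials.
print(base64.b64decode(header.split(' ', 1)[1]).decode('utf-8'))  # user:pass
```

Because the encoding is trivially reversible, basic authentication is only reasonably safe over HTTPS.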


c. stream: streaming

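# If the file on the server is too large, download it piece by piece in a loop

import requests

def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    # Note: .content still reads the whole body into memory;
    # iterate with iter_content() (as below) to read it in chunks.
    print(ret.content)
    ret.close()

    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # Process the response here.
    #     for i in r.iter_content():
    #         print(i)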
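To save a large response to disk, the chunks can be written as they arrive instead of being printed. The following is a sketch under stated assumptions: the helper names `save_chunks` and `download`, the chunk size, and the timeout are all illustrative choices, and the demonstration at the bottom feeds in-memory data so it runs without a server:

```python
import io


def save_chunks(chunks, fileobj):
    """Write an iterable of byte chunks to fileobj and return the byte count."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            fileobj.write(chunk)
            total += len(chunk)
    return total


def download(url, path, chunk_size=8192):
    """Stream url to path without loading the whole body into memory."""
    import requests  # imported here so save_chunks can be tried without requests

    with requests.get(url, stream=True, timeout=10) as resp:
        resp.raise_for_status()
        with open(path, 'wb') as f:
            return save_chunks(resp.iter_content(chunk_size=chunk_size), f)


# Demonstrate the write loop with in-memory data instead of a real download.
buffer = io.BytesIO()
written = save_chunks([b'abc', b'', b'defg'], buffer)
print(written, buffer.getvalue())  # 7 b'abcdefg'
```

Using the response as a context manager closes the connection even if writing fails, which replaces the explicit `ret.close()` call.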


d. session (not the same thing as a Django session). Example: simplifying the Chouti upvote

import requests

session = requests.Session()

# 1. Visit any page first to obtain the cookies.
i1 = session.get(url="http://dig.chouti.com/help/service")

# 2. Log in. The session sends the cookies from the previous request,
#    and the server authorizes the gpsd value they contain.
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxxxxx",
        'oneMonth': ""
    }
)

# 3. Upvote. The session sends the stored cookies automatically.
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589623",
)
print(i3.text)
