Python crawling: GET and POST requests in Scrapy

Scrapy request class hierarchy

Request
	|-- FormRequest

The examples below are tested against these endpoints:
GET: https://httpbin.org/get
POST: https://httpbin.org/post

GET requests

Approach: send with Request


import json

from scrapy import Spider, Request, cmdline


class SpiderRequest(Spider):
    name = "spider_request"

    def start_requests(self):
        url = "https://httpbin.org/get?name=tom"
        yield Request(url, body=json.dumps({"age": "23"}))

    def parse(self, response):
        print(response.text)


if __name__ == '__main__':
    cmdline.execute("scrapy crawl spider_request".split())

The server reports the name parameter from the URL query string, but not the age parameter placed in the request body:

"args": {
    "name": "tom"
},
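Since only the query string shows up in args here, parameters for a GET request belong in the URL. A minimal sketch of building that URL from a dict, using only the standard library; build_get_url is a hypothetical helper name, not part of Scrapy:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(base_url, params):
    """Append params to base_url as a URL-encoded query string (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


url = build_get_url("https://httpbin.org/get", {"name": "tom", "age": "23"})
print(url)

# Parsing the query string back shows what a server would see as arguments
print(parse_qs(urlsplit(url).query))
```

The resulting URL can be passed to Request in place of the hand-written one above.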

POST requests

Approach 1: send with FormRequest

from scrapy import Spider, cmdline, FormRequest


class SpiderFormData(Spider):
    name = "spider_form_data"

    def start_requests(self):
        url = "https://httpbin.org/post"
        yield FormRequest(url, formdata={"name": "Tom"})

    def parse(self, response):
        print(response.text)


if __name__ == '__main__':
    cmdline.execute("scrapy crawl spider_form_data".split())

The server receives the parameter in form:

"form": {
    "name": "Tom"
  }, 

The request headers also carry a Content-Type entry:

 "headers": {
    "Content-Type": "application/x-www-form-urlencoded", 
  }, 
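The spiders in this article print the raw response text. A small sketch of extracting the echoed fields from that text instead; sample_text is a hand-written stand-in modeled on the excerpts above, and summarize_echo is a hypothetical helper name:

```python
import json

# Hand-written stand-in for response.text, modeled on the excerpts above
sample_text = """
{
  "form": {"name": "Tom"},
  "headers": {"Content-Type": "application/x-www-form-urlencoded"}
}
"""


def summarize_echo(text):
    """Return the echoed form fields and Content-Type from an httpbin-style body."""
    echoed = json.loads(text)
    return echoed.get("form", {}), echoed.get("headers", {}).get("Content-Type")


form, content_type = summarize_echo(sample_text)
print(form)
print(content_type)
```

Inside a spider, the same function could be called from parse with response.text as its argument.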

Approach 2: send with Request

This requires the extra argument method="POST".

import json

from scrapy import Spider, Request, cmdline


class SpiderPost(Spider):
    name = "spider_post"

    def start_requests(self):
        url = "https://httpbin.org/post"
        yield Request(url, method="POST", body=json.dumps({"name": "Tom"}))

    def parse(self, response):
        print(response.text)


if __name__ == '__main__':
    cmdline.execute("scrapy crawl spider_post".split())

1. When the POST request is sent as-is, the server receives the payload in data and json:

"data": "{\"name\": \"Tom\"}", 
"form": {}, 
"json": {
    "name": "Tom"
  }, 

2. If the following headers argument is added:

 "headers": {
    "Content-Type": "application/x-www-form-urlencoded", 
  }, 

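The server now receives the payload in form, which is how FormRequest submits data. Because the body is still a JSON string rather than form-encoded key=value pairs, the whole string shows up as a single key with an empty value: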

"data": "", 
"form": {
    "{\"name\": \"Tom\"}": ""
  }, 
"json": null,

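The odd-looking form entry above can be reproduced locally. A rough sketch using the standard library's form parser as a stand-in for the server's (httpbin's own parser is an assumption here and may differ in details):

```python
import json
from urllib.parse import parse_qsl

json_body = json.dumps({"name": "Tom"})

# The JSON text contains no "=" or "&", so a form parser treats the
# entire string as one key whose value is empty.
parsed_json_as_form = parse_qsl(json_body, keep_blank_values=True)
print(parsed_json_as_form)

# A properly form-encoded body splits into key/value pairs as expected.
parsed_form = parse_qsl("name=Tom")
print(parsed_form)
```

This suggests the form Content-Type should only be used when the body really is form-encoded.
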
3. If the following headers argument is added:

 "headers": {
    "Content-Type": "application/json", 
  }, 

The server receives data and json, the same as in the first case. However, some servers return an error when this header is omitted.

"data": "{\"name\": \"Tom\"}", 
"form": {}, 
"json": {
    "name": "Tom"
  }, 

Summary

| Method | Class | headers argument | Parameters | Server receives them in |
| --- | --- | --- | --- | --- |
| GET | Request | - | ?name=tom | args |
| POST | FormRequest | set by default | formdata={"name": "Tom"} | form |
| POST | Request | - | body=json.dumps({"name": "Tom"}) | data, json |
| POST | Request | "Content-Type": "application/x-www-form-urlencoded" | body=json.dumps({"name": "Tom"}) | form |
| POST | Request | "Content-Type": "application/json" | body=json.dumps({"name": "Tom"}) | data, json |
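The case where a JSON body is sent with a form Content-Type arises because the body encoding and the header are chosen independently. A sketch of keeping the two in step; encode_body is a hypothetical helper, not a Scrapy API:

```python
import json
from urllib.parse import urlencode


def encode_body(payload, content_type):
    """Serialize payload to match the given Content-Type (hypothetical helper)."""
    if content_type == "application/json":
        return json.dumps(payload)
    if content_type == "application/x-www-form-urlencoded":
        return urlencode(payload)
    raise ValueError(f"unsupported content type: {content_type}")


payload = {"name": "Tom"}
json_encoded = encode_body(payload, "application/json")
form_encoded = encode_body(payload, "application/x-www-form-urlencoded")
print(json_encoded)
print(form_encoded)
```

The returned string can be used as the body argument of Request, with the same content type placed in headers.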
