Understanding the HTTP protocol
- HTTP stands for HyperText Transfer Protocol.
- The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems (source: Wikipedia). Stateless means the server handles each request independently and keeps no memory of earlier ones.
Chrome Developer Tools
Press Ctrl+Shift+I to open the developer tools.
Accessing a website with curl
    curl -v http://baidu.com > tmp.txt

    * Rebuilt URL to: http://baidu.com/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    *   Trying 123.125.114.144...
    * TCP_NODELAY set
    * Connected to baidu.com (123.125.114.144) port 80 (#0)
    > GET / HTTP/1.1
    > Host: baidu.com
    > User-Agent: curl/7.55.1
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Date: Sat, 20 Apr 2019 08:15:07 GMT
    < Server: Apache
    < Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
    < ETag: "51-47cf7e6ee8400"
    < Accept-Ranges: bytes
    < Content-Length: 81
    < Cache-Control: max-age=86400
    < Expires: Sun, 21 Apr 2019 08:15:07 GMT
    < Connection: Keep-Alive
    < Content-Type: text/html
    <
    { [81 bytes data]
    100    81  100    81    0     0     81      0  0:00:01 --:--:--  0:00:01   470
    * Connection #0 to host baidu.com left intact
Request
    > GET / HTTP/1.1              # Start line: method, path, protocol version
    > Host: baidu.com
    > User-Agent: curl/7.55.1
    > Accept: */*                 # Headers: one "key: value" pair per line
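The request layout above (a start line, then headers, then an empty line) can be sketched in a few lines of Python. This is only an illustration: `build_request` is a hypothetical helper name, not part of curl or any library, and it assumes a request without a body.

```python
def build_request(method, path, headers):
    """Assemble a raw HTTP/1.1 request message (hypothetical helper)."""
    # Start line: method, path, protocol version.
    lines = ["{} {} HTTP/1.1".format(method, path)]
    # Headers: one "key: value" pair per line.
    lines += ["{}: {}".format(key, value) for key, value in headers.items()]
    # Every line ends with CRLF; an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request("GET", "/", {
    "Host": "baidu.com",
    "User-Agent": "curl/7.55.1",
    "Accept": "*/*",
})
print(request)
```

The printed text should match the lines curl marks with `>`, minus the `>` prefix.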
Response
    < HTTP/1.1 200 OK                                # Start line: protocol version, status code, reason phrase
    < Date: Sat, 20 Apr 2019 08:15:07 GMT
    < Server: Apache
    < Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
    < ETag: "51-47cf7e6ee8400"
    < Accept-Ranges: bytes
    < Content-Length: 81
    < Cache-Control: max-age=86400
    < Expires: Sun, 21 Apr 2019 08:15:07 GMT
    < Connection: Keep-Alive
    < Content-Type: text/html                        # Headers: one "key: value" pair per line
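To make the response layout concrete, here is a minimal sketch that splits a response head into its start line and headers. `parse_response_head` and `RAW_HEAD` are hypothetical names, and the sample keeps only a few of the headers shown above; real clients handle more edge cases (repeated headers, case-insensitive names).

```python
RAW_HEAD = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Length: 81\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)


def parse_response_head(raw):
    """Split a response head into (version, status, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    # The reason phrase may contain spaces, so split at most twice.
    version, status, reason = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split at the first colon only; values such as dates contain colons.
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return version, int(status), reason, headers


version, status, reason, headers = parse_response_head(RAW_HEAD)
print(version, status, reason)
print(headers["Content-Type"])
```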
Message Body
    <html>
    <meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
    </html>
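The `Content-Length: 81` header in the response and the `[81 bytes data]` line in the curl output both describe this body. A small sketch can check that, assuming the body uses LF line endings with a trailing newline (an assumption, since the original line breaks are not visible here). `body_matches_content_length` is a hypothetical helper.

```python
# Assumed byte-for-byte body: three lines, each ending in "\n".
BODY = (
    b'<html>\n'
    b'<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">\n'
    b'</html>\n'
)


def body_matches_content_length(headers, body):
    """Return True when the body is exactly Content-Length bytes long."""
    return int(headers["Content-Length"]) == len(body)


print(len(BODY))
print(body_matches_content_length({"Content-Length": "81"}, BODY))
```

Content-Length counts bytes, not characters, which is why the body is handled as `bytes` here.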
Simple programs
urllib and requests
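- In Python 2, urllib and urllib2 are two independent modules. Python 3 has no urllib2; its functionality lives in urllib.request (and urllib.error), so use urllib.request instead.
- The requests library is built on urllib3, which can reuse one socket across multiple requests.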
- urllib
    import urllib.request as urllib2  # alias kept from the Python 2 module name


    def use_simple_urllib2():
        url = 'http://httpbin.org/ip'
        response = urllib2.urlopen(url)
        print('>>>Response Headers')
        print(response.info())
        print('>>>Response Body')
        # read() returns bytes, so decode to str before printing
        print(response.read().decode())


    use_simple_urllib2()

Output:

    >>>Response Headers
    Access-Control-Allow-Credentials: true
    Access-Control-Allow-Origin: *
    Content-Type: application/json
    Date: Sat, 20 Apr 2019 08:38:52 GMT
    Referrer-Policy: no-referrer-when-downgrade
    Server: nginx
    X-Content-Type-Options: nosniff
    X-Frame-Options: DENY
    X-XSS-Protection: 1; mode=block
    Content-Length: 51
    Connection: Close

    >>>Response Body
    {
      "origin": "122.205.61.100, 122.205.61.100"
    }
    import urllib.parse
    import urllib.request as urllib2


    def use_param_urllib2():
        url_get = 'http://httpbin.org/get'
        param = {'param1': 'hello', 'param2': 'world'}
        # urlencode turns the dict into a query string: key1=value1&key2=value2
        param = urllib.parse.urlencode(param)
        print('>>>Request Params')
        print(param)
        response = urllib2.urlopen(url_get + '?' + param)
        print('>>>Response Headers')
        print(response.info())
        print('>>>Status Code')
        print(response.getcode())
        print('>>>Response Body')
        # read() returns bytes, so decode to str before printing
        print(response.read().decode())


    use_param_urllib2()

Output:

    >>>Request Params
    param2=world&param1=hello
    >>>Response Headers
    Access-Control-Allow-Credentials: true
    Access-Control-Allow-Origin: *
    Content-Type: application/json
    Date: Sat, 20 Apr 2019 09:04:11 GMT
    Referrer-Policy: no-referrer-when-downgrade
    Server: nginx
    X-Content-Type-Options: nosniff
    X-Frame-Options: DENY
    X-XSS-Protection: 1; mode=block
    Content-Length: 299
    Connection: Close

    >>>Status Code
    200
    >>>Response Body
    {
      "args": {
        "param1": "hello",
        "param2": "world"
      },
      "headers": {
        "Accept-Encoding": "identity",
        "Host": "httpbin.org",
        "User-Agent": "Python-urllib/3.5"
      },
      "origin": "122.205.61.100, 122.205.61.100",
      "url": "https://httpbin.org/get?param2=world&param1=hello"
    }
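The query-string step can be tried without touching the network. The sketch below uses only `urllib.parse`; `build_url` is a hypothetical helper, and the value `a b&c` is made up to show that `urlencode` escapes characters that would otherwise break the URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base, params):
    """Append params to base as a percent-encoded query string."""
    return base + "?" + urlencode(params)


url = build_url("http://httpbin.org/get", {"param1": "hello", "param2": "a b&c"})
print(url)
# parse_qs reverses the encoding; each key maps to a list of values.
print(parse_qs(urlsplit(url).query))
```

Building the URL by plain string concatenation without `urlencode` would let the `&` inside a value be read as a parameter separator.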
- requests
    import requests


    def use_simple_request():
        url = 'http://httpbin.org/ip'
        response = requests.get(url)
        print('>>>Response Headers')
        print(response.headers)
        print('>>>Response Body')
        print(response.text)
    import requests


    def use_param_request():
        # /ip only reports the caller's address; the params are sent but not echoed back
        url_get = 'http://httpbin.org/ip'
        param = {'param1': 'hello', 'param2': 'world'}
        print('>>>Request Params')
        print(param)
        # requests encodes the dict into the query string itself
        response = requests.get(url_get, params=param)
        print('>>>Response Headers')
        print(response.headers)
        print('>>>Status Code')
        print(response.status_code)
        print(response.reason)
        print('>>>Response Body')
        print(response.json())


    use_param_request()

Output:

    >>>Request Params
    {'param2': 'world', 'param1': 'hello'}
    >>>Response Headers
    {'Access-Control-Allow-Origin': '*', 'X-XSS-Protection': '1; mode=block', 'Content-Type': 'application/json', 'Access-Control-Allow-Credentials': 'true', 'X-Content-Type-Options': 'nosniff', 'Content-Length': '58', 'X-Frame-Options': 'DENY', 'Server': 'nginx', 'Date': 'Sat, 20 Apr 2019 09:13:01 GMT', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Referrer-Policy': 'no-referrer-when-downgrade'}
    >>>Status Code
    200
    OK
    >>>Response Body
    {'origin': '115.156.141.224, 115.156.141.224'}
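The socket reuse mentioned for urllib3 can be illustrated without any third-party package. The sketch below is not how requests or urllib3 are implemented; it swaps in the standard library's `http.client` and a throwaway local server to show two requests travelling over one kept-alive connection. `EchoHandler` and `fetch_twice_on_one_connection` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where the body ends on a kept-alive socket.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice_on_one_connection():
    """Send two GETs over one connection; return (status, body, local_port) per request."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        results = []
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read().decode()
            # The client-side port identifies the underlying socket.
            results.append((response.status, body, conn.sock.getsockname()[1]))
        return results
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


results = fetch_twice_on_one_connection()
for status, body, local_port in results:
    print(status, body, local_port)
```

Both lines should show the same local port, meaning no second TCP connection was opened. With requests, the comparable behavior comes from sending several requests through one `requests.Session`.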