
Crawler - Day 02 - Fetching and Parsing


###Page Fetching###
1. urllib3
    A powerful, easy-to-use HTTP client that makes up for shortcomings in the Python standard library.
    Install: pip install urllib3
    Usage:
import urllib3

http = urllib3.PoolManager()
response = http.request('GET', 'http://news.qq.com')
print(response.headers)
result = response.data.decode('gbk')   # decode with the page's charset (news.qq.com serves gbk)
print(result)
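The same PoolManager can be reused for any request method. A minimal sketch (httpbin.org is only a placeholder URL for illustration) showing a POST with form fields and a status-code check:

import urllib3

http = urllib3.PoolManager()
# For POST, fields are encoded into the request body
resp = http.request('POST', 'http://httpbin.org/post', fields={'name': 'value'})
print(resp.status)                  # HTTP status code, e.g. 200
print(resp.data.decode('utf-8'))    # response body as text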
 
Sending requests over HTTPS
Install the dependency: pip install certifi
import certifi
import urllib3

# Add the certifi CA bundle so the server certificate is verified
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
resp = http.request('GET', 'https://news.baidu.com/')
print(resp.data.decode('utf-8'))
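urllib3 can also apply timeouts and retries per request. A minimal sketch, reusing the verified PoolManager from above (the timeout and retry numbers are arbitrary assumptions):

import certifi
import urllib3

http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
# Give up quickly on a slow connection and retry transient failures a few times
resp = http.request('GET', 'https://news.baidu.com/',
                    timeout=urllib3.Timeout(connect=2.0, read=5.0),
                    retries=urllib3.Retry(total=3))
print(resp.status)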
 
####Passing parameters
import urllib3
from urllib.parse import urlencode

http = urllib3.PoolManager()
args = {'wd': '人民幣'}
# url = 'http://www.baidu.com/s?%s' % (args)   # wrong: args is a dict, not a query string
url = 'http://www.baidu.com/s?%s' % (urlencode(args))
print(url)
# resp = http.request('GET', url)
# print(resp.data.decode('utf-8'))
headers = {
    'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Host': 'www.baidu.com',
    'Referer': 'https://www.baidu.com/s?wd=人民幣',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
}
# Or let urllib3 encode the parameters itself: pass them via fields
resp = http.request('GET', 'http://www.baidu.com/s', fields=args, headers=headers)
print(resp.data.decode('utf-8'))
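urlencode is what turns the args dict into a valid query string; non-ASCII values are percent-encoded as UTF-8 bytes. A quick sketch of what it produces:

from urllib.parse import urlencode

# Non-ASCII values are encoded as UTF-8 and then percent-escaped
print(urlencode({'wd': '人民幣'}))   # wd=%E4%BA%BA%E6%B0%91%E5%B9%A3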

 
 
 
 
