python--Simulating a Zhihu login with a Python 3 crawler
阿新 • Published: 2019-02-15
The code below has been tested under Python 3:
from bs4 import BeautifulSoup
import requests

url = 'http://www.zhihu.com'
login_url = url + '/login/email'
captcha_url = 'http://www.zhihu.com/captcha.gif'
# Content-Length is left out on purpose: requests computes it from the encoded form data.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Referer': 'http://www.zhihu.com/',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate',  # requests cannot decode 'sdch'
    'Host': 'www.zhihu.com',  # no leading space: recent requests versions reject header values starting with whitespace
    'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4,zh-TW;q=0.2',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Connection': 'keep-alive',
}
login_data = {
    'email': 'xxxx',     # replace with your account
    'password': 'xxxx',  # replace with your password
    'remember_me': 'true',
    'Referer': 'http://www.zhihu.com/',
}


def add_xsrf(session):
    '''Add the _xsrf value to login_data: fetch the page while still logged out,
    then parse the _xsrf value out of it with BeautifulSoup.'''
    # Use the login session (not a bare requests.get) so the _xsrf cookie the
    # server sets is sent back along with the login form.
    soup = BeautifulSoup(session.get(url).text, 'html.parser')
    xsrf = soup.find('input', attrs={'name': '_xsrf'})['value']
    login_data['_xsrf'] = xsrf  # already a str in Python 3, no .encode() needed


def add_captcha(session):
    '''Save the captcha image to captcha.gif and ask the user to type it in.'''
    captcha = session.get(captcha_url, stream=True)
    with open('captcha.gif', 'wb') as f:
        for chunk in captcha.iter_content(1024):
            f.write(chunk)
    login_data['captcha'] = input('Enter the captcha: ')


if __name__ == '__main__':
    session = requests.Session()
    add_xsrf(session)
    add_captcha(session)
    response = session.post(login_url, headers=headers, data=login_data)
    # Dump the home page as seen by the logged-in session.
    with open('zhihu.txt', 'wt', encoding='utf8', errors='ignore') as f:
        print(session.get(url).text, file=f)
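A fixed Content-Length header is only correct for one particular request body. The sketch below uses only the standard library and made-up credentials: it URL-encodes two login forms the way a form POST is encoded and shows that the body length changes with the account and password, which is why it is safer to let requests fill in this header itself. The helper name form_body_length is hypothetical.

```python
from urllib.parse import urlencode


def form_body_length(form):
    """Length in bytes of the application/x-www-form-urlencoded body for `form`."""
    return len(urlencode(form).encode('ascii'))


# Hypothetical credentials, for illustration only.
short_form = {'email': 'a@b.cn', 'password': 'pw', 'remember_me': 'true'}
long_form = {'email': 'someone@example.com',
             'password': 'a-much-longer-password',
             'remember_me': 'true'}

print(form_body_length(short_form), form_body_length(long_form))
```

Any hardcoded value will match at most one of these bodies, so a server that trusts the header may read a truncated or padded form.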
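Notes:
1. Two third-party libraries are used: requests instead of urllib, and BeautifulSoup instead of re. Install them from the command line with pip install requests and pip install beautifulsoup4 (the older package named BeautifulSoup is Beautiful Soup 3, which does not support Python 3).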
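To illustrate why a real HTML parser is sturdier than a regular expression for pulling out the _xsrf token, the sketch below compares a regex with the standard library's html.parser. html.parser is used here only as a dependency-free stand-in for BeautifulSoup (in the code above the equivalent call is soup.find('input', attrs={'name': '_xsrf'})['value']); the page fragments and helper names are made up.

```python
import re
from html.parser import HTMLParser


class XsrfFinder(HTMLParser):
    """Records the value of the first <input name="_xsrf"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == 'input' and attrs.get('name') == '_xsrf' and self.value is None:
            self.value = attrs.get('value')


def xsrf_by_parser(html):
    finder = XsrfFinder()
    finder.feed(html)
    finder.close()
    return finder.value


def xsrf_by_regex(html):
    # Brittle: assumes this exact attribute order, quoting and spacing.
    m = re.search(r'<input type="hidden" name="_xsrf" value="([^"]+)"', html)
    return m.group(1) if m else None


# Hypothetical fragments carrying the same token with different markup.
page_a = '<form><input type="hidden" name="_xsrf" value="abc123"></form>'
page_b = "<form><input value='abc123' type=hidden name=_xsrf></form>"

print(xsrf_by_parser(page_a), xsrf_by_parser(page_b))
print(xsrf_by_regex(page_a), xsrf_by_regex(page_b))
```

The parser finds the token in both fragments, while the regex silently misses the second one as soon as the attribute order or quoting changes.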
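2. The captcha cannot be recognized automatically for now, so it has to be entered by hand: the script saves the image as captcha.gif, and you type the characters it shows at the prompt.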
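Because a person has to read the captcha, the download and the prompt can be kept as two small steps. A minimal sketch, assuming the image arrives as an iterable of byte chunks (which is what requests' Response.iter_content yields); save_captcha and ask_captcha are hypothetical helper names, and the prompt function is injectable so the flow can be tried without a network or a keyboard.

```python
from pathlib import Path


def save_captcha(chunks, path):
    """Write an iterable of byte chunks (e.g. response.iter_content(1024)) to `path`."""
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
    return Path(path)


def ask_captcha(path, prompt=input):
    """Ask the user to look at the saved image and type the captcha text."""
    answer = prompt(f'Open {path} and enter the captcha: ')
    return answer.strip()
```

In the login script this would be used roughly as ask_captcha(save_captcha(session.get(captcha_url, stream=True).iter_content(1024), 'captcha.gif')), with the result stored in login_data['captcha'].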
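3. There are two login methods, with the URLs 'http://www.zhihu.com/login/email' and 'http://www.zhihu.com/login/phone_num' respectively. Email login is recommended: with phone login, Zhihu encrypts the password, so the login may fail with a password error (you can capture the traffic to obtain the encrypted password).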
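Choosing between the two login URLs can be automated by looking at the account string. A minimal sketch under two labeled assumptions: that a phone login posts the account under a form field named phone_num (mirroring the URL), and that phone accounts are 11-digit numbers starting with 1. build_login is a hypothetical helper, and it does not address the password encryption issue mentioned above.

```python
import re

BASE = 'http://www.zhihu.com'


def build_login(account, password, xsrf):
    """Return (login_url, form_data) for an email or phone-number account."""
    form = {'password': password, 'remember_me': 'true', '_xsrf': xsrf}
    # Assumption: 11-digit mobile numbers starting with 1 are phone accounts.
    if re.fullmatch(r'1\d{10}', account):
        form['phone_num'] = account  # assumed field name for phone login
        return BASE + '/login/phone_num', form
    form['email'] = account
    return BASE + '/login/email', form
```

The returned pair can be passed straight to session.post(url, headers=headers, data=form).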
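4. When opening the output file, always pass encoding="utf8", errors='ignore'; otherwise a UnicodeEncodeError can occur. Without an explicit encoding, open() uses the platform's default encoding (for example GBK on Chinese-locale Windows), which cannot represent every character on the page; errors='ignore' additionally drops any character that still cannot be encoded instead of raising.
If you want to read or write text data in a file opened in binary mode, you must decode and encode it explicitly. For example: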
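To see where the UnicodeEncodeError comes from, the sketch below encodes a string containing a character outside GBK (GBK stands in here, as an assumption, for a non-UTF-8 platform default), then writes the same string to a file with encoding='utf8'. The file name and sample text are arbitrary.

```python
import os
import tempfile

text = '知乎😀'  # the emoji has no GBK representation

# Strict GBK encoding raises; errors='ignore' silently drops the emoji.
try:
    text.encode('gbk')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
lossy = text.encode('gbk', errors='ignore').decode('gbk')

# With encoding='utf8' every character survives the round trip.
path = os.path.join(tempfile.mkdtemp(), 'zhihu.txt')
with open(path, 'wt', encoding='utf8', errors='ignore') as f:
    print(text, file=f)
with open(path, encoding='utf8') as f:
    round_trip = f.read().rstrip('\n')

print(strict_failed, lossy, round_trip == text)
```

So encoding='utf8' is what actually prevents the error; errors='ignore' only guards against data loss turning into a crash.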
with open('somefile.bin', 'wb') as f:
    text = 'Hello World'
    f.write(text.encode('utf-8'))  # str -> bytes before writing

with open('somefile.bin', 'rb') as f:
    data = f.read(16)              # bytes
    text = data.decode('utf-8')    # bytes -> str after reading
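The read example decodes the first 16 bytes in one go. That works for ASCII such as 'Hello World', but on Chinese text each character takes 3 bytes in UTF-8, so a fixed-size read can end in the middle of a character. A minimal sketch of a safer pattern using the standard library's incremental decoder (wrapping the binary file in io.TextIOWrapper is another option); the sample text is arbitrary.

```python
import codecs

text = '知乎' * 10
data = text.encode('utf-8')  # 60 bytes: 3 bytes per character

# Decoding a fixed 16-byte slice cuts the 6th character in half.
try:
    data[:16].decode('utf-8')
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

# An incremental decoder holds back incomplete bytes until the next chunk.
decoder = codecs.getincrementaldecoder('utf-8')()
pieces = []
for start in range(0, len(data), 16):
    pieces.append(decoder.decode(data[start:start + 16]))
pieces.append(decoder.decode(b'', final=True))  # flush; raises if bytes are left over
decoded = ''.join(pieces)

print(naive_ok, decoded == text)
```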
5. Official documentation for the two libraries (Chinese editions):
BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
requests: http://cn.python-requests.org/zh_CN/latest/