Fixing the Python Scrapy crawler error UnicodeEncodeError: 'gbk' codec can't encode character '\u2022'
Posted by 阿新 on 2018-11-03
Error: UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 7: illegal multibyte sequence
Fix: redirect standard output through a GB18030 writer before printing:
import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
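Why this works can be checked by encoding the offending character directly: GBK has no mapping for the bullet '\u2022' from the error message, while GB18030 is a superset of GBK that covers all of Unicode. A minimal sketch:

```python
# '\u2022' (the bullet character from the error message) is not in GBK,
# but GB18030 is a Unicode-complete superset of GBK and can encode it.
ch = '\u2022'

try:
    ch.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

print('gbk can encode:', gbk_ok)  # gbk can encode: False
print('gb18030 bytes:', ch.encode('gb18030'))
```

This is exactly why switching the stdout wrapper to gb18030 makes the error disappear.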
While writing the following code to scrape a website, I hit the error UnicodeEncodeError: 'gbk' codec can't encode character '\xXX' in position XX:
from urllib import request

req = request.Request("https://www.baidu.com")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0")
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
A bit of searching turned up the cause: print() cannot output every Unicode character, because it writes through sys.stdout, which uses the console's default encoding. On this Windows 7 system, Python's default console encoding is GBK rather than 'utf-8', so any character outside GBK raises the error. Changing the stdout encoding to 'utf-8' fixes it:
import io
import sys
from urllib import request

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')  # change the default encoding of standard output
req = request.Request("https://www.baidu.com")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0")
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
This stops the exception, but the Chinese text in the output turns into mojibake: the cmd window's code page is not compatible with UTF-8. To fix that as well, set the stdout encoding to 'gb18030' instead:
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')  # change the default encoding of standard output
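As a side note beyond the original post: on Python 3.7 and later the same effect is available without replacing sys.stdout by hand, via the reconfigure() method on text streams; adding errors='replace' is a defensive choice that substitutes '?' for anything the codec cannot represent instead of raising. A sketch:

```python
import sys

# Python 3.7+: reconfigure stdout in place instead of wrapping its buffer.
# errors='replace' keeps unencodable characters from crashing the script;
# gb18030 covers all of Unicode, so in practice nothing is replaced.
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='gb18030', errors='replace')

print('\u2022 bullet now prints without UnicodeEncodeError')
```

Setting the environment variable PYTHONIOENCODING=gb18030 before launching Python achieves the same result with no code changes at all.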
Reposted from: https://blog.csdn.net/qq_28359387/article/details/54974578