
Fixing UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' when crawling with Scrapy in Python

Error: UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 7: illegal multibyte sequence
Fix:

import io
import sys
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')

While writing the code below to fetch a web page, I ran into the error UnicodeEncodeError: 'gbk' codec can't encode character '\xXX' in position XX:

from urllib import request
req=request.Request("https://www.baidu.com")
req.add_header("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0")
resp=request.urlopen(req)
print(resp.read().decode('utf-8'))

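After some searching I found the cause: print() cannot output every Unicode character, because the text has to be encoded with the encoding of standard output before it is written.

So the limitation of print() is really the limitation of the encoding Python uses for output. My system is Windows 7, where that encoding is not 'utf-8' (the error message shows it is 'gbk'), so any character outside gbk raises the error. Changing the output encoding to 'utf-8' makes the error go away.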
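The error can be reproduced without any network request. The following is a minimal sketch, not part of the original post, that checks which codecs can represent the character from the error message. The helper name can_encode is hypothetical and exists only for this illustration.

```python
import sys

bullet = "\u2022"  # the character named in the error message


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding print() will use; on a Chinese-locale Windows console this is typically gbk/cp936.
print("stdout encoding:", sys.stdout.encoding)

for name in ("gbk", "gb18030", "utf-8"):
    print(name, "can encode the bullet:", can_encode(bullet, name))
```

With that in mind, the script with standard output switched to 'utf8' looks like this: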

import io  
import sys 
from urllib import request
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8') # change the encoding of standard output
req=request.Request("https://www.baidu.com")
req.add_header("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0")
resp=request.urlopen(req)
print(resp.read().decode('utf-8'))

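This gets rid of the error, but Chinese text now comes out garbled, because the encoding used by cmd is not compatible with utf-8. To fix that as well, set the standard output encoding to 'gb18030' instead. gb18030 is compatible with gbk for Chinese text and can also encode every Unicode character:

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030')         # change the encoding of standard output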
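To see the difference between the three encodings without touching the real console, the sketch below writes the same text through io.TextIOWrapper into an in-memory buffer. It is an illustration under stated assumptions: the helper write_through and the sample text are hypothetical, and decoding the bytes as gbk only approximates what a gbk console would display.

```python
import io

# Sample text: a bullet plus two Chinese characters (written as escapes).
text = "item \u2022 \u4e2d\u6587"


def write_through(text, encoding, errors="strict"):
    """Write `text` through a TextIOWrapper, as the fix does with
    sys.stdout.buffer, and return the raw bytes produced (hypothetical helper)."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # release the buffer so the wrapper does not close it later
    return data


# 1) gbk: the bullet is not representable, so writing raises UnicodeEncodeError.
try:
    write_through(text, "gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# 2) utf-8: writing succeeds, but a console that decodes the bytes as gbk shows mojibake.
utf8_bytes = write_through(text, "utf-8")
garbled = utf8_bytes.decode("gbk", errors="replace")

# 3) gb18030: writing succeeds, and Chinese characters use the same bytes as in gbk.
gb_bytes = write_through(text, "gb18030")
chinese = "\u4e2d\u6587"
same_bytes = chinese.encode("gbk") == chinese.encode("gb18030")

print("gbk write succeeded:", gbk_ok)
print("utf-8 bytes read as gbk:", ascii(garbled))
print("gb18030 round trip ok:", gb_bytes.decode("gb18030") == text)
print("Chinese bytes identical in gbk and gb18030:", same_bytes)
```

Note that gb18030 encodes characters outside gbk, such as the bullet, as four-byte sequences, so a console that only understands gbk may still not render those particular characters correctly even though the error is gone.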

Reposted from: https://blog.csdn.net/qq_28359387/article/details/54974578