Chinese Character Encoding Issues in Python 3

Python · Published 2018-11-18 16:14:00

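Introduction

I have recently been experimenting with Python web development, using Django as the framework. When reading rows from the database and rendering them on a page, I ran into a problem with Chinese character encoding.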

The Problem

Consider the following view, which fetches a list of novel chapters:

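from django.shortcuts import render

def main(request):
    # mysql is the author's own database helper; its definition is not shown
    sql = "SELECT id,title FROM novel LIMIT 10;"
    result = mysql.getAll(sql)
    context = {'novel_list': result}
    return render(request, 'novel_list.html', context)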

The template that renders the page:

{% for novel in novel_list %}
    <li><a href="/chapter/{{ novel.id }}">{{ novel.title }}</a></li>
{% endfor %}

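Without any conversion, the Chinese text appears on the page as raw bytes (a b'\xe7\xac\xac...' literal) instead of readable characters.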
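To make the symptom concrete, here is a minimal sketch that needs neither Django nor a database. It assumes that the database driver returns the title column as UTF-8 encoded bytes and that the value is turned into text with str() when it is rendered; the variable names are illustrative only.

```python
# Hypothetical value, standing in for what the database driver returned.
title = b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe7\xa7\xa6\xe7\xbe\xbd'

# str() on a bytes object yields its repr (b'...'), not the decoded text.
as_rendered = str(title)

# Decoding with the matching codec recovers the readable string.
as_text = title.decode('utf-8')

print(as_rendered)
print(as_text)
```

The first line printed is the escaped bytes literal, the second the actual chapter title.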

The Solution

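Here is a somewhat simpler example, where row stands in for a record returned by the database query:

import json

# Stand-in for one row returned by the database driver
row = {'id': 1, 'title': b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe7\xa7\xa6\xe7\xbe\xbd'}
dup = json.dumps(row, ensure_ascii=False)
print(dup)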

Output under Python 2:

{"id": 1, "title": "第一章 秦羽"}

Under Python 3 the same code raises an error:

TypeError: Object of type bytes is not JSON serializable

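After a good deal of searching, this is the solution I ended up with.

Install the module (NumPy is needed only for the ndarray branch of the encoder below; the bytes handling relies on the standard library alone):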

pip3 install numpy

Final code:

import json
import numpy as np


class MyEncoder(json.JSONEncoder):
    """JSON encoder that also handles NumPy arrays and UTF-8 bytes."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            # Convert arrays to plain nested lists
            return obj.tolist()
        if isinstance(obj, bytes):
            # Decode raw bytes to str, assuming they are UTF-8
            return obj.decode('utf-8')
        # Let the base class raise TypeError for anything else
        return super().default(obj)


row = {'id': 1, 'title': b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe7\xa7\xa6\xe7\xbe\xbd'}
dup = json.dumps(row, cls=MyEncoder, ensure_ascii=False, indent=4)
print(dup)

Alternatively, loop over the rows and decode each title individually:

sql = "SELECT id,title FROM novel LIMIT 10;"
result = mysql.getAll(sql)
for each in result:
    # Replace the bytes value with its decoded str
    each['title'] = each['title'].decode('utf-8')
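The loop above handles a single known column. As a rough sketch of a more general variant, the helper below decodes every bytes value in a row. It assumes each row is a dict (as the each['title'] access suggests) and that all bytes columns are UTF-8; decode_row and the sample rows are hypothetical names introduced for illustration.

```python
def decode_row(row, encoding='utf-8'):
    """Return a copy of row with every bytes value decoded to str."""
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in row.items()
    }


# Hypothetical rows, shaped like the result of the query above.
rows = [
    {'id': 1, 'title': b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe7\xa7\xa6\xe7\xbe\xbd'},
]

decoded = [decode_row(row) for row in rows]
print(decoded)
```

Non-bytes values such as the integer id pass through unchanged, so the helper can be applied to every row without knowing the column types in advance.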

Encoding turns a string into bytes, and decoding turns bytes back into a string:

str --(encode)--> bytes, bytes --(decode)--> str

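decode and encode in detail

decode: given bytes whose encoding is known, convert them to a Unicode string (str in Python 3, unicode in Python 2). For example, b.decode('utf-8') returns a str.
encode: given a Unicode string, convert it to bytes in a particular encoding. For example, s.encode('utf-8') returns UTF-8 encoded bytes.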
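A short round trip illustrates both directions in Python 3. This is only a sketch; the sample text is the chapter title used elsewhere in this post, and the variable names are illustrative.

```python
text = '第一章 秦羽'

# str -> bytes
data = text.encode('utf-8')

# bytes -> str
restored = data.decode('utf-8')

print(type(data).__name__, len(data))
print(restored == text)

# Decoding with a codec that does not match the data fails.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii cannot decode these bytes:', exc.reason)
```

Each of the five Chinese characters takes three bytes in UTF-8, which is why the byte length differs from the character count.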

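Web Output

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is based on a subset of ECMAScript.

In Python 3, the json module encodes and decodes JSON data. The two functions used here are:

json.dumps(): serializes a Python object to a JSON string.
json.loads(): parses a JSON string back into Python objects.

The view then becomes: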

def main(request):
    sql = "SELECT id,title FROM novel LIMIT 10;"
    result = mysql.getAll(sql)
    # Serialize to a JSON string; MyEncoder decodes the bytes values
    result = json.dumps(result, cls=MyEncoder, ensure_ascii=False, indent=4)
    # Parse the JSON back into Python lists and dicts, now holding str
    result = json.loads(result)
    context = {'novel_list': result}
    return render(request, 'novel_list.html', context)
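The dumps/loads round trip in this view can be tried without Django or a database. The sketch below uses a hypothetical BytesEncoder, a stand-in for MyEncoder without the NumPy branch, and a hand-written rows list in place of the query result.

```python
import json


class BytesEncoder(json.JSONEncoder):
    """Illustrative stand-in for MyEncoder, without the NumPy branch."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return obj.decode('utf-8')
        return super().default(obj)


# Hypothetical query result: a list of dict rows with bytes titles.
rows = [
    {'id': 1, 'title': b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe7\xa7\xa6\xe7\xbe\xbd'},
]

as_json = json.dumps(rows, cls=BytesEncoder, ensure_ascii=False)
round_tripped = json.loads(as_json)

print(type(as_json).__name__)
print(round_tripped)
```

After the round trip the titles are str, so the template renders them as text. Note that the round trip also passes every other value through JSON, so column types that JSON cannot represent (dates or decimals, for example) would need their own branch in default().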

Parameters in Detail

json.dumps(result, cls=MyEncoder, ensure_ascii=False, indent=4)

indent

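Pretty-prints the output, indenting nested structures so that it is easier to read. The value of indent is the number of spaces used per indentation level.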
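The effect of indent is easiest to see side by side. A minimal sketch, using a made-up record:

```python
import json

# Made-up sample data for illustration.
record = {'id': 1, 'tags': ['a', 'b']}

compact = json.dumps(record)           # everything on one line
pretty = json.dumps(record, indent=4)  # one item per line, 4 spaces per level

print(compact)
print(pretty)
```

Both strings parse back to the same data; indent only changes the whitespace, so it is mainly useful for logs and debugging rather than for data sent over the wire.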

ensure_ascii

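With the default setting (ensure_ascii=True), Chinese characters are written as \uXXXX escape sequences rather than as the characters themselves, because json.dumps escapes everything outside the ASCII range by default. Pass ensure_ascii=False to emit the characters as they are. The default output looks like this: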

{
    "id": 1,
    "title": "\u7b2c\u4e00\u7ae0 \u79e6\u7fbd"
}
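The two settings can be compared directly. A minimal sketch with the same sample record, already decoded to str:

```python
import json

record = {'id': 1, 'title': '第一章 秦羽'}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)
print(readable)

# Both forms carry the same data once parsed.
print(json.loads(escaped) == json.loads(readable))
```

The escaped form is still valid JSON and any compliant parser will restore the original characters; ensure_ascii=False mainly matters when a person needs to read the output.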

cls

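The cls parameter names a json.JSONEncoder subclass whose default() method is called for objects the standard encoder cannot handle. It is needed here because a dict whose Chinese text is held as bytes can be serialized under Python 2, whereas under Python 3 json.dumps does not accept bytes and raises: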

TypeError: Object of type bytes is not JSON serializable
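As an aside, json.dumps also accepts a default= function, which serves the same purpose as a small encoder subclass. The sketch below is one possible alternative to cls, not the approach used in this post; decode_bytes is a hypothetical helper and the data is assumed to be UTF-8.

```python
import json


def decode_bytes(obj):
    """Fallback for json.dumps, called only for objects it cannot serialize."""
    if isinstance(obj, bytes):
        return obj.decode('utf-8')
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')


row = {'id': 1, 'title': b'\xe7\xac\xac\xe4\xb8\x80\xe7\xab\xa0 \xe7\xa7\xa6\xe7\xbe\xbd'}

# Without a hook, Python 3 refuses to serialize the bytes value.
try:
    json.dumps(row)
except TypeError as exc:
    print('without a hook:', exc)

# With the hook, the bytes value is decoded on the fly.
with_hook = json.dumps(row, default=decode_bytes, ensure_ascii=False)
print(with_hook)
```

A subclass passed via cls is convenient when the same rules are reused in many places, while a default= function keeps a one-off call short.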
