
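Use Python to count the ten characters who appear most often in the novel Romance of the Three Kingdoms (《三國演義》), and visualize the result.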
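
The task stated above comes down to three steps: split the text into words, merge different names for the same person, then count and keep the top entries. As a rough illustration of the counting step only, here is a minimal sketch that assumes the text has already been split into tokens. It uses the standard library's `collections.Counter` instead of the plain dictionary used in this post, and the names `ALIASES`, `SKIP`, and `top_characters` are hypothetical, introduced just for this example.

```python
from collections import Counter

# Hypothetical alias table: alternative names mapped to one canonical name.
ALIASES = {"玄德": "劉備", "孟德": "曹操", "雲長": "關羽"}
# Hypothetical set of frequent words that are not character names.
SKIP = {"將軍", "卻說"}


def top_characters(tokens, n=10):
    """Return the n most common canonical names among the tokens."""
    counts = Counter()
    for token in tokens:
        if len(token) == 1:  # skip single-character tokens
            continue
        name = ALIASES.get(token, token)  # merge aliases into one name
        if name in SKIP:  # ignore words known not to be names
            continue
        counts[name] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["玄德", "劉備", "將軍", "孟德", "曰", "曹操", "劉備"]
    print(top_characters(sample, n=2))  # [('劉備', 3), ('曹操', 2)]
```

`Counter.most_common(n)` returns the entries already sorted by count, so no separate sort is needed, and it simply returns fewer entries when fewer than `n` names exist.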

# 1. Install the required third-party libraries

> jieba (an excellent third-party library for Chinese word segmentation)
>
> pyecharts (an excellent data visualization library)
>
> [Download 《三國演義》.txt](https://pan.baidu.com/s/10y0C1iE5XEGh1MQy2eQDgg) (extraction code: kist)

## Install the libraries with PyCharm

- Open PyCharm and choose Settings from the 【File】 menu.
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123212204458-1158385426.png)
- The following page appears.
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123212326581-2045746829.png)
- Click the 【+】 on the right to open the page below, search for the library you want at the top of that page, then install it.
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123212543283-674205403.png)

# 2. Write the code

```Python
import jieba  # Chinese word segmentation library

print("人物出現次數前十名:")
# Read the whole novel; the file is encoded as gb18030.
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
words = jieba.lcut(txt)  # split the text into a list of words
counts = {}
# Count every word, merging different names for the same person into one entry.
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
# Print the top ten entries.
for word, count in items[:10]:
    print("{}:{}".format(word, count))
```

- The result is shown below:
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123224330781-1222440661.png)
- As you can see, many of these entries are not character names, so we need to remove them. The revised code is as follows:

```Python
import jieba  # Chinese word segmentation library

print("人物出現次數前十名:")
# Read the whole novel; the file is encoded as gb18030.
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
# Words to exclude; this list was built up by running the program several times.
remove = {"將軍", "卻說", "不能", "後主", "上馬", "不知", "天子", "大叫", "眾將", "不可", "主公", "蜀兵", "只見", "如何", "商議", "都督", "一人", "漢中", "人馬", "陛下", "魏兵", "天下", "今日", "左右", "東吳", "於是", "荊州", "如此", "大喜", "引兵", "次日", "軍士", "軍馬", "二人", "不敢"}
words = jieba.lcut(txt)  # split the text into a list of words
counts = {}
# Count every word, merging different names for the same person into one entry.
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
# Drop the excluded words; pop() avoids a KeyError if a word never occurred.
for word in remove:
    counts.pop(word, None)
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
# Print the top ten entries.
for word, count in items[:10]:
    print("{}:{}".format(word, count))
```

- The result is shown below:
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124184517406-460277805.png)

> Now every entry is a character name.

- Export the data with the following code:

```Python
import jieba  # Chinese word segmentation library

print("人物出現次數前十名:")
# Read the whole novel; the file is encoded as gb18030.
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
# Words to exclude; this list was built up by running the program several times.
remove = {"將軍", "卻說", "不能", "後主", "上馬", "不知", "天子", "大叫", "眾將", "不可", "主公", "蜀兵", "只見", "如何", "商議", "都督", "一人", "漢中", "人馬", "陛下", "魏兵", "天下", "今日", "左右", "東吳", "於是", "荊州", "如此", "大喜", "引兵", "次日", "軍士", "軍馬", "二人", "不敢"}
words = jieba.lcut(txt)  # split the text into a list of words
counts = {}
# Count every word, merging different names for the same person into one entry.
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
# Drop the excluded words; pop() avoids a KeyError if a word never occurred.
for word in remove:
    counts.pop(word, None)
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
# Export the data. Mode "w" overwrites the file, so rerunning the script
# does not append duplicate lines.
with open("三國人物出場次數.txt", "w", encoding='utf-8') as fo:
    for word, count in items[:10]:
        # one "name:count" pair per line, separated by a colon
        fo.write("{}:{}\n".format(word, count))
```

- Now run it and check whether the data was exported. The result is shown below.
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124102400726-2121004112.png)

> A file named 三國人物出場次數.txt has been generated, and it contains the data we just computed.

# 3. Data visualization

- Visualization needs data first, so we convert the data we just exported into a dictionary. The code is as follows:

```Python
# Convert the data in the txt file into a dictionary
dic = {}
keys = []  # records the order in which the names were read
with open('三國人物出場次數.txt', 'r', encoding='utf-8') as fr:
    for line in fr:
        v = line.strip().split(':')  # "name:count" -> [name, count]
        dic[v[0]] = v[1]
        keys.append(v[0])
print(dic)
```

- The output is as follows:
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124103534014-1428020670.png)
- Draw the chart with pyecharts
- First import the modules:

```Python
from pyecharts import options as opts
from pyecharts.charts import Bar
```

- The code is as follows:

```Python
# Draw the chart
list1 = list(dic.keys())
list2 = list(dic.values())  # use the dictionary's keys and values as the chart data
c = (
    Bar()
    .add_xaxis(list1)
    .add_yaxis("人物出場次數", list2)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
    )
    .render("人物出場次數視覺化圖.html")
)
```

- Run the program and a file named 人物出場次數視覺化圖.html appears in the directory, as shown below.
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124185044727-535685821.png)
- Open it in a browser to see the data presented as a chart.
  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124185256956-213926224.png)

# 4. The complete code

```Python
# Python code for counting character appearances in 《三國演義》
import jieba  # Chinese word segmentation library
from pyecharts import options as opts
from pyecharts.charts import Bar

print("人物出現次數前十名:")
# Read the whole novel; the file is encoded as gb18030.
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
# Words to exclude; this list was built up by running the program several times.
remove = {"將軍", "卻說", "不能", "後主", "上馬", "不知", "天子", "大叫", "眾將", "不可", "主公", "蜀兵", "只見", "如何", "商議", "都督", "一人", "漢中", "人馬", "陛下", "魏兵", "天下", "今日", "左右", "東吳", "於是", "荊州", "如此", "大喜", "引兵", "次日", "軍士", "軍馬", "二人", "不敢"}
words = jieba.lcut(txt)  # split the text into a list of words
counts = {}
# Count every word, merging different names for the same person into one entry.
for word in words:
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
# Drop the excluded words; pop() avoids a KeyError if a word never occurred.
for word in remove:
    counts.pop(word, None)
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
# Export the data. Mode "w" overwrites the file, so rerunning the script
# does not append duplicate lines.
with open("三國人物出場次數.txt", "w", encoding='utf-8') as fo:
    for word, count in items[:10]:
        # one "name:count" pair per line, separated by a colon
        fo.write("{}:{}\n".format(word, count))
# Convert the data in the txt file into a dictionary
dic = {}
keys = []  # records the order in which the names were read
with open('三國人物出場次數.txt', 'r', encoding='utf-8') as fr:
    for line in fr:
        v = line.strip().split(':')  # "name:count" -> [name, count]
        dic[v[0]] = v[1]
        keys.append(v[0])
print(dic)
# Draw the chart
list1 = list(dic.keys())
list2 = list(dic.values())  # use the dictionary's keys and values as the chart data
c = (
    Bar()
    .add_xaxis(list1)
    .add_yaxis("人物出場次數", list2)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
    )
    .render("人物出場次數視覺化圖.html"