Word frequency counting in Python with the jieba library

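# Count how often each character appears in the Three Kingdoms text (《三國誌》)

import jieba

# Read the whole text; the with-block closes the file afterwards
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Frequent words that are not character names
excludes = {"將軍", "卻說", "二人", "不能", "如此", "荊州", "不可", "商議", "如何", "軍士",
            "左右", "主公", "引兵", "次日", "大喜", "軍馬", "天下", "東吳", "於是"}

# jieba.lcut returns the segmentation result as a list
words = jieba.lcut(text)

# Count occurrences with a dict, merging aliases of the same person
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "孔明曰" or word == "孔明":
        rword = "諸葛亮"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Drop excluded words; pop() with a default avoids a KeyError if a word never appeared
for word in excludes:
    counts.pop(word, None)

items = list(counts.items())
# Sort by count, largest first
items.sort(key=lambda x: x[1], reverse=True)
# Print the five most frequent names
for word, count in items[:5]:
    print("{0:<10}{1:>5}".format(word, count))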
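
The same counting step can also be written with collections.Counter and an alias table instead of the elif chain. The following is only a sketch: ALIASES and count_names are hypothetical names introduced for illustration, and the token list is hand-written sample data standing in for the output of jieba.lcut(text).

```python
from collections import Counter

# Hypothetical alias table: each alternate name maps to one canonical name.
ALIASES = {
    "孔明": "諸葛亮", "孔明曰": "諸葛亮",
    "關公": "關羽", "雲長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(tokens, aliases=ALIASES, excludes=frozenset()):
    """Count multi-character tokens after merging aliases and skipping excluded words."""
    counts = Counter()
    for token in tokens:
        if len(token) == 1:
            continue  # single characters are rarely names
        name = aliases.get(token, token)  # fall back to the token itself
        if name in excludes:
            continue
        counts[name] += 1
    return counts


# Sample tokens (assumed); in the script above they would come from jieba.lcut(text).
tokens = ["孔明", "曰", "玄德", "將軍", "孔明曰", "雲長", "玄德曰", "劉備"]
top = count_names(tokens, excludes={"將軍"}).most_common(2)
for name, count in top:
    print("{0:<10}{1:>5}".format(name, count))
```

Counter.most_common(n) returns the n highest counts already sorted, so the separate list conversion and sort are not needed, and adding a new alias becomes a one-line change to the table.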
