Python: word-frequency counting with the jieba library
阿新 • Published: 2018-07-13
# Count how often each character appears in the Three Kingdoms text

import jieba

with open('threekingdoms.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Frequent words that are not character names
excludes = {'將軍', '卻說', '二人', '不能', '如此', '荊州', '不可', '商議', '如何', '軍士',
            '左右', '主公', '引兵', '次日', '大喜', '軍馬', '天下', '東吳', '於是'}

# lcut returns the segmentation result as a list
words = jieba.lcut(text)

# Tally occurrences in a dict, merging aliases into one canonical name
counts = {}
for word in words:
    if len(word) == 1:
        continue  # skip single-character tokens
    elif word == '孔明曰' or word == '孔明':
        rword = '諸葛亮'
    elif word == '關公' or word == '雲長':
        rword = '關羽'
    elif word == '玄德' or word == '玄德曰':
        rword = '劉備'
    elif word == '孟德' or word == '丞相':
        rword = '曹操'
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Drop the excluded words (pop avoids a KeyError if a word never occurred)
for word in excludes:
    counts.pop(word, None)

items = list(counts.items())
# Sort by count, largest first
items.sort(key=lambda x: x[1], reverse=True)
# Print the five most frequent entries
for word, count in items[:5]:
    print('{0:<10}{1:>5}'.format(word, count))
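The same counting logic can also be written more compactly with `collections.Counter` from the standard library. The following is only a sketch, not part of the original post: the `ALIASES` table and the `count_names` function are hypothetical names introduced for illustration, and the short `sample` list stands in for the output of `jieba.lcut(text)` so the example runs without the novel's text file.

```python
from collections import Counter

# Hypothetical alias table: maps variant tokens to one canonical name.
ALIASES = {
    "孔明": "諸葛亮", "孔明曰": "諸葛亮",
    "關公": "關羽", "雲長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}


def count_names(words, aliases=ALIASES, excludes=frozenset(), top_n=5):
    """Return the top_n (name, count) pairs from an iterable of tokens."""
    counts = Counter(
        aliases.get(w, w)      # merge aliases into the canonical name
        for w in words
        if len(w) > 1          # skip single-character tokens
    )
    for w in excludes:
        counts.pop(w, None)    # tolerate excluded words that never occurred
    return counts.most_common(top_n)


# Hand-written sample tokens standing in for jieba.lcut(text)
sample = ["孔明", "曰", "玄德", "孔明曰", "將軍", "曹操", "丞相", "孔明"]
for name, n in count_names(sample, excludes={"將軍"}):
    print("{0:<10}{1:>5}".format(name, n))
```

Keeping the aliases in a dictionary means a new alias is a one-line change instead of another `elif` branch, and `most_common` replaces the manual sort. With real data, pass `jieba.lcut(text)` as `words`.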