
Python Word Frequency Count


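Word frequency preprocessing
Download the lyrics of an English song or an English article
Replace all separators such as , . ? ! ' : with spaces
Convert all uppercase letters to lowercase
Build the word list
Compute the word frequencies
Sort
Exclude grammatical words: pronouns, articles, conjunctions
Print the 10 most frequent words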
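The first preprocessing steps (replacing separators, lowercasing, splitting) can be sketched in one small function. This is only an illustration: the name `tokenize` and the choice of `string.punctuation` plus the two typographic apostrophes as the separator set are assumptions, not something taken from the article.

```python
import string

# Assumed separator set: all ASCII punctuation plus the curly apostrophes.
_SEPARATORS = string.punctuation + "\u2018\u2019"
# Translation table that maps every separator to a single space.
_TABLE = str.maketrans(_SEPARATORS, " " * len(_SEPARATORS))

def tokenize(text):
    """Replace separators with spaces, lowercase, and split on whitespace."""
    return text.translate(_TABLE).lower().split()

print(tokenize('Trade wars are "good, and easy to win". Really?'))
# ['trade', 'wars', 'are', 'good', 'and', 'easy', 'to', 'win', 'really']
```

Mapping separators to spaces rather than deleting them keeps words such as "war.The" or "ever-boastful" from being merged into a single token.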

s='Robert Zoellick, a former US Trade Representative and head of the World Bank, once said: "Trade was more about politics than economics." Indeed, international trade among nations is all about business, but once politicians step in, it becomes polarizing with unexpected consequences that could lead to a trade war. ' \
'The ever-boastful US President Donald Trump tweeted that "trade wars are good, and easy to win". His newly appointed top economic advisor, Larry Kudlow, should remind him that trade wars are a pyrrhic form of competition in which even the victor is left worse off. ' \
'The US Constitution clearly states: "Congress shall regulate interstate and foreign commerce." It grants authority to the executive branch to negotiate trade agreements, but it has the last word on increasing tariffs, whether it But Trump has no patience to follow such procedures. Instead he is issuing executive orders to satisfy his political base. ' \
'The Republicans in Congress were shocked that their leader would take such a protectionist action, recognizing it was more about politics than national security. Their swift opposition forced Trump to make Canada and Mexico exceptions (to be part of North American Free Trade Agreement negotiations), and eventually minimize the effects on the US ' \
'There will not be a trade war with these countries. If it occurs, it will be with China. The Trump administration has already taken shots that may spark a trade war, including slapping tariffs on solar panels. The next, and most fierce, battlefield in today\'s smartphones to enter the US market. Around the corner is Section 301 of the Trade Act of 1974 — originally intended to safeguard patent rights — that will give the US president the authority to limit China'
# Replace every separator with a space so that neighbouring words are not fused
for sep in '?:,!\'"-.':
    s = s.replace(sep, ' ')
# Convert to lowercase and split into a word list
list1 = s.lower().split()
for i in list1:
    print(i)
# Collect the distinct words
myset = set(list1)
print(myset)
# Count how often each distinct word occurs
key = {}
for i in myset:
    key[i] = list1.count(i)
    print(i, key[i])
# Remove the grammatical words from the counts
for i in {'a', 'an', 'the', 'to', 'in', 'on', 'is', 'are', 'too', 'am'}:
    if i in key:
        key.pop(i)
# Sort by frequency, highest first, and print the top 10
sort = sorted(key.items(), key=lambda d: d[1], reverse=True)
for j in sort[:10]:
    print(j)
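The counting, filtering, and sorting steps can also be written with `collections.Counter` from the standard library, which counts in a single pass instead of calling `list.count` once per distinct word. The following is a minimal sketch, not the article's own method; the function name `top_words` and the sample sentence are made up for illustration.

```python
from collections import Counter

# The same grammatical words that the code above removes.
STOPWORDS = {'a', 'an', 'the', 'to', 'in', 'on', 'is', 'are', 'too', 'am'}

def top_words(words, n=10):
    """Return the n most frequent non-stopwords as (word, count) pairs."""
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Hypothetical sample input, already lowercased and free of punctuation.
sample = "the trade war is a war on trade and the war is costly".split()
print(top_words(sample, 2))
# [('war', 3), ('trade', 2)]
```

Because `most_common(n)` returns at most `n` pairs, it also works on texts with fewer than ten distinct words.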
