Stage Assignment 1: Complete Chinese and English Word Frequency Statistics + Make-up Submission of the Previous Assignment

阿新 · Published: 2018-11-12

# Make-up submission of the previous assignment

cc = ('''Counting stars Lately I've been, I've been losing sleep 　　
Dreaming 'bout the things that we could be 　　
But baby I've been, I've been prayin' hard 　 　　
Said no more counting dollars 　　We'll be counting stars 　　
Yeah, we'll be counting stars 　　I see this life Like a swinging vine 　
　Swing my heart across the line 　　In my face is flashing signs 　　Seek it out and ye shall find
　　Old, but I'm not that old 　　Young, but I'm not that bold 　　And I don't think the world is sold 　
　I'm just doing what we're told 　　I, feel something so right 　　But doing the wrong thing 　　
I, feel something so wrong 　　But doing the right thing 　　I could lie, could lie, could lie 　
　everything that kills me makes me feel alive 　　Lately I've been, I've been losing sleep 　
　Dreaming 'bout the things that we could be 　　Baby I've been, I've been prayin' hard 　
　Said no more counting dollars 　　We'll be counting stars 　　Lately I've been, I've been losing sleep 　　
Dreaming 'bout the things that we could be 　　Baby I've been, I've been prayin' hard 　　Said no more counting dollars 　
　We'll be, we'll be counting stars 　　I feel the love And I feel it burn 　　Down this river every turn 　
　Hope is a four letter word 　　Make that money 　　Watch it burn 　　Old, but I'm not that old 　
　Young, but I'm not that bold 　　And I don't think the world is sold 　　I'm just doing what we're told 　
　I, feel something so wrong 　　But doing the right thing 　　I could lie, could lie, could lie 　
　Everything that drowns me makes me wanna fly 　　Lately I've been, I've been losing sleep 　
　Dreaming 'bout the things that we could be 　　Baby I've been, I've been prayin' hard
　　Said no more counting dollars 　　We'll be counting stars 　　Lately I've been, I've been losing sleep 　
　Dreaming 'bout the things that we could be 　　Baby I've been, I've been prayin' hard 　
　Said no more counting dollars 　　We'll be, we'll be counting stars 　　Take that money And watch it burn 　　Sink in the river
''')
# Preprocessing: lowercase, then replace periods and commas with spaces so that "been," and "been" count as the same word
cc = cc.lower()
for ch in '.,':
    cc = cc.replace(ch, ' ')
ccList = cc.split()
print(len(ccList), ccList)  # split into words and print the number of English words
ccSet = set(ccList)  # turn the word list into a set of distinct words, then count each one into a dictionary

print(ccSet)


strDict = {}
for star in ccSet:
    strDict[star] = ccList.count(star)  # count whole words in the list; cc.count() would also count substrings inside other words
for key in ccSet:
    print(key, strDict[key])

# Sort by word frequency
wcList = list(strDict.items())
print(wcList)
wcList.sort(key=lambda x: x[1], reverse=True)
print(wcList)

# Print the TOP 20 (slicing avoids an IndexError if there are fewer than 20 distinct words)
for item in wcList[:20]:
    print(item)
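Counting with `set` plus `list.count()` rescans the whole word list once for every distinct word. Below is a minimal sketch of a single-pass alternative that uses the standard library's `collections.Counter`. The helper name `top_words` and the sample line are illustrative and not part of the original assignment:

```python
from collections import Counter


def top_words(words, n):
    """Count every word in one pass and return the n most frequent (word, count) pairs."""
    return Counter(words).most_common(n)


sample = "we'll be counting stars yeah we'll be counting stars".split()
print(top_words(sample, 2))  # ties keep first-seen order: [("we'll", 2), ('be', 2)]
```

`most_common(n)` also copes with an `n` larger than the number of distinct words: it returns fewer pairs instead of raising an error.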


# Traversing a list

cclist = ['wqdq', 'dqd', 'Awd', 313, '小四', 'dqd']
print(cclist)
cclist.append('gegeheh')
print(cclist)
cclist.pop(2)
print(cclist)
for i in cclist:
    print(i)
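The list above contains a duplicate (`'dqd'`), which makes it a handy case for comparing the two removal methods. In the small sketch below, `pop(i)` removes and returns the item at index `i`, and `remove(x)` deletes only the first item equal to `x`. During traversal, `enumerate()` yields each index alongside its item. The variable names are illustrative:

```python
cclist = ['wqdq', 'dqd', 'Awd', 313, '小四', 'dqd']

by_index = cclist.copy()
removed = by_index.pop(2)      # removes and returns the element at index 2 ('Awd')

by_value = cclist.copy()
by_value.remove('dqd')         # removes only the FIRST 'dqd'; raises ValueError if absent

for position, item in enumerate(by_value):
    print(position, item)      # index and element together
```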

# Traversing a tuple

tup = ('jtfjhrr', 'rqfw f2q', 800, 10)  # renamed from `tuple`, which shadowed the built-in type
print(tup[2])
for i in tup:
    print(i)
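Tuples are immutable: you can index and traverse them like lists, but you cannot modify them in place. The minimal sketch below shows unpacking into named variables and the error raised on item assignment. The variable names are illustrative:

```python
record = ('jtfjhrr', 'rqfw f2q', 800, 10)

# Unpack all four fields at once; the counts on both sides must match
name, label, amount, count = record

try:
    record[2] = 900            # tuples do not support item assignment
except TypeError as exc:
    error = str(exc)

print(amount, error)
```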

# Traversing a dictionary

dic = {'fhehe': '4w6436', 'jgdns': 7, '4w6436': 'First'}

print('fhehe:', dic['fhehe'])
print('4w6436:', dic['4w6436'])

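dic['4w6436'] = 8  # assigning to an existing key replaces its value
print('4w6436:', dic['4w6436'])
dic['4w6436'] = "對接歐文機房的維護"  # a value can also change type: now it is a string
print('4w6436:', dic['4w6436'])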

for key in dic:
    print(key, ':', dic.get(key))
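Iterating over a dictionary yields its keys, so the loop above has to look each value up again. The short sketch below shows two alternatives. `items()` yields key/value pairs directly. `get()` with a second argument returns a default instead of raising `KeyError` for a missing key. The key `'no-such-key'` is made up for illustration:

```python
dic = {'fhehe': '4w6436', 'jgdns': 7, '4w6436': 'First'}

# items() gives (key, value) pairs; Python 3.7+ dictionaries keep insertion order
pairs = [f"{key}: {value}" for key, value in dic.items()]
for line in pairs:
    print(line)

missing = dic.get('no-such-key', 'default')   # no KeyError, returns the default
print(missing)
```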

# Traversing a set

a = set([1, 2, 3, 6, 5])
print(a)

a.add(4)
print(a)
a.add('uteru')
print(a)

a.remove(5)
print(a)

for i in a:
    print(i)
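A set has no defined iteration order, and this one mixes integers with a string, so plain `sorted(a)` would raise `TypeError`. The sketch below gets a deterministic traversal by sorting on each element's string form, which is an arbitrary but repeatable choice. It also shows `discard()`, which, unlike `remove()`, does not raise when the element is missing:

```python
a = {1, 2, 3, 6, 5}
a.add(4)
a.add('uteru')
a.remove(5)          # KeyError if 5 were not present
a.discard(99)        # silently does nothing: 99 is not in the set

# Comparing int with str raises TypeError, so sort by str(element) instead
ordered = sorted(a, key=str)
for item in ordered:
    print(item)
```

String order would put `10` before `2`, so this key only suits cases like this one, where a repeatable order matters more than numeric order.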

# This assignment

with open('ccc1015.txt', 'r', encoding='utf-8') as fo:
    strBig = fo.read().lower()
print(strBig)
# String preprocessing: case, punctuation, special symbols
sep = """.,:;!?"""
for ch in sep:
    strBig = strBig.replace(ch, ' ')  # replace with a space, not '', so "word,word" stays two words
strlist = strBig.split()
print(len(strlist), strlist)
strSet = set(strlist)
exclude = {'is', 'be', 'i', 'we', 'the', 'in'}  # lowercase, because the text was lowercased above
strSet = strSet - exclude
print(len(strSet), strSet)
strDict = {}
for word in strSet:
    strDict[word] = strlist.count(word)
print(len(strDict), strDict)
# Sort by word frequency
wcList = list(strDict.items())
print(wcList)
wcList.sort(key=lambda x: x[1], reverse=True)
print(wcList)

# Print the TOP 20 (slicing avoids an IndexError if there are fewer than 20 distinct words)
for item in wcList[:20]:
    print(item)
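The replace loop above makes one pass over the text per separator. Below is a minimal sketch of the same preprocessing using `str.maketrans`/`str.translate`, which maps every separator to a space in a single pass. The function name `tokenize` and its default arguments are illustrative. Because the text is lowercased first, the excluded words must be lowercase as well:

```python
def tokenize(text, sep='.,:;!?', exclude=frozenset()):
    """Lowercase text, turn each separator into a space, split, and drop excluded words."""
    table = str.maketrans(sep, ' ' * len(sep))   # each separator maps to one space
    words = text.lower().translate(table).split()
    return [w for w in words if w not in exclude]


print(tokenize("Hello, hello world! It's a new day.", exclude={'a'}))
```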




# Chinese version


# Read the text file
with open('shengxu.txt', 'r', encoding='utf-8') as f:
    story = f.read()
print(story)

# Preprocessing
sep = '，。：“”？！'  # punctuation to remove
for ch in sep:
    story = story.replace(ch, ' ')  # replace each special symbol with a space in a for loop
print(story)
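The loop above calls `replace()` once per symbol. Below is a sketch of a one-pass equivalent built with `str.maketrans`. The symbol set is the one from the code above; punctuation it lacks (for example `、` or `；`) would need to be added to `CN_SEP`, a name introduced here for illustration:

```python
CN_SEP = '，。：“”？！'
# Build the table once: each Chinese punctuation mark maps to a single space
CN_TABLE = str.maketrans(CN_SEP, ' ' * len(CN_SEP))


def strip_cn_punct(text):
    """Replace every character in CN_SEP with a space in one pass."""
    return text.translate(CN_TABLE)


print(strip_cn_punct('你好，世界！'))
```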

# Chinese word segmentation: jieba
import jieba
cnStr = story
# Accurate mode (the default of jieba.cut, cut_all=False)
print(list(jieba.cut(cnStr)))
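Segmentation output usually contains whitespace tokens and many single-character function words such as 的 or 了, which tend to crowd the top of a Chinese frequency list. Below is a hedged sketch of a filter that could be applied to `jieba.cut(...)` output. The `min_len=2` threshold and the stop-word set are assumptions, and the threshold also drops meaningful one-character words:

```python
def clean_tokens(tokens, stopwords=frozenset(), min_len=2):
    """Strip each token and keep it only if it is long enough and not a stop word."""
    cleaned = []
    for token in tokens:
        word = token.strip()
        if len(word) >= min_len and word not in stopwords:
            cleaned.append(word)
    return cleaned


# Hand-made token list standing in for list(jieba.cut(...)) output
print(clean_tokens(['我們', ' ', '的', '故事', '開始', '了'], stopwords={'開始'}))
```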

# Extract words: use jieba's segmentation, because Chinese text has no spaces between words,
# and drop the whitespace tokens left over from the punctuation replacement
strList = [w for w in jieba.cut(story) if w.strip()]
print(len(strList), strList)
# Word-count dictionary
strSet = set(strList)
print(len(strSet), strSet)
strDict = {}
for word in strSet:
    strDict[word] = strList.count(word)
# Sort by word frequency
wcList = list(strDict.items())
wcList.sort(key=lambda x: x[1], reverse=True)

# Print the TOP 10
for item in wcList[:10]:
    print(item)
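Sorting the whole list only to print ten entries does more work than needed. The sketch below uses the standard library's `heapq.nlargest`, which returns the k largest items by key, highest first, and simply returns every item when there are fewer than k. The function name `top_k` and the sample counts are made up for illustration:

```python
import heapq


def top_k(freq, k):
    """Return up to k (word, count) pairs with the highest counts, highest first."""
    return heapq.nlargest(k, freq.items(), key=lambda item: item[1])


freq = {'世界': 3, '我們': 5, '故事': 9, '時間': 1}
print(top_k(freq, 2))
```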
