Homework 20180925-3: Performance Analysis

A-Xin • Published: 2018-10-05


Assignment requirements: https://edu.cnblogs.com/campus/nenu/2018fall/homework/2145

Git repository: https://git.coding.net/zhangjy982/word_count.git

Requirement 0

First run:

[Screenshot]

Second run:

[Screenshot]

Third run:

[Screenshot]

Run	Elapsed time (s)
First	1.589
Second	2.059
Third	1.681
Average	1.776
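
The average in the table can be checked with a few lines of Python. This is only a sketch to verify the arithmetic; the variable names are illustrative and are not part of wf.py.

```python
from statistics import mean

# Elapsed times (s) of the three runs, copied from the table above.
run_times = [1.589, 2.059, 1.681]

average = mean(run_times)
print(f"average: {average:.3f} s")
```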

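CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz 2.50GHz

Requirement 1

My guess is that the bottleneck is the step that strips punctuation and splits the article into words, that is, the words_split(str) function: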

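import re

def words_split(str):
    # Extract every run of word characters with a regular expression.
    text = re.findall(r'\w+', str)
    # Count how many times each word occurs.
    count_dict = {}
    for word in text:
        if word in count_dict:
            count_dict[word] = count_dict[word] + 1
        else:
            count_dict[word] = 1
    # Sort the (word, count) pairs by count, highest first.
    count_list = sorted(count_dict.items(), key=lambda x: x[1], reverse=True)
    return count_list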

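Regular-expression processing is slow, and the string produced by reading the whole article is very long, so I think this part is the program's bottleneck. Inspired by the trigonometric-function example our teacher gave in class, my optimization plan is to swap the regular expression for punctuation replacement: the set of punctuation marks is small, so for a very long article I expect replacement to take much less time than regular-expression matching.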
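
A guess like this can be checked with a small timing comparison before changing the program. The sketch below is not taken from wf.py: the sample text, the punctuation subset, and the two helper functions are assumptions made for illustration, and the numbers it prints will vary by machine and Python version.

```python
import re
import timeit

# Synthetic sample text (an assumption; the original runs used war_and_peace.txt).
sample = "Well, Prince, so Genoa and Lucca are now just family estates. " * 2000

PUNCTUATION = ".,!?;:"  # illustrative subset of the punctuation to strip

def split_with_regex(text):
    """Tokenize with a regular expression."""
    return re.findall(r"\w+", text)

def split_with_replace(text):
    """Tokenize by replacing punctuation with spaces, then splitting."""
    for mark in PUNCTUATION:
        text = text.replace(mark, " ")
    return text.split()

# Time 20 repetitions of each approach on the same input.
regex_time = timeit.timeit(lambda: split_with_regex(sample), number=20)
replace_time = timeit.timeit(lambda: split_with_replace(sample), number=20)
print(f"regex:   {regex_time:.3f} s")
print(f"replace: {replace_time:.3f} s")
```

On this sample both helpers return the same word list, so the comparison is between equivalent results. On real text the two approaches can differ, for example on characters that are not in the punctuation list.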

Requirement 2

My program is written in Python, so I needed a Python profiling tool. I found a blog post on Python profiling and studied it; the post is at:

http://www.cnblogs.com/xiadw/p/7455513.html

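I used cProfile as the profiling tool, with this command:

python -m cProfile -s time wf.py -s < war_and_peace.txt

The result was:

[Screenshot]

According to the profiling results, the three most time-consuming functions in my program are: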

1. "replace" of "str" object, the string replacement operation;

2. "findall" of "_sre.SRE_Pattern" object, the findall() method used for regular-expression matching;

3. words_split(), the word-splitting function I wrote myself.
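
A similar ranking can also be produced from inside Python with the standard cProfile and pstats modules. The following is a minimal sketch, not the author's wf.py: the words_split stand-in, the precompiled WORD_RE pattern, and the sample text are hypothetical and only serve to give the profiler something to measure.

```python
import cProfile
import io
import pstats
import re

WORD_RE = re.compile(r"\w+")  # compiled up front so compilation is not profiled

def words_split(text):
    """Simplified, hypothetical stand-in for the tokenizer being profiled."""
    return WORD_RE.findall(text)

sample = "It was the best of times, it was the worst of times. " * 5000

profiler = cProfile.Profile()
profiler.enable()
words_split(sample)
profiler.disable()

# Sort by internal time, like "-s time" on the command line, and show the top 5.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("time").print_stats(5)
print(report.getvalue())
```

Sorting by "time" ranks functions by the time spent inside each function itself, excluding calls it makes, which is the ordering used for the list above.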

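Requirement 3

From the results of Requirement 2, replace() ranks first, but it was called 53 times (to replace garbled characters in the text), so a single call takes 0.305/53 = 0.005755 s. A single findall() call takes 0.246 s, nearly 43 times as long as one replace() call. I therefore conclude that replace() is not the main factor in the program's run time; in other words, there is little room to optimize it. The regular-expression step is different: it can be rewritten as a series of replace() calls, which should give a fairly noticeable improvement.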
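
The per-call figures above can be reproduced with a short calculation. The numbers are the ones quoted from the profiling output; the variable names are illustrative.

```python
# Figures quoted from the cProfile output.
replace_total = 0.305      # total time spent in str.replace (s)
replace_calls = 53         # number of replace() calls
findall_per_call = 0.246   # time of a single findall() call (s)

replace_per_call = replace_total / replace_calls
ratio = findall_per_call / replace_per_call

print(f"replace per call: {replace_per_call:.6f} s")
print(f"findall / replace: {ratio:.1f}x")
```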

The modified code:

def words_split(str):  # word splitting changed from a regular expression to replace()
    # Turn each punctuation mark into a space, lowercase the text, then split on whitespace.
    text = (str.replace('\n', ' ').replace('.', ' ').replace(',', ' ')
               .replace('!', ' ').replace('\\', ' ').replace('#', ' ')
               .replace('[', ' ').replace(']', ' ').replace(':', ' ')
               .replace('?', ' ').replace('-', ' ').replace('\'', ' ')
               .replace('\"', ' ').replace('(', ' ').replace(')', ' ')
               .replace('—', ' ').replace(';', ' ').lower().split())
    # Count how many times each word occurs.
    count_dict = {}
    for word in text:
        if word in count_dict:
            count_dict[word] = count_dict[word] + 1
        else:
            count_dict[word] = 1
    # Sort the (word, count) pairs by count, highest first.
    count_list = sorted(count_dict.items(), key=lambda x: x[1], reverse=True)
    return count_list
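
As a possible variant, the chain of replace() calls could be collapsed into a single str.translate() pass, with collections.Counter doing the counting. This is a different technique from the one used in this homework, shown only as a sketch; the names words_split_translate, PUNCTUATION, and TO_SPACES are hypothetical, and whether it is actually faster would have to be measured with the profiler.

```python
from collections import Counter

# The same 17 characters handled by the chained replace() calls, each mapped to a space.
PUNCTUATION = "\n.,!\\#[]:?-'\"()—;"
TO_SPACES = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))

def words_split_translate(text):
    """Replace punctuation in one pass, lowercase, split, and count words."""
    words = text.translate(TO_SPACES).lower().split()
    # most_common() returns (word, count) pairs sorted by descending count.
    return Counter(words).most_common()

print(words_split_translate("Hello, world! Hello again."))
```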

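Requirement 4

Screenshot of the new profiling run:

[Screenshot]

Screenshot of the new run time:

[Screenshot]

Requirement 5

Awaiting the teacher's evaluation.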
