AI-014: Study Notes 49 on Professor Andrew Ng's Machine Learning Course

阿新 • Published: 2018-12-18

These are my study notes on Andrew Ng's machine learning tutorial series. Video link:

49. Machine learning system design: prioritizing what to work on: spam classification example

Take building a spam filter as the example. First, build a classifier:

Choose high-frequency words as the features.
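One way to picture the word features is the rough sketch below. It assumes a binary "word present or not" encoding; the helper names (`tokenize`, `build_vocabulary`, `to_feature_vector`) and the toy emails are made up for illustration and are not from the course.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(emails, size):
    """Return the `size` most frequent words across all emails."""
    counts = Counter(word for email in emails for word in tokenize(email))
    return [word for word, _ in counts.most_common(size)]

def to_feature_vector(email, vocabulary):
    """x[j] = 1 if vocabulary word j appears in the email, else 0."""
    present = set(tokenize(email))
    return [1 if word in present else 0 for word in vocabulary]

# Toy corpus, invented for illustration only.
emails = [
    "Buy now, big discount on watches",
    "Meeting moved to Monday, see you now",
    "Discount deal: buy watches now",
]
vocabulary = build_vocabulary(emails, size=4)
x = to_feature_vector("Please buy milk on Monday", vocabulary)
print(vocabulary, x)
```

In practice the vocabulary would be built from a large training set and would contain thousands of words, not four.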

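Ways to reduce the classifier's error rate, for example:

Collect a large amount of data
Use more sophisticated features extracted from the email routing information (such as the sender and the subject line), for example an empty subject or an @saler.com address
Use more sophisticated features extracted from the message body, for example words such as "discount" or "promotion"
Detect deliberate misspellings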
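As a small illustration of the routing-information idea, header fields could be turned into binary features roughly like this. The function name `header_features` and the `suspicious_domains` set are hypothetical; the notes do not say how such features are actually encoded.

```python
def header_features(sender, subject, suspicious_domains):
    """Return binary features derived from the sender address and subject line."""
    # Take the part after the last "@" as the domain; empty if there is no "@".
    domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else ""
    return {
        "empty_subject": int(subject.strip() == ""),
        "suspicious_domain": int(domain in suspicious_domains),
    }

# Assumed example input: an empty subject and a sender on a flagged domain.
features = header_features("deals@saler.com", "", {"saler.com"})
print(features)
```

These values would simply be appended to the word-based feature vector.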

50. Machine Learning system design: Error analysis

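Methodology:

Error analysis:

Look at how the misclassified examples are distributed across categories. The categories with the largest share are where improving the algorithm pays off most. Try new approaches (more data, more features, ...), then check again what the main sources of error are.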
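The tallying step can be sketched as follows, assuming each misclassified validation email has already been labelled by hand with an error category. The category names and counts here are invented for illustration.

```python
from collections import Counter

def category_shares(labels):
    """Map each error category to its share of all misclassified examples,
    ordered from the most to the least common category."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.most_common()}

# Hypothetical hand-assigned categories for misclassified validation emails.
misclassified = [
    "pharma", "phishing", "pharma", "replica",
    "phishing", "phishing", "other", "phishing",
]
shares = category_shares(misclassified)
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```

With these made-up labels, phishing accounts for half of the errors, so it would be the first category to target.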

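Ideally the algorithm reports a single quantitative evaluation, such as an error rate. The error rates obtained with different features or methods (for example, with or without stemming) then tell you which choice is better:

If the error rate is lower with stemming, adopt the variant that uses stemming.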
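A minimal sketch of that comparison, assuming each variant has already produced predictions on the same validation set. The labels, the predictions and the variant names are made up; only the "pick the lower error rate" rule comes from the notes.

```python
def error_rate(predictions, labels):
    """Fraction of examples where the prediction differs from the label."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

def best_variant(predictions_by_variant, labels):
    """Return (name, error) of the variant with the lowest validation error."""
    errors = {
        name: error_rate(preds, labels)
        for name, preds in predictions_by_variant.items()
    }
    name = min(errors, key=errors.get)
    return name, errors[name]

# Made-up validation labels and predictions from two hypothetical variants.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predictions_by_variant = {
    "without_stemming": [1, 0, 0, 1, 0, 1, 1, 0, 0, 0],
    "with_stemming":    [1, 0, 1, 1, 0, 1, 1, 0, 0, 1],
}
name, error = best_variant(predictions_by_variant, labels)
print(name, error)
```

The point is that one number per variant makes the decision quick; the evaluation should be done on validation data, not on the training set.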

51. Machine learning system design: Error metric for skewed classes

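Skewed classes: one class is far rarer than the other (for example, very few positive examples).

Accuracy: the fraction of all examples that are classified correctly.

Precision: of the examples predicted positive, the fraction that are actually positive.

Recall: of the examples that are actually positive, the fraction that are predicted positive.

The higher the precision and recall, the better.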

If a classifier is getting high precision and high recall, then we can be confident that the algorithm is doing well, even if we have very skewed classes.

So for the problem of skewed classes, precision and recall give us more direct insight into how the learning algorithm is doing. When the classes are very skewed, this is often a much better way to evaluate a learning algorithm than looking at classification error or classification accuracy.
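To see why accuracy misleads on skewed classes, here is a small sketch with synthetic data. It treats y = 1 as the rare positive class and assumes precision and recall are reported as 0.0 when their denominator is zero, which is a convention chosen here, not something stated in the notes.

```python
def evaluate(predictions, labels):
    """Return accuracy, precision and recall, with y = 1 as the rare positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    correct = sum(1 for p, y in pairs if p == y)
    return {
        "accuracy": correct / len(pairs),
        # Assumed convention: report 0.0 when the denominator is zero.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Synthetic skewed data: 1 positive among 100 examples.
labels = [1] + [0] * 99
always_negative = [0] * 100  # a "classifier" that ignores its input
result = evaluate(always_negative, labels)
print(result)
```

The always-negative predictor reaches 99% accuracy but has zero recall, which is exactly the failure that accuracy alone hides.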

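51. Machine learning system design: Trading off precision and recall

Threshold: the cutoff on the predicted probability at or above which we predict y = 1.

Few examples are flagged, but anything that is flagged is almost certainly right -> high precision, low recall. For example, with spam filtering you do not want to lose legitimate mail.

Many examples are flagged, but a lot of them are false alarms -> low precision, high recall. For example, when predicting cancer, it is better to stay suspicious :)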
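The trade-off can be sketched by sweeping the threshold over a fixed set of predicted probabilities. The probabilities and labels below are made up, and predicting y = 1 when the probability is at or above the threshold is an assumption of this sketch.

```python
def precision_recall_at(probabilities, labels, threshold):
    """Predict 1 when probability >= threshold, then return (precision, recall)."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up classifier outputs and true labels.
probabilities = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels        = [1,    1,    0,    1,    1,    0,    0,    0]

for threshold in (0.9, 0.5, 0.25):
    precision, recall = precision_recall_at(probabilities, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the high threshold (0.9) gives precision 1.00 with recall 0.25, while the low threshold (0.25) gives recall 1.00 with precision 0.67.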

Use the F score (F1 = 2PR / (P + R), where P is precision and R is recall) to judge whether a given combination of precision and recall is acceptable.
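A short sketch of the F score as a single comparison number, using the standard F1 definition (the harmonic mean of precision and recall). The three precision/recall pairs are illustrative values, and returning 0.0 when both are zero is a convention assumed here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative (precision, recall) pairs for three hypothetical algorithms.
candidates = {
    "algorithm_1": (0.5, 0.4),
    "algorithm_2": (0.7, 0.1),
    "algorithm_3": (0.02, 1.0),
}
scores = {name: f1_score(p, r) for name, (p, r) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Unlike a plain average, F1 is low whenever either precision or recall is low, so the third candidate (recall 1.0 but precision 0.02) does not win.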

52. Machine learning system design: data for machine learning

Under the right conditions (see the key test below), increasing the size of the training set improves the algorithm's performance.

In that case, a large training set gives good results, and which algorithm is used matters much less.

Key test:

First, can a human expert look at the features x and confidently predict the value of y?

Second, can we actually get a large training set and train a learning algorithm with many parameters on it?

If you can do both, you can often get a very good algorithm.
