
AI-014: Study Notes on Professor Andrew Ng's Machine Learning Course, Part 49

These are my study notes on Andrew Ng's machine learning course series. Course videos:

49. Machine learning system design: prioritizing what to work on: spam classification example




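  • Collect lots of data
  • Develop sophisticated features from the email routing information (e.g., sender, subject), such as an empty subject line or a sender at @saler.com
  • Develop sophisticated features from the message body, such as words like "discount" and "promotion"
  • Detect misspellings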
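The feature ideas above can be sketched as a feature vector built from the body and the header. This is a minimal sketch, assuming a tiny hand-picked vocabulary (`VOCAB`) and two header features; the vocabulary, the feature choices, and the function names are hypothetical. A real classifier would select a much larger vocabulary from its training data, and misspelling detection is not covered here.

```python
import re

# Hypothetical vocabulary of words that tend to indicate spam or non-spam.
VOCAB = ["andrew", "buy", "deal", "discount", "now"]


def body_features(body, vocab=VOCAB):
    """Binary features: x_j = 1 if vocab[j] occurs in the message body, else 0."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    return [1 if w in words else 0 for w in vocab]


def routing_features(sender, subject):
    """Hypothetical header features: empty subject, sender at saler.com."""
    return [
        1 if not subject.strip() else 0,
        1 if sender.lower().endswith("@saler.com") else 0,
    ]


def email_features(sender, subject, body):
    """Concatenate body and routing features into one feature vector x."""
    return body_features(body) + routing_features(sender, subject)


print(email_features("promo@saler.com", "", "Buy now, big DISCOUNT!"))
# [0, 1, 0, 1, 1, 1, 1]
```

Binary word-presence features ignore word counts and word order; that keeps the example small, and whether richer features (stemming, counts) help is exactly the kind of question the list above asks you to prioritize.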

50. Machine Learning system design: Error analysis






51. Machine learning system design: Error metric for skewed classes

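Skewed classes: one class is far rarer than the other (for example, only a tiny fraction of the examples has y = 1).

Accuracy: the fraction of all examples that are classified correctly.

Precision = true positives / (true positives + false positives): of all examples predicted as y = 1, the fraction that actually have y = 1.

Recall = true positives / (true positives + false negatives): of all examples that actually have y = 1, the fraction the classifier detects.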


If a classifier achieves both high precision and high recall, we can be confident that the algorithm is doing well, even when the classes are very skewed.

So for skewed classes, precision and recall give more direct insight into how the learning algorithm is doing, and they are often a much better way to evaluate it than classification error or classification accuracy.
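Why accuracy misleads on skewed classes can be shown with a few lines of arithmetic. This is a minimal sketch, assuming a hypothetical test set in which 5 of 1,000 examples have y = 1; it also assumes precision is reported as 0.0 when the classifier never predicts y = 1 (precision is undefined in that case).

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumption: 0.0 if no y=1 predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical skewed test set: only 5 positives out of 1,000 examples.
y_true = [1] * 5 + [0] * 995

# A "classifier" that always predicts y = 0.
always_zero = [0] * 1000
print(precision_recall_accuracy(y_true, always_zero))  # (0.0, 0.0, 0.995)

# A classifier that finds 4 of the 5 positives but raises 2 false alarms.
real = [1, 1, 1, 1, 0] + [1, 1] + [0] * 993
print(precision_recall_accuracy(y_true, real))
```

The always-zero predictor scores 99.5% accuracy while detecting nothing; its recall of 0 exposes that immediately, which is the point made above.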

51. Machine learning system design: Trading off precision and recall




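Raising the threshold on h(x) for predicting y = 1 generally increases precision and lowers recall; lowering it does the reverse. To judge whether a given trade-off is acceptable, use the F1 score, F1 = 2PR / (P + R), which combines precision P and recall R into a single number. Unlike the plain average (P + R) / 2, it is low whenever either P or R is low.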
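Picking a threshold by F1 can be sketched as follows. This is a minimal sketch, assuming hypothetical labels and classifier scores h(x) and a hypothetical list of candidate thresholds; in practice the thresholds would be compared on a cross-validation set.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_at(y_true, scores, threshold):
    """Predict y = 1 when h(x) >= threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(y_true, scores, thresholds):
    """Return the candidate threshold with the highest F1 score."""
    return max(thresholds, key=lambda t: f1_score(*precision_recall_at(y_true, scores, t)))


# Hypothetical cross-validation labels and scores h(x).
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.7, 0.65, 0.4, 0.3, 0.2, 0.1]

for t in (0.3, 0.5, 0.7, 0.9):
    p, r = precision_recall_at(y_true, scores, t)
    print(f"threshold={t}: P={p:.2f} R={r:.2f} F1={f1_score(p, r):.3f}")

print(best_threshold(y_true, scores, [0.3, 0.5, 0.7, 0.9]))  # 0.7
```

On this toy data precision rises from 0.67 to 1.00 as the threshold grows while recall falls from 1.00 to 0.25, and F1 selects the middle ground at 0.7.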

52. Machine learning system design: data for machine learning

Under such conditions, increasing the size of the training set improves the algorithm's performance.

In this case, a large training set yields good results, and the choice of algorithm matters much less.

Key tests:

First, can a human expert look at the features x and confidently predict the value of y?

Second, can we actually obtain a large training set and train a learning algorithm with many parameters on it?

If both hold, you can often get a very good algorithm: many parameters give low bias, and a large training set makes overfitting unlikely (low variance), so the training error is small and the test error stays close to it.
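The second test can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not from the course: the task (y = sin(2πx) plus Gaussian noise), the noise level, and the hypothetical `BinnedRegressor`, a piecewise-constant model with one learned parameter per bin, which stands in for a low-bias learning algorithm with many parameters.

```python
import math
import random


def make_data(n, rng, noise=0.1):
    """Hypothetical task: y = sin(2*pi*x) plus Gaussian noise, x uniform in [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


class BinnedRegressor:
    """Low-bias model with many parameters: one learned mean per bin of x."""

    def __init__(self, n_bins=50):
        self.n_bins = n_bins
        self.means = [0.0] * n_bins

    def _bin(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def fit(self, xs, ys):
        sums = [0.0] * self.n_bins
        counts = [0] * self.n_bins
        for x, y in zip(xs, ys):
            b = self._bin(x)
            sums[b] += y
            counts[b] += 1
        # Bins that received no training data fall back to predicting 0.0.
        self.means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        return self

    def predict(self, x):
        return self.means[self._bin(x)]


def mse(model, xs, ys):
    """Mean squared error of the model on (xs, ys)."""
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_and_test_error(n_train, seed=0):
    """Fit on n_train examples; return (J_train, J_test) on a fresh test set."""
    rng = random.Random(seed)
    x_tr, y_tr = make_data(n_train, rng)
    x_te, y_te = make_data(10_000, rng)
    model = BinnedRegressor().fit(x_tr, y_tr)
    return mse(model, x_tr, y_tr), mse(model, x_te, y_te)


for n in (20, 200, 20_000):
    j_train, j_test = train_and_test_error(n)
    print(f"m={n:>6}: J_train={j_train:.4f}  J_test={j_test:.4f}")
```

With only 20 examples, most of the 50 bins are empty or hold a single point, so J_train is near zero while J_test is large (overfitting). With 20,000 examples, J_train and J_test both approach the noise level and stay close together, which is the situation the two key tests aim for.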