【Machine Learning】【Andrew Ng】- Quiz2(Week 6)

1、You are working on a spam classification system using regularized logistic regression. "Spam" is the positive class (y = 1) and "not spam" is the negative class (y = 0). You have trained your classifier and there are m = 1000 examples in the cross-validation set. The chart of predicted class vs. actual class is:

|                    | actual class: 1 | actual class: 0 |
| ------------------ | --------------- | --------------- |
| predicted class: 1 | 85              | 890             |
| predicted class: 0 | 15              | 10              |

for reference:
- accuracy = (true positives + true negatives)/(total examples)
- precision = (true positives)/(true positives + false positives)
- recall = (true positives)/(true positives + false negatives)
- F1 score = (2 * precision * recall)/(precision + recall)
What is the classifier's F1 score (as a value from 0 to 1)?
Answer: 0.1581.
Explanation:
precision = 85/(85+890) ≈ 0.0872
recall = 85/(85+15) = 0.85
F1 = (2 × 0.0872 × 0.85)/(0.0872 + 0.85) ≈ 0.1581
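As a sanity check, the computation can be reproduced in a few lines of Python; this is a minimal sketch using only the confusion-matrix counts from the table above:

```python
# Confusion-matrix counts from the table above
tp, fp = 85, 890   # predicted 1: actual 1 / actual 0
fn, tn = 15, 10    # predicted 0: actual 1 / actual 0

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f}, precision={precision:.4f}, "
      f"recall={recall:.4f}, F1={f1:.4f}")  # F1 ≈ 0.1581
```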

2、Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true. Which are the two?
A. When we are willing to include high order polynomial features of x (such as x1^2, x2^2, x1x2, etc.).

B. We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).

C. The features contain sufficient information to predict accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x).

D. We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).

Answer: BC. Both of these conditions are important, and it is easy to forget that they must hold together.

3、Suppose you have trained a logistic regression classifier which is outputting h(x).
Currently, you predict 1 if h(x) >= threshold, and predict 0 if h(x) < threshold, where currently the threshold is set to 0.5.
Suppose you increase the threshold to 0.9. Which of the following are true? Check all that apply.
A、The classifier is likely to now have lower precision.
B、The classifier is likely to now have lower recall.
C、The classifier is likely to have unchanged precision and recall, but lower accuracy.
D、The classifier is likely to have unchanged precision and recall, but higher accuracy.
Answer: B. The way I think of it: precision is the probability that a positive prediction is correct, and recall is the probability of not missing a true positive. The higher the threshold, the more confident the classifier must be before predicting 1, so positive predictions become more likely to be correct (precision tends to rise), but more true positives slip through the net (recall falls).
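A minimal sketch of this effect, using made-up probability scores (the scores and labels below are hypothetical, chosen only to illustrate the trend):

```python
# Hypothetical model scores and true labels, for illustration only
scores = [0.95, 0.91, 0.85, 0.72, 0.60, 0.55, 0.40, 0.30]
labels = [1,    1,    0,    1,    1,    0,    0,    0]

def precision_recall(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

for t in (0.5, 0.9):
    prec, rec = precision_recall(t)
    print(f"threshold={t}: precision={prec:.2f}, recall={rec:.2f}")
# threshold=0.5: precision=0.67, recall=1.00
# threshold=0.9: precision=1.00, recall=0.50
```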

4、Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are negative examples (y=0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply.
A、If you always predict non-spam (output y=0), your classifier will have a recall of 0%.
B、If you always predict spam (output y=1), your classifier will have a recall of 0% and precision of 99%.
C、If you always predict spam (output y=1), your classifier will have a recall of 100% and precision of 1%.
D、If you always predict non-spam (output y=0), your classifier will have an accuracy of 99%.
Answer: ACD.
Predicting 1 (spam) for every example gives:

|                    | actual class: 1 | actual class: 0 |
| ------------------ | --------------- | --------------- |
| predicted class: 1 | 1%              | 99%             |
| predicted class: 0 | 0               | 0               |

accuracy = 1%
precision = 1%/(1%+99%) = 1%
recall = 1%/(1%+0) = 100%
Predicting 0 (non-spam) for every example gives:

|                    | actual class: 1 | actual class: 0 |
| ------------------ | --------------- | --------------- |
| predicted class: 1 | 0               | 0               |
| predicted class: 0 | 1%              | 99%             |

accuracy = 99%
precision = 0/(0+0), which is undefined since there are no positive predictions (conventionally treated as 0%)
recall = 0/(0+1%) = 0%
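To see this accuracy paradox concretely, here is a small sketch evaluating the two trivial classifiers; the counts (10 spam, 990 non-spam out of m = 1000) are assumed for illustration:

```python
# Hypothetical skewed set: 10 spam (y=1), 990 non-spam (y=0)
labels = [1] * 10 + [0] * 990

def evaluate(constant_prediction):
    preds = [constant_prediction] * len(labels)
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    acc = (tp + tn) / len(labels)
    prec = tp / (tp + fp) if tp + fp else float("nan")  # undefined for all-0
    rec = tp / (tp + fn)
    print(f"always predict {constant_prediction}: "
          f"accuracy={acc:.0%}, precision={prec:.0%}, recall={rec:.0%}")

evaluate(1)  # accuracy=1%, precision=1%, recall=100%
evaluate(0)  # accuracy=99%, precision undefined (nan), recall=0%
```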

5、Which of the following statements are true? Check all that apply.
A、If your model is underfitting the training set, then obtaining more data is likely to help.
B、Using a very large training set makes it unlikely for the model to overfit the training data.
C、It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.
D、After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or negative.
E、On skewed datasets (e.g., when there are more positive examples than negative examples), accuracy is not a good measure of performance and you should instead use F1 score based on the precision and recall.
Answer: BE.
A is wrong: underfitting is mainly a problem with the model itself; adding more training data will not improve performance much.
B is correct: with a very large training set, the model can hardly fit every training example exactly, so it is much harder to overfit.
C is wrong: keep the first version of the model simple rather than spending a lot of time collecting data up front; decide later, when tuning the model, whether more training examples are needed.
D is wrong: there is no single fixed threshold; it depends on the performance trade-off you need. A common approach is to pick the threshold that maximizes the F1 score.
E is correct.
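Following up on D, here is a minimal sketch of how one might pick a threshold by sweeping candidate values on a cross-validation set and keeping the one with the highest F1 (the scores and labels are hypothetical placeholders):

```python
# Hypothetical cross-validation scores and labels, for illustration only
cv_scores = [0.95, 0.80, 0.75, 0.60, 0.45, 0.30, 0.20, 0.10]
cv_labels = [1,    1,    0,    1,    0,    1,    0,    0]

def f1_at(threshold):
    preds = [1 if s >= threshold else 0 for s in cv_scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, cv_labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, cv_labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, cv_labels))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Try every distinct score as a candidate threshold, keep the best by F1
best = max(cv_scores, key=f1_at)
print(f"best threshold={best}, F1={f1_at(best):.3f}")
```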
