【Machine Learning】【Andrew Ng】- Quiz1(Week 8)

1、For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.
A. Given a database of information about your users, automatically group them into different market segments.
B. Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf.
C. Given historical weather records, predict the amount of rainfall tomorrow (this would be a real-valued output).
D. Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.
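Answer: AB. K-means is mainly used for clustering unlabeled data (C and D are supervised prediction tasks), though the optional exercise in the programming assignment also uses it to compress an image: a 128×128 image at 24 bits per pixel (128×128×24 bits) is reduced to 16 colors, which takes 16×24 + 128×128×4 bits.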
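
The storage figures in that answer can be checked with a few lines of arithmetic. This is only a sketch of the bit counting; the variable names are made up for illustration, and it assumes 24 bits per color (8 per RGB channel) and 4 bits per pixel to index one of 16 colors.

```python
# Bit counts for a 128x128 image before and after 16-color quantization.
pixels = 128 * 128

original_bits = pixels * 24      # every pixel stores a full 24-bit color
palette_bits = 16 * 24           # the 16 centroid colors, 24 bits each
index_bits = pixels * 4          # each pixel stores a 4-bit index (2**4 == 16)
compressed_bits = palette_bits + index_bits

print(original_bits, compressed_bits)
print(original_bits / compressed_bits)  # compression factor, roughly 6
```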

2、Suppose we have three cluster centroids [image missing] and [image missing]. Furthermore, we have a training example [image missing]. After a cluster assignment step, what will c(i) be?
A. c(i) = 1
B. c(i) is not assigned
C. c(i) = 3
D. c(i) = 2
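Answer: C. Just compute the distance from x(i) to each cluster centroid. The distance to the third centroid is √2, the smallest of the three, so x(i) is assigned to cluster 3.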
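
The assignment step can be sketched as below. The centroid and example values in the original question were images and are not reproduced here, so the numbers in this snippet are hypothetical, chosen only so that the third centroid lies at distance √2 as the answer states. `assign_cluster` is an illustrative helper, not something from the course code.

```python
import math

def assign_cluster(x, centroids):
    """Return the 1-based index of the centroid closest to x."""
    distances = [math.dist(x, mu) for mu in centroids]
    return distances.index(min(distances)) + 1

# Hypothetical values (the original ones were in images that are missing).
centroids = [(1.0, 2.0), (-3.0, 0.0), (4.0, 2.0)]
x = (3.0, 1.0)

for k, mu in enumerate(centroids, start=1):
    print(k, math.dist(x, mu))       # distance from x to centroid k
print(assign_cluster(x, centroids))  # 3
```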

3、K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
A. Randomly initialize the cluster centroids.
B. Test on the cross-validation set.
C. The cluster assignment step, where the parameters c(i) are updated.
D. Move the cluster centroids, where the centroids μk are updated.
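Answer: CD. Randomly initializing the cluster centroids (A) is an important step of K-means, but it happens once before the inner loop, not inside it, and testing on a cross-validation set (B) is not part of K-means at all. Repeating the whole algorithm from several random initializations is mainly worthwhile when the number of clusters is small, as a way to avoid the local optima that a poor initialization can cause.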
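
To make the structure concrete, here is a minimal pure-Python sketch of K-means with the two inner-loop steps as separate functions. The function names, the `max_iters` and `seed` parameters, and the choice to leave an empty cluster's centroid where it is are all assumptions of this sketch, not the course's implementation.

```python
import math
import random

def assign_clusters(points, centroids):
    """Cluster assignment step: c(i) = index of the centroid nearest to x(i)."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def move_centroids(points, labels, centroids):
    """Move-centroid step: each centroid becomes the mean of its assigned points."""
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, labels) if c == k]
        if members:
            moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            moved.append(old)  # empty cluster: keep the old centroid (one simple choice)
    return moved

def kmeans(points, k, max_iters=100, seed=0):
    # Random initialization happens once, outside the inner loop.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iters):
        labels = assign_clusters(points, centroids)        # step C
        moved = move_centroids(points, labels, centroids)  # step D
        if moved == centroids:                             # nothing moved: converged
            break
        centroids = moved
    return centroids, assign_clusters(points, centroids)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```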

4、Suppose you have an unlabeled dataset {x(1),…,x(m)}. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data. What is the recommended way for choosing which one of these 50 clusterings to use?
A. Compute the distortion function J(c(1),…,c(m),μ(1),…,μ(k)), and pick the one that minimizes this.
B. Plot the data and the cluster centroids, and pick the clustering that gives the most “coherent” cluster centroids.
C. Use the elbow method.
D. Manually examine the clusterings, and pick the best one.
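Answer: A. Pick the one with the lowest cost, of course. It is not clear what B is even supposed to mean, and C and D are used when choosing the number of clusters, whereas in this question the number of clusters is already fixed.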
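
A rough sketch of that selection rule: run K-means from several random initializations and keep the run with the lowest distortion J. The helper names (`run_kmeans`, `distortion`, `best_of`) and the defaults are assumptions made for this illustration, and J is taken here as the mean squared distance from each point to its assigned centroid.

```python
import math
import random

def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random initialization; returns (centroids, labels)."""
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        updated = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster: leave centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def distortion(points, centroids, labels):
    """J: mean squared distance between each point and its assigned centroid."""
    return sum(math.dist(p, centroids[c]) ** 2
               for p, c in zip(points, labels)) / len(points)

def best_of(points, k, n_init=50, seed=0):
    """Run K-means n_init times and return the run with the smallest J."""
    rng = random.Random(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: distortion(points, *run))

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
centroids, labels = best_of(points, k=3)
print(distortion(points, centroids, labels))
```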

5、Which of the following statements are true? Select all that apply.
A. On every iteration of K-means, the cost function J(c(1),…,c(m),μ(1),…,μ(k)) (the distortion function) should either stay the same or decrease; in particular, it should not increase.
B. A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.
C. K-Means will always give the same results regardless of the initialization of the centroids.
D. Once an example has been assigned to a particular centroid, it will never be reassigned to another different centroid.
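Answer: AB. C is wrong: the result of K-means depends heavily on how the centroids are initialized. D is wrong: the centroids generally keep moving during the iterations, so an example can be reassigned to a different centroid; the clustering is finished only once nothing changes any more.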