
Neural Networks and Deep Learning (Quiz 3)

1. Which of the following are true? (Check all that apply.)

X is a matrix in which each column is one training example.

a[2] (12) denotes activation vector of the 12th layer on the 2nd training example.

X is a matrix in which each row is one training example.

a[2] denotes the activation vector of the 2nd layer.

a[2] (12) denotes the activation vector of the 2nd layer for the 12th training example.

a[2]_4 is the activation output of the 2nd layer for the 4th training example

a[2]_4 is the activation output by the 4th neuron of the 2nd layer
Explanation:
X is an n × m matrix, where n is the number of features and m is the number of training examples, so each column is one training example.
a[l](k) denotes the activation vector of layer l for the k-th training example.
A subscript indicates which neuron of the layer is meant.
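A minimal numpy sketch of this shape convention (the feature count and example count here are made-up illustrative values):

import numpy as np

n_x, m = 3, 12                # hypothetical: 3 features, 12 training examples
X = np.random.randn(n_x, m)   # each column is one training example
print(X.shape)                # (3, 12)
print(X[:, 11].shape)         # (3,) -- the 12th training example is column index 11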
2. The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

True

False
Explanation:
Comparing the sigmoid and tanh activation functions:
Hidden layers: tanh usually performs better than sigmoid because its output range is [-1, +1], so the activations are distributed around 0 with zero mean; this effectively normalizes (zero-centers) the data passed from the hidden layer to the next layer.
Output layer: for binary classification the labels take values in {0, 1}, so sigmoid is generally the better choice.
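A quick numerical check of the zero-centering claim (a sketch; feeding standard-normal pre-activations is just an illustrative assumption):

import numpy as np

z = np.random.randn(100000)          # illustrative pre-activation values
sig = 1.0 / (1.0 + np.exp(-z))       # sigmoid outputs lie in (0, 1)
tanh_out = np.tanh(z)                # tanh outputs lie in (-1, 1)
print(sig.mean())                    # roughly 0.5 -- not zero-centered
print(tanh_out.mean())               # roughly 0.0 -- zero-centered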
3. Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 ≤ l ≤ L?
[Figure: candidate formulas for the vectorized forward step of layer l]
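The correct choice is the one matching Z[l] = W[l]A[l-1] + b[l], A[l] = g[l](Z[l]). A minimal numpy sketch of that per-layer step (the layer sizes and the use of tanh as g[l] are assumptions for illustration):

import numpy as np

def forward_layer(A_prev, W, b, g=np.tanh):
    # one vectorized forward step: Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l])
    Z = W @ A_prev + b        # b of shape (n_l, 1) broadcasts across the m example columns
    A = g(Z)
    return Z, A

A_prev = np.random.randn(3, 5)   # illustrative: 3 units in layer l-1, m = 5 examples
W = np.random.randn(4, 3)        # 4 units in layer l
b = np.zeros((4, 1))
Z, A = forward_layer(A_prev, W, b)
print(Z.shape, A.shape)          # (4, 5) (4, 5)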
4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
ReLU
Leaky ReLU
sigmoid


tanh
Explanation: see question 2; for a binary classification output layer, sigmoid is the natural choice.

5. Consider the following code:

import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)

What will be B.shape? (If you’re not sure, feel free to run this in python to find out).

(4, 1)

(, 3)

(1, 3)

(4, )
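The result can be checked directly: keepdims=True keeps the summed axis as a length-1 dimension, so summing a (4, 3) array over axis 1 gives shape (4, 1). A minimal check:

import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)   # the summed axis is kept with length 1
print(B.shape)                         # (4, 1)
C = np.sum(A, axis=1)                  # without keepdims the axis is dropped
print(C.shape)                         # (4,)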

6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.

Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.

Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.

The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.
Explanation:
If two hidden neurons start with identical parameters, they have exactly the same effect on the output unit, and backpropagation then computes identical gradients for both of them. So even after many iterations of gradient descent the two hidden units remain symmetric and keep computing the same function. No matter how many hidden units you use, they all end up computing the same thing, which defeats the purpose of having multiple hidden neurons.
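A small sketch that makes the symmetry argument concrete: a 1-hidden-layer network (tanh hidden layer, sigmoid output) trained by plain gradient descent from an all-zero initialization. The toy data, layer sizes, and learning rate are illustrative assumptions.

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 100)                      # 2 features, 100 toy examples
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)    # a toy binary target

n_h, m, lr = 4, X.shape[1], 0.5
W1, b1 = np.zeros((n_h, 2)), np.zeros((n_h, 1))  # every weight and bias starts at zero
W2, b2 = np.zeros((1, n_h)), np.zeros((1, 1))

for _ in range(100):
    A1 = np.tanh(W1 @ X + b1)                    # hidden layer
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))   # output layer
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    W1, b1 = W1 - lr * dW1, b1 - lr * db1
    W2, b2 = W2 - lr * dW2, b2 - lr * db2

print(W1)   # every row is identical (in this setup they in fact stay at zero),
            # so all four hidden units keep computing exactly the same function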
7. Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?

True

False
Explanation:
Logistic regression's parameters can be initialized to zero: there is no hidden layer, so there is no symmetry to break, and the gradient of each weight depends on its own input feature, so the weights still learn different values.
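A tiny sketch of why the zero start still works here: with no hidden layer, each weight's gradient is driven by a different input feature, so the weights separate after the first update (the toy data and learning rate are illustrative assumptions):

import numpy as np

np.random.seed(1)
X = np.random.randn(2, 200)                         # 2 features, 200 toy examples
Y = (X[0:1, :] - 2 * X[1:2, :] > 0).astype(float)   # a toy binary target

w, b, m, lr = np.zeros((2, 1)), 0.0, X.shape[1], 0.5
for _ in range(200):
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))        # sigmoid(w^T x + b)
    dw = X @ (A - Y).T / m                          # gradient depends on X, not just on w
    db = (A - Y).sum() / m
    w, b = w - lr * dw, b - lr * db

print(w.ravel())   # the two weights end up different (roughly one positive, one negative),
                   # so zero initialization did not prevent a useful decision boundary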

8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.

This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.

It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
Explanation:
Both sigmoid and tanh saturate when z is very large or very small: the gradient there is almost zero, so training slows down.
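A quick numerical look at the saturation (the layer sizes and data are illustrative assumptions):

import numpy as np

np.random.seed(2)
X = np.random.randn(3, 10)                   # 3 features, 10 toy examples
W_small = np.random.randn(4, 3) * 0.01       # the usual small random init
W_large = np.random.randn(4, 3) * 1000       # the large init from the question

for name, W in [("small init", W_small), ("large init", W_large)]:
    Z = W @ X
    local_grad = 1 - np.tanh(Z) ** 2         # derivative of tanh evaluated at Z
    print(name, local_grad.mean())
# small init -> close to 1 (healthy gradients); large init -> essentially 0 (saturated)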
9. Consider the following 1 hidden layer neural network:
[Figure: a 1-hidden-layer network with two input features x1, x2, four hidden units, and one output unit]
Which of the following statements are True? (Check all that apply).
W[1] will have shape (2, 4)

b[1] will have shape (4, 1)

W[1] will have shape (4, 2)

b[1] will have shape (2, 1)

W[2] will have shape (1, 4)

b[2] will have shape (4, 1)

W[2] will have shape (4, 1)

b[2] will have shape (1, 1)
Explanation:
Between the input layer and the hidden layer:
W[1] -> (4, 2): 4 is the number of hidden neurons, 2 is the number of input features;
b[1] -> (4, 1): matches the number of hidden neurons.
Between the hidden layer and the output layer:
W[2] -> (1, 4): 1 is the number of output neurons, 4 is the number of hidden neurons;
b[2] -> (1, 1): matches the number of output neurons.
In general, taking any two adjacent layers with the earlier one as input and the later one as output, the weight matrix between them has shape (n_out, n_in) and the bias has shape (n_out, 1), following the linear relation z = Wx + b. (In this notation, W[i] plays the role of the transposed weight vector w^T from logistic regression.)
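A short sketch of this (n_out, n_in) rule applied to the 2 -> 4 -> 1 network of this question:

import numpy as np

layer_sizes = [2, 4, 1]          # n_x = 2 inputs, 4 hidden units, 1 output
params = {}
for l in range(1, len(layer_sizes)):
    params["W" + str(l)] = np.random.randn(layer_sizes[l], layer_sizes[l - 1]) * 0.01
    params["b" + str(l)] = np.zeros((layer_sizes[l], 1))

for name, p in params.items():
    print(name, p.shape)         # W1 (4, 2), b1 (4, 1), W2 (1, 4), b2 (1, 1)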
10. In the same network as the previous question, what are the dimensions of Z[1] and A[1]?

Z[1] and A[1] are (4,1)

Z[1] and A[1] are (1,4)

Z[1] and A[1] are (4,2)

Z[1] and A[1] are (4,m)
Explanation: see question 9.
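With m training examples stacked as the columns of X (shape (2, m)), Z[1] = W[1]X + b[1] and A[1] both have shape (4, m). A minimal check (m = 7 is an arbitrary illustrative value):

import numpy as np

m = 7
X = np.random.randn(2, m)            # 2 input features, m training examples
W1 = np.random.randn(4, 2) * 0.01
b1 = np.zeros((4, 1))

Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
print(Z1.shape, A1.shape)            # (4, 7) (4, 7)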