
Coursera Andrew Ng Deep Learning, Course 2: Improving Deep Neural Networks, Week 1 Programming Assignment Code: Initialization

2 - Zero initialization

import numpy as np   # the notebook imports this at the top

# GRADED FUNCTION: initialize_parameters_zeros
def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
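A quick sanity check (my own example with a hypothetical size list, not the notebook's test case) makes the problem with this scheme visible:

parameters = initialize_parameters_zeros([3, 2, 1])
print(parameters["W1"])   # 2x3 matrix of zeros
print(parameters["b1"])   # 2x1 vector of zeros
# Because every weight is identical, all units in a layer compute the same output
# and receive the same gradient, so symmetry is never broken and the network
# performs no better than a linear classifier.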

3 - Random initialization

# GRADED FUNCTION: initialize_parameters_random
def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)               # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10   # note the number of parentheses
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
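For comparison (again with a hypothetical size list), the * 10 factor yields very large initial weights:

parameters = initialize_parameters_random([3, 2, 1])
print(parameters["W1"])   # entries with magnitudes on the order of 10
# With weights this large the sigmoid output layer saturates near 0 or 1, so the
# initial cost is very high and the gradients are small, which slows down training;
# choosing a better scale is exactly what He initialization addresses next.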

4 - He initialization

The basic idea of Xavier initialization is to keep the variance of each layer's inputs and outputs the same, which prevents the outputs from all shrinking toward 0. The idea behind He initialization is that in a ReLU network roughly half of each layer's neurons are active and the other half output 0, so to keep the variance unchanged the Xavier variance is doubled, i.e. the weights are scaled by sqrt(2/layers_dims[l-1]) instead of sqrt(1/layers_dims[l-1]).

# GRADED FUNCTION: initialize_parameters_he
def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1        # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
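The variance argument can be checked empirically. The sketch below is my own illustration (hypothetical layer widths, not part of the assignment): it pushes unit-variance inputs through one ReLU layer and compares the second moment of the activations under Xavier scaling sqrt(1/n_in) and He scaling sqrt(2/n_in):

import numpy as np

np.random.seed(0)
n_in, n_out, m = 512, 512, 1000            # hypothetical layer sizes and batch size
X = np.random.randn(n_in, m)               # inputs with unit variance

for name, scale in [("Xavier", np.sqrt(1. / n_in)), ("He", np.sqrt(2. / n_in))]:
    W = np.random.randn(n_out, n_in) * scale
    A = np.maximum(0, W.dot(X))            # ReLU activation
    print(name, "E[a^2] ~", (A ** 2).mean())

# ReLU zeroes out about half of the pre-activations, cutting the second moment in half;
# the factor 2 in He initialization compensates, so E[a^2] stays close to 1 (the second
# moment of the input) instead of shrinking layer after layer. With Xavier scaling the
# printed value is close to 0.5, with He scaling it is close to 1.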

Question:

"If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights W[l] of sqrt(1./layers_dims[l-1])." The Xavier initialization described in the assignment therefore draws the weights from a normal distribution and scales them by sqrt(1 / number of units in the previous layer). In the original paper [1], however, Xavier initialization uses a uniform distribution:
In TensorFlow it can be written as:

import numpy as np
import tensorflow as tf

def xavier_init(fan_in, fan_out, constant=1):
    low = -constant * np.sqrt(6.0 / (fan_in + fan_out))
    high = constant * np.sqrt(6.0 / (fan_in + fan_out))
    return tf.random_uniform((fan_in, fan_out), minval=low, maxval=high, dtype=tf.float32)

[1] Xavier Glorot and Yoshua Bengio, "Understanding the Difficulty of Training Deep Feedforward Neural Networks," AISTATS 2010.
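A possible resolution of the question above (my own check, not from the notebook): the uniform distribution on [-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out))] has variance 2/(fan_in+fan_out), which is exactly the variance the Gaussian form of Xavier/Glorot initialization targets; the assignment simply uses the fan-in-only variant sqrt(1/fan_in) with a normal distribution. A quick NumPy check with hypothetical layer widths:

import numpy as np

fan_in, fan_out, n = 400, 300, 1000000     # hypothetical layer widths and sample count
limit = np.sqrt(6.0 / (fan_in + fan_out))

uniform_w = np.random.uniform(-limit, limit, n)
normal_w = np.random.randn(n) * np.sqrt(2.0 / (fan_in + fan_out))

print(uniform_w.var(), normal_w.var(), 2.0 / (fan_in + fan_out))
# All three numbers agree (about 0.00286): the uniform and Gaussian forms target the
# same weight variance, so they differ only in the shape of the distribution.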