
Improving Deep Neural Networks, Week 1 - Initialization

Initialization

Welcome to the first assignment of “Improving Deep Neural Networks”.

Training your neural network requires specifying an initial value of the weights. A well-chosen initialization method will help learning.

If you completed the previous course of this specialization, you probably followed our instructions for weight initialization, and it has worked out so far. But how do you choose the initialization for a new neural network? In this notebook, you will see how different initializations lead to different results.

A well-chosen initialization can:
- Speed up the convergence of gradient descent
- Increase the odds of gradient descent converging to a lower training (and generalization) error

To get started, run the following cell to load the packages and the planar dataset you will try to classify.

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec
#%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()
plt.show()

(Figure: the training data - blue and red dots arranged in circles)

1 - Neural Network model

You will use a 3-layer neural network (already implemented for you). Here are the initialization methods you will experiment with:
- Zeros initialization – setting initialization = "zeros" in the input argument.
- Random initialization – setting initialization = "random" in the input argument. This initializes the weights to large random values.
- He initialization – setting initialization = "he" in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015.

Instructions: Please quickly read over the code below, and run it. In the next part you will implement the three initialization methods that this model() calls.
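The model() code itself is not reproduced in this post. The following is a condensed sketch of what it does, assuming the helpers imported above (forward_propagation, compute_loss, backward_propagation, update_parameters) and the three initializers implemented below behave as in the course's init_utils; treat it as an outline rather than the exact notebook cell.

def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """3-layer network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID."""
    costs = []
    layers_dims = [X.shape[0], 10, 5, 1]

    # pick one of the three initialization methods implemented in this assignment
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    for i in range(num_iterations):
        a3, cache = forward_propagation(X, parameters)    # forward pass
        cost = compute_loss(a3, Y)                        # cross-entropy loss
        grads = backward_propagation(X, Y, cache)         # backward pass
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # plot the learning curve
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per thousands)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()

    return parameters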

2 - Zero initialization

Exercise: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
if __name__ == '__main__':
    parameters = initialize_parameters_zeros([3, 2, 1])
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

Result:

W1 = [[0. 0. 0.]
      [0. 0. 0.]]
b1 = [[0.]
      [0.]]
W2 = [[0. 0.]]
b2 = [[0.]]

Run the following code to train your model on 15,000 iterations using zeros initialization.

parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Result:

Cost after iteration 0: 0.6931471805599453
Cost after iteration 1000: 0.6931471805599453
Cost after iteration 2000: 0.6931471805599453
Cost after iteration 3000: 0.6931471805599453
Cost after iteration 4000: 0.6931471805599453
Cost after iteration 5000: 0.6931471805599453
Cost after iteration 6000: 0.6931471805599453
Cost after iteration 7000: 0.6931471805599453
Cost after iteration 8000: 0.6931471805599453
Cost after iteration 9000: 0.6931471805599453
Cost after iteration 10000: 0.6931471805599455
Cost after iteration 11000: 0.6931471805599453
Cost after iteration 12000: 0.6931471805599453
Cost after iteration 13000: 0.6931471805599453
Cost after iteration 14000: 0.6931471805599453
On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5

The performance is really bad: the cost does not really decrease, and the algorithm performs no better than random guessing. Why? Let's look at the details of the predictions and the decision boundary:

print("predictions_train = " + str(predictions_train))
print("predictions_test = " + str(predictions_test))

Result:

predictions_train = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                      0 0 0 0 0 0 0 0 0 0 0 0]]
predictions_test = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
                     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, np.squeeze(train_Y))

(Figure: decision boundary of the model with zeros initialization)

The model is predicting 0 for every example.

In general, initializing all the weights to zero results in the network failing to break symmetry. This means that every neuron in each layer will learn the same thing, and you might as well be training a neural network with n[l]=1 for every layer; the network is no more powerful than a linear classifier such as logistic regression.

What you should remember:
- The weights W[l] should be initialized randomly to break symmetry.
- It is, however, okay to initialize the biases b[l] to zeros. Symmetry is still broken so long as W[l] is initialized randomly.
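To make the symmetry argument concrete, here is a minimal standalone numpy sketch (a hypothetical one-hidden-layer network, not the assignment's code): with all weights at zero, every hidden unit computes the same activation and receives the same gradient, so no update can ever make the units differ.

import numpy as np

np.random.seed(0)
X = np.random.randn(3, 5)                        # 3 input features, 5 examples
Y = (np.random.rand(1, 5) > 0.5).astype(float)   # binary labels

W1 = np.zeros((4, 3)); b1 = np.zeros((4, 1))     # 4 hidden units, all weights zero
W2 = np.zeros((1, 4)); b2 = np.zeros((1, 1))

# forward pass: ReLU hidden layer, sigmoid output
Z1 = W1 @ X + b1
A1 = np.maximum(0, Z1)
Z2 = W2 @ A1 + b2
A2 = 1 / (1 + np.exp(-Z2))

# backward pass for binary cross-entropy
m = X.shape[1]
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)
dW1 = dZ1 @ X.T / m

print(np.allclose(A1, A1[0]))    # True: every hidden unit outputs the same activations
print(np.allclose(dW1, dW1[0]))  # True: every hidden unit gets the same weight gradient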

3 - Random initialization

To break symmetry, let's initialize the weights randomly. Following random initialization, each neuron can then proceed to learn a different function of its inputs. In this exercise, you will see what happens if the weights are initialized randomly, but to very large values.

Exercise: Implement the following function to initialize your weights to large random values (scaled by *10) and your biases to zeros. Use np.random.randn(..,..) * 10 for the weights and np.zeros((.., ..)) for the biases. We are using a fixed np.random.seed(..) to make sure your "random" weights match ours, so don't worry if running your code several times always gives you the same initial values for the parameters.

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)               # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
if __name__ == '__main__':
    parameters = initialize_parameters_random([3, 2, 1])
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

Result:

W1 = [[ 17.88628473   4.36509851   0.96497468]
      [-18.63492703  -2.77388203  -3.54758979]]
b1 = [[0.]
      [0.]]
W2 = [[-0.82741481 -6.27000677]]
b2 = [[0.]]

Run the following code to train your model on 15,000 iterations using random initialization.

parameters = model(train_X, train_Y, initialization="random")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Result:

Cost after iteration 0: inf
Cost after iteration 1000: 0.6247924745506072
Cost after iteration 2000: 0.5980258056061102
Cost after iteration 3000: 0.5637539062842213
Cost after iteration 4000: 0.5501256393526495
Cost after iteration 5000: 0.5443826306793814
Cost after iteration 6000: 0.5373895855049121
Cost after iteration 7000: 0.47157999220550006
Cost after iteration 8000: 0.39770475516243037
Cost after iteration 9000: 0.3934560146692851
Cost after iteration 10000: 0.3920227137490125
Cost after iteration 11000: 0.38913700035966736
Cost after iteration 12000: 0.3861358766546214
Cost after iteration 13000: 0.38497629552893475
Cost after iteration 14000: 0.38276694641706693


(Figure: cost curve during training with large random initialization)

On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86

If you see "inf" as the cost after iteration 0, this is because of numerical roundoff; a more numerically sophisticated implementation would fix this. But it isn't worth worrying about for our purposes.
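If you did want to guard against that overflow, a common fix is to clip the sigmoid output away from exactly 0 and 1 before taking the log. This is only an illustrative sketch (compute_loss_stable and the eps value are my own, not part of the course's init_utils):

import numpy as np

def compute_loss_stable(a3, Y, eps=1e-12):
    """Cross-entropy loss with activations clipped so log() never sees 0."""
    m = Y.shape[1]
    a3 = np.clip(a3, eps, 1 - eps)   # keep a3 strictly inside (0, 1)
    logprobs = Y * np.log(a3) + (1 - Y) * np.log(1 - a3)
    return -np.sum(logprobs) / m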

Anyway, it looks like you have broken symmetry, and this gives better results than before. The model is no longer outputting all 0s.

[[1 0 1 1 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 0 0 1 0 1 1 1 1 1 1 0 1 1 0 0 1
  1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 1 1 0
  0 0 0 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 0
  1 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0
  0 0 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1
  1 0 1 0 1 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 1 0 0 1
  0 1 1 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 1 1 0 1 1
  1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1
  1 1 1 1 0 0 0 1 1 1 1 0]]
[[1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0 1
  0 1 1 0 0 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0
  1 1 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0]]
plt.title("Model with large random initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, np.squeeze(train_Y))

Observations

- The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets such an example wrong it incurs a very high loss for that example. Indeed, when log(a[3]) = log(0), the loss goes to infinity. (See the short numerical sketch after this list.)
- Poor initialization can lead to vanishing/exploding gradients, which also slows down the optimization algorithm.
- If you train this network longer you will see better results, but initializing with overly large random numbers slows down the optimization.
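As a rough numerical illustration of the first observation (a standalone sketch, not part of the assignment): once the pre-activation reaches the magnitudes that *10 weights easily produce, the sigmoid saturates, and the cross-entropy loss for a confidently wrong prediction grows roughly linearly with the pre-activation.

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for z in (0.5, 15.0):                  # modest pre-activation vs. one driven by *10 weights
    a = sigmoid(z)                     # predicted probability of class 1
    loss_if_wrong = -np.log(1 - a)     # cross-entropy when the true label is actually 0
    print(f"z = {z:5.1f}   a = {a:.7f}   loss if the label is 0: {loss_if_wrong:.2f}")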

In summary:
- Initializing weights to very large random values does not work well.
- Hopefully initializing with small random values does better. The important question is: how small should these random values be? Let's find out in the next part!

4 - He initialization

Finally, try "He initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar, except Xavier initialization uses a scaling factor of sqrt(1/layers_dims[l-1]) for the weights W[l], whereas He initialization uses sqrt(2/layers_dims[l-1]).)
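The exercise code for this part is missing from this post; below is a sketch of initialize_parameters_he in the same style as the two initializers above, using the sqrt(2/layers_dims[l-1]) scaling described in He et al., 2015. In the original assignment this initialization brings the cost down quickly and reaches noticeably higher train and test accuracy than the large random initialization above.

def initialize_parameters_he(layers_dims):
    """
    He initialization: weights drawn from randn and scaled by
    sqrt(2 / layers_dims[l-1]); biases initialized to zeros.
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))

    return parameters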
