
Andrew Ng's Deep Learning course, Course 2, Week 1, Assignment 1: simple predictions with a neural network (Initialization)

# coding: utf-8

# # Initialization
# Welcome to the first assignment of "Improving Deep Neural Networks".
#
# Training your neural network requires specifying an initial value of the weights.
# A well chosen initialization method will help learning.
# If you completed the previous course of this specialization, you probably followed
# our instructions for weight initialization, and it has worked out so far. But how
# do you choose the initialization for a new neural network? In this notebook, you
# will see how different initializations lead to different results.
#
# A well chosen initialization can:
# - Speed up the convergence of gradient descent
# - Increase the odds of gradient descent converging to a lower training (and generalization) error
#
# To get started, run the following cell to load the packages and the planar
# dataset you will try to classify.

# In[1]: different initializations lead to different results -- load the packages and the dataset

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

# get_ipython().magic('matplotlib inline')
plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()  # read the data

# You would like a classifier to separate the blue dots from the red dots.

# ## 1 - Neural Network model

# You will use a 3-layer neural network (already implemented for you). Here are
# the initialization methods you will experiment with:
# - *Zeros initialization* -- setting `initialization = "zeros"` in the input argument.
# - *Random initialization* -- setting `initialization = "random"` in the input argument.
# This initializes the weights to large random values.
# - *He initialization* -- setting `initialization = "he"` in the input argument.
# This initializes the weights to random values scaled according to a paper by He et al., 2015.
#
# **Instructions**: Please quickly read over the code below, and run it. In the next part
# you will implement the three initialization methods that this `model()` calls.

# In[2]: build the test model and run the experiments

def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """
    grads = {}
    costs = []  # to keep track of the loss
    m = X.shape[1]  # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)

        # Loss: compute the cost
        cost = compute_loss(a3, Y)

        # Backward propagation
        grads = backward_propagation(X, Y, cache)

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    """Output from the first run:
    Cost after iteration 0: 0.6931471805599453
    Cost after iteration 1000: 0.6931471805599453
    Cost after iteration 2000: 0.6931471805599453
    Cost after iteration 3000: 0.6931471805599453
    Cost after iteration 4000: 0.6931471805599453
    Cost after iteration 5000: 0.6931471805599453
    Cost after iteration 6000: 0.6931471805599453
    Cost after iteration 7000: 0.6931471805599453
    Cost after iteration 8000: 0.6931471805599453
    Cost after iteration 9000: 0.6931471805599453
    Cost after iteration 10000: 0.6931471805599455
    Cost after iteration 11000: 0.6931471805599453
    Cost after iteration 12000: 0.6931471805599453
    Cost after iteration 13000: 0.6931471805599453
    Cost after iteration 14000: 0.6931471805599453
    """

    # plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters


# ## 2 - Zero initialization
#
# There are two types of parameters to initialize in a neural network:
# - the weight matrices (W[1], W[2], W[3], ..., W[L-1], W[L])
# - the bias vectors (b[1], b[2], b[3], ..., b[L-1], b[L])
#
# **Exercise**: Implement the following function to initialize all parameters to zeros.
# You'll see later that this does not work well since it fails to "break symmetry", but
# let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.

# In[3]: initialize all parameters to zero

# GRADED FUNCTION: initialize_parameters_zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)  # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters


# In[4]: print the zero-initialized parameters

parameters = initialize_parameters_zeros([3, 2, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
"""Output from the first run:
W1 = [[ 0.  0.  0.]
 [ 0.  0.  0.]]
b1 = [[ 0.]
 [ 0.]]
W2 = [[ 0.  0.]]
b2 = [[ 0.]]
"""

# **Expected Output**: the all-zero matrices and vectors printed above.

# Run the following code to train your model on 15,000 iterations using zeros initialization.

# In[5]: train the model with zeros initialization and evaluate it

parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

# The performance is really bad, the cost does not really decrease, and the algorithm
# performs no better than random guessing. Why?
# Let's look at the details of the predictions
# and the decision boundary:

# In[6]: print the predictions

print("predictions_train = " + str(predictions_train))
print("predictions_test = " + str(predictions_test))

# In[7]: plot the decision boundary of the zero-initialized model

plt.title("Model with Zeros initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
plt.show()

# The model is predicting 0 for every example.
#
# In general, initializing all the weights to zero results in the network failing to break
# symmetry. This means that every neuron in each layer will learn the same thing, and you
# might as well be training a neural network with n[l]=1 for every layer, and the network
# is no more powerful than a linear classifier such as logistic regression.
#
# **What you should remember**:
# - The weights W[l] should be initialized randomly to break symmetry.
# - It is however okay to initialize the biases b[l] to zeros. Symmetry is still broken
# so long as W[l] is initialized randomly.

# ## 3 - Random initialization
#
# To break symmetry, let's initialize the weights randomly. Following random initialization,
# each neuron can then proceed to learn a different function of its inputs. In this exercise,
# you will see what happens if the weights are initialized randomly, but to very large values.
#
# **Exercise**: Implement the following function to initialize your weights to large random
# values (scaled by *10) and your biases to zeros. Use `np.random.randn(..,..) * 10` for
# weights and `np.zeros((.., ..))` for biases. We are using a fixed `np.random.seed(..)` to make
# sure your "random" weights match ours, so don't worry if running your code several times
# always gives you the same initial values for the parameters.

# In[8]: initialize the weights with large random values

# GRADED FUNCTION: initialize_parameters_random

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)  # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)  # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters


# In[9]: print the randomly initialized parameters

parameters = initialize_parameters_random([3, 2, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

# Run the following code to train your model on 15,000 iterations using random initialization.

# In[10]: train the model with large random initialization and evaluate it

parameters = model(train_X, train_Y, initialization="random")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

# If you see "inf" as the cost after iteration 0, this is because of numerical roundoff;
# a more numerically sophisticated implementation would fix this.
# But this isn't worth worrying about for our purposes.
#
# Anyway, it looks like you have broken symmetry, and this gives better results than before.
# The model is no longer outputting all 0s.

# In[11]: print the predictions obtained with random initialization

print("predictions_train:")
print(predictions_train)
print("predictions_test:")
print(predictions_test)

# In[12]: plot the decision boundary obtained with random initialization

plt.title("Model with large random initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
plt.show()

# **Observations**:
# - The cost starts very high. This is because with large random-valued weights, the last
# activation (sigmoid) outputs results that are very close to 0 or 1 for some examples,
# and when it gets that example wrong it incurs a very high loss for that example. Indeed,
# when log(a[3]) = log(0), the loss goes to infinity.
# - Poor initialization can lead to vanishing/exploding gradients, which also slows down
# the optimization algorithm.
# - If you train this network longer you will see better results, but initializing with
# overly large random numbers slows down the optimization.
#
# **In summary**:
# - Initializing weights to very large random values does not work well.
# - Hopefully initializing with small random values does better. The important question is: how
# small should these random values be? Let's find out in the next part!

# ## 4 - He initialization
#
# Finally, try "He initialization"; this is named for the first author of He et al., 2015.
# (If you have heard of "Xavier initialization", this is similar except Xavier initialization
# uses a scaling factor for the weights W[l] of sqrt(1./layers_dims[l-1]) where He
# initialization would use sqrt(2./layers_dims[l-1]).)
#
# **Exercise**: Implement the following function to initialize your parameters with He initialization.
#
# **Hint**: This function is similar to the previous `initialize_parameters_random(...)`. The only
# difference is that instead of multiplying `np.random.randn(..,..)` by 10, you will multiply it
# by sqrt(2/(dimension of the previous layer)), which is what He initialization recommends for
# layers with a ReLU activation.

# In[13]: He initialization of the parameters

# GRADED FUNCTION: initialize_parameters_he

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1  # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(2. / layers_dims[l - 1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters


# In[14]: print the He-initialized parameters

parameters = initialize_parameters_he([2, 4, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

# Run the following code to train your model on 15,000 iterations using He initialization.
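# Optional aside before running the training cell below (not part of the graded assignment):
# the small cell here just prints the Xavier (sqrt(1/n_prev)) and He (sqrt(2/n_prev)) scaling
# factors for this network's layer sizes, to make the comparison above concrete. The helper
# function is our own illustration; it is not provided by init_utils.

def print_scaling_factors(layers_dims):
    """Print the Xavier and He weight-scaling factors for each layer of the network."""
    for l in range(1, len(layers_dims)):
        n_prev = layers_dims[l - 1]  # number of units in the previous layer
        print("layer %d: xavier = %.4f, he = %.4f" % (l, np.sqrt(1. / n_prev), np.sqrt(2. / n_prev)))

print_scaling_factors([2, 10, 5, 1])  # same layer sizes that model() uses for this dataset
# Each He factor is sqrt(2) times the corresponding Xavier factor, so He starts the weights
# slightly larger, compensating for ReLU setting roughly half of the activations to zero.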
# In[15]: train the model with He initialization and print the predictions

parameters = model(train_X, train_Y, initialization="he")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
"""Output from the first run:
On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5
"""

# In[16]: plot the decision boundary obtained with He initialization

plt.title("Model with He initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
plt.show()

# **Observations**:
# - The model with He initialization separates the blue and the red dots very well
# in a small number of iterations.

# ## 5 - Conclusions
#
# You have seen three different types of initializations. For the same number of iterations
# and the same hyperparameters, the three methods compare as summarized in In[17] below.
#
# **What you should remember from this notebook**:
# - Different initializations lead to different results
# - Random initialization is used to break symmetry and make sure different hidden units
# can learn different things
# - Don't initialize to values that are too large
# - He initialization works well for networks with ReLU activations.

# In[17]: Final result: accuracy ranks zeros < random < He (initialize_parameters_he)
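# Closing illustration (not part of the original assignment): the short cell below shows
# concretely why zero initialization fails to break symmetry. With all-zero weights, every
# hidden unit computes the same activation and receives the same gradient, so the rows of W1
# can never become different. The tiny two-layer network here is our own example, chosen only
# to make the symmetry argument explicit; it is independent of init_utils.

np.random.seed(1)
X_demo = np.random.randn(2, 5)                 # 5 examples with 2 features
Y_demo = (np.random.rand(1, 5) > 0.5) * 1.0    # arbitrary 0/1 labels

W1_demo, b1_demo = np.zeros((3, 2)), np.zeros((3, 1))  # zero-initialized hidden layer (3 units)
W2_demo, b2_demo = np.zeros((1, 3)), np.zeros((1, 1))  # zero-initialized output layer

Z1 = np.dot(W1_demo, X_demo) + b1_demo
A1 = np.maximum(0, Z1)                         # ReLU: every row of A1 is identical (all zeros)
Z2 = np.dot(W2_demo, A1) + b2_demo
A2 = 1. / (1. + np.exp(-Z2))                   # sigmoid outputs 0.5 for every example

dZ2 = A2 - Y_demo                              # gradient of the cross-entropy loss w.r.t. Z2
dW2 = np.dot(dZ2, A1.T) / 5.                   # all entries are zero because A1 is zero
dZ1 = np.dot(W2_demo.T, dZ2) * (Z1 > 0)        # backprop through ReLU; zero as well
dW1 = np.dot(dZ1, X_demo.T) / 5.

print("dW1 (every row identical, in fact all zero):")
print(dW1)
print("dW2 (all zero):")
print(dW2)
# Since every hidden unit receives exactly the same update, the zero-initialized network can
# never do better than a linear classifier, matching the random-guessing behaviour observed
# with zeros initialization in section 2.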