deeplearning.ai - Forward and Backward Propagation Formulas

阿新 • Published: 2018-11-25

【Forward and Backward Propagation】
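As a sketch, the forward and backward passes of the two-layer network can be written out to match `forward_propagation` and `backward_propagation` in the example code. The bracketed superscripts, $\sigma$ for the sigmoid, $\odot$ for the element-wise product, and $m$ for the number of examples are notation assumed here, not taken from the text.

```latex
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]}, & A^{[1]} &= \tanh\!\left(Z^{[1]}\right), \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]}, & A^{[2]} &= \sigma\!\left(Z^{[2]}\right), \\[6pt]
dZ^{[2]} &= A^{[2]} - Y, \\
dW^{[2]} &= \tfrac{1}{m}\, dZ^{[2]} A^{[1]\,T}, & db^{[2]} &= \tfrac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}, \\
dZ^{[1]} &= W^{[2]\,T} dZ^{[2]} \odot \left(1 - \left(A^{[1]}\right)^{2}\right), \\
dW^{[1]} &= \tfrac{1}{m}\, dZ^{[1]} X^{T}, & db^{[1]} &= \tfrac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}.
\end{aligned}
```

The factor $1 - (A^{[1]})^{2}$ is the derivative of $\tanh$, which is why the code computes `1 - np.power(A1, 2)`.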

【Gradient Descent】
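The update rule used in `update_parameters` (parameter minus learning rate times gradient) can be illustrated on a one-dimensional problem. This is a minimal sketch, not part of the original post: the helper `gradient_descent` and the toy objective $J(w) = (w - 3)^2$ are hypothetical and chosen only for illustration.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, num_iterations=100):
    """Repeatedly apply w := w - learning_rate * dJ/dw (hypothetical helper)."""
    w = w0
    for _ in range(num_iterations):
        w = w - learning_rate * grad_fn(w)
    return w


# Toy objective J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# Its minimum is at w = 3, so the iterates should approach 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print("w after gradient descent:", w_min)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor. A learning rate that is too large would make the iterates diverge instead.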

【Logistic Regression Cost Function】
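The cross-entropy cost that `compute_cost` evaluates is $J = -\frac{1}{m}\sum_{i}\left[y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)})\right]$. The sketch below is an assumption-laden variant rather than the post's own code: the name `logistic_cost` and the clipping constant `eps` are hypothetical. The clipping is added so that `np.log` never receives exactly 0 when an activation saturates.

```python
import numpy as np


def logistic_cost(A, Y, eps=1e-12):
    """Cross-entropy cost averaged over the m examples (hypothetical helper).

    A -- predicted probabilities, shape (1, m)
    Y -- true labels (0 or 1), shape (1, m)
    eps -- assumed clipping constant that keeps log() finite
    """
    A = np.clip(A, eps, 1 - eps)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m)


Y = np.array([[1.0, 0.0, 1.0]])
cost_uninformed = logistic_cost(np.full((1, 3), 0.5), Y)  # every prediction is 0.5
cost_perfect = logistic_cost(Y.copy(), Y)                 # predictions equal the labels
print(cost_uninformed, cost_perfect)
```

Predicting 0.5 for every example gives a cost of $\log 2$, and predictions that match the labels give a cost close to zero.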

【Steps for Implementing a Neural Network】

【Shallow Neural Network Example】

import numpy as np


def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-x))
    ### END CODE HERE ###

    return s


def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    n_x = X.shape[0]  # size of input layer
    n_h = 4
    n_y = Y.shape[0]  # size of output layer
    ### END CODE HERE ###
    return (n_x, n_h, n_y)


def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """

    np.random.seed(2) # we set up a seed so that your output matches ours although the initialization is random.

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters


def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###

    # Implement Forward Propagation to calculate A2 (probabilities)
    ### START CODE HERE ### (≈ 4 lines of code)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    ### END CODE HERE ###

    assert(A2.shape == (1, X.shape[1]))

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache


def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost given equation (13)
    """

    m = Y.shape[1]  # number of examples

    # Compute the cross-entropy cost
    ### START CODE HERE ### (≈ 2 lines of code)
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(1 - Y, np.log(1 - A2))
    cost = -1 / m * np.sum(logprobs)
    ### END CODE HERE ###

    cost = float(np.squeeze(cost))  # makes sure cost is a plain float, e.g. turns [[17]] into 17
    assert isinstance(cost, float)

    return cost


def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.

    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (2, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]

    # First, retrieve W1 and W2 from the dictionary "parameters".
    ### START CODE HERE ### (≈ 2 lines of code)
    W1 = parameters['W1']
    W2 = parameters['W2']
    ### END CODE HERE ###

    # Retrieve also A1 and A2 from dictionary "cache".
    ### START CODE HERE ### (≈ 2 lines of code)
    A1 = cache['A1']
    A2 = cache['A2']
    ### END CODE HERE ###

    # Backward propagation: calculate dW1, db1, dW2, db2.
    ### START CODE HERE ### (≈ 6 lines of code, corresponding to 6 equations on slide above)
    dZ2 = A2 - Y
    dW2 = 1 / m * np.dot(dZ2, A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = 1 / m * np.dot(dZ1, X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)
    ### END CODE HERE ###

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads


def update_parameters(parameters, grads, learning_rate = 1.2):
    """
    Updates parameters using the gradient descent update rule given above

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients

    Returns:
    parameters -- python dictionary containing your updated parameters
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###

    # Retrieve each gradient from the dictionary "grads"
    ### START CODE HERE ### (≈ 4 lines of code)
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']
    ### END CODE HERE ###

    # Update rule for each parameter
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters


def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]

    # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
    ### START CODE HERE ### (≈ 5 lines of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    ### END CODE HERE ###

    # Loop (gradient descent)

    for i in range(0, num_iterations):

        ### START CODE HERE ### (≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)

        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)

        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)

        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads)

        ### END CODE HERE ###

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters



def nn_model_test_case():
    X_assess = np.array([[1.62434536, -0.61175641, -0.52817175],
                         [-1.07296862, 0.86540763, -2.3015387]])
    Y_assess = np.array([[True, False, True]])
    return X_assess, Y_assess


def nn_mode_test():
    X_assess, Y_assess = nn_model_test_case()
    print("X_assess =", X_assess)
    print("Y_assess =", Y_assess)
    parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=True)
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))


if __name__ == "__main__":
    nn_mode_test()
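One way to sanity-check the six backpropagation equations in `backward_propagation` is a numerical gradient check. The sketch below is not part of the original listing. It re-implements the same two-layer network (tanh hidden layer, sigmoid output) in compact form so that it runs on its own, and the helper names `forward_and_cost`, `analytic_grads` and `numeric_grads`, the random test data, and the step size `eps` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def forward_and_cost(X, Y, params):
    """Forward pass of the two-layer network plus the cross-entropy cost."""
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return cost, A1, A2


def analytic_grads(X, Y, params):
    """Gradients from the same six equations used in backward_propagation."""
    m = X.shape[1]
    _, A1, A2 = forward_and_cost(X, Y, params)
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {
        "W1": dZ1 @ X.T / m,
        "b1": np.sum(dZ1, axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m,
        "b2": np.sum(dZ2, axis=1, keepdims=True) / m,
    }


def numeric_grads(X, Y, params, eps=1e-6):
    """Central-difference estimate of d(cost)/d(parameter), entry by entry."""
    grads = {}
    for name, value in params.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus, _, _ = forward_and_cost(X, Y, params)
            value[idx] = original - eps
            cost_minus, _, _ = forward_and_cost(X, Y, params)
            value[idx] = original  # restore the parameter before moving on
            grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        grads[name] = grad
    return grads


# Small random problem: 2 inputs, 4 hidden units, 1 output, 5 examples.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
params = {
    "W1": rng.standard_normal((4, 2)) * 0.5,
    "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((1, 4)) * 0.5,
    "b2": np.zeros((1, 1)),
}

analytic = analytic_grads(X, Y, params)
numeric = numeric_grads(X, Y, params)
max_diff = max(np.max(np.abs(analytic[k] - numeric[k])) for k in params)
print("largest absolute difference:", max_diff)
```

If the analytic gradients are correct, the largest difference should be many orders of magnitude smaller than the gradients themselves. A large difference usually points to a sign error or a missing `1 / m` factor.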
