
DeepLearning.ai Assignment (1-4): Deep Neural Networks

  1. Don't copy the assignment!
  2. I'm only writing down my thought process here, for my own study.
  3. Don't copy the assignment!

This week's assignment is split into two parts: the first part builds the basic building-block functions of the neural network, and only the second part assembles the model and makes predictions.

Part 1

The functions to build are:

  • Initialize the parameters
    • two-layer
    • L-layer
  • forward propagation
    • Linear part: first build the purely linear computation
    • linear->activation: then combine the linear step with the activation for a single layer
    • L_model_forward function: finally chain L-1 ReLU layers with one sigmoid layer at the end
  • Compute loss
  • backward propagation
    • Linear part
    • linear->activation
    • L_model_backward function

Initialization

The initialization uses:

w : np.random.randn(shape)*0.01

b : np.zeros(shape)

1. two-layer

First, a two-layer initialization function, which we already wrote last week.

import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
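
A quick sanity check of my own (not part of the graded notebook): for a 2-4-1 network the shapes should come out as follows.

params = initialize_parameters(2, 4, 1)
print(params["W1"].shape, params["b1"].shape)   # (4, 2) (4, 1)
print(params["W2"].shape, params["b2"].shape)   # (1, 4) (1, 1)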

2. L-layer

Then an L-layer initialization function. Its input is a list such as [12, 4, 3, 1], meaning 4 layers in total (the first entry being the input size):

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)            # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))


    return parameters
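
To check the [12, 4, 3, 1] example mentioned above (again my own quick test, not from the notebook), we should get three weight matrices and three bias vectors:

params = initialize_parameters_deep([12, 4, 3, 1])
for l in range(1, 4):
    print(params["W" + str(l)].shape, params["b" + str(l)].shape)
# (4, 12) (4, 1)
# (3, 4) (3, 1)
# (1, 3) (1, 1)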

Forward propagation module

1. Linear Forward

Using the formula:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$

where $A^{[0]} = X$.

Here the inputs are A, W, b, and the outputs are the computed Z plus cache = (A, W, b), which is stored for later use.

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter 
    cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """

    ### START CODE HERE ### (≈ 1 line of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache
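
A rough shape check of my own: with 3 units in the previous layer, 2 units in the current layer and 5 examples, Z should be (2, 5).

A_prev = np.random.randn(3, 5)
W = np.random.randn(2, 3)
b = np.zeros((2, 1))
Z, cache = linear_forward(A_prev, W, b)
print(Z.shape)   # (2, 5)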

2. Linear-Activation Forward

Here we take the Z just computed and pass it through the activation A = g(Z), merging the two steps into one function.

At this point the notebook already provides ready-made sigmoid and relu functions, so we only need to call them. Their source doesn't seem to be shown there, though; both return A and cache = Z, so I'm pasting them here:

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """

    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z, stored for computing the backward pass efficiently
    """

    A = np.maximum(0,Z)

    assert(A.shape == Z.shape)

    cache = Z 
    return A, cache
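
A tiny element-wise example of my own showing what the two activations do:

Z = np.array([[-1.0, 0.0, 2.0]])
print(relu(Z)[0])      # [[0. 0. 2.]]
print(sigmoid(Z)[0])   # ≈ [[0.269 0.5 0.881]]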

Then, using linear_forward from above, we can write the forward function for a single layer. The inputs are $A^{[l-1]}$, W, b, plus a string activation that says whether to use sigmoid or relu.

The outputs are $A^{[l]}$ and cache, where cache now holds four things: $A^{[l-1]}, W^{[l]}, b^{[l]}, Z^{[l]}$.


# GRADED FUNCTION: linear_activation_forward

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value 
    cache -- a python dictionary containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """

    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
   # print(cache)
    return A, cache
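
To see that the returned cache really does hold the four values mentioned above, a quick check of my own:

A_prev = np.random.randn(3, 5)
W = np.random.randn(2, 3)
b = np.zeros((2, 1))
A, cache = linear_activation_forward(A_prev, W, b, "relu")
linear_cache, activation_cache = cache   # linear_cache = (A_prev, W, b), activation_cache = Z
print(A.shape, activation_cache.shape)   # (2, 5) (2, 5)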

3. L-Layer Model

This step chains the whole multi-layer network together from start to finish: L-1 ReLU layers in front, with a sigmoid at layer L.

The inputs are X, i.e. $A^{[0]}$, and parameters, which holds each layer's W and b.

The outputs are the last layer's $A^{[L]}$, i.e. the prediction Yhat, together with every layer's cache: $A^{[l-1]}, W^{[l]}, b^{[l]}, Z^{[l]}$.

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() (there are L of them, indexed from 0 to L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A 
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W'+str(l)], parameters['b'+str(l)], 'relu')
        caches.append(cache)
        ### END CODE HERE ###

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W'+str(L)], parameters['b'+str(L)],'sigmoid')
    caches.append(cache)
    ### END CODE HERE ###
   # print(AL.shape)
    assert(AL.shape == (1,X.shape[1]))

    return AL, caches
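
Putting the pieces above together on random data (my own sanity check, using the functions defined so far): with layer_dims = [12, 4, 3, 1] there are 3 parameterised layers, so we expect 3 caches and AL of shape (1, m).

X = np.random.randn(12, 7)                              # 7 examples
parameters = initialize_parameters_deep([12, 4, 3, 1])
AL, caches = L_model_forward(X, parameters)
print(AL.shape, len(caches))                            # (1, 7) 3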

Cost function

$$-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1-a^{[L](i)}\right)\right)$$

Use np.multiply and np.sum to compute the cross-entropy.


def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]

    # Compute loss from aL and y.
    ### START CODE HERE ### (≈ 1 lines of code)
    cost = - np.sum(np.multiply(Y,np.log(AL)) + np.multiply(1-Y,np.log(1-AL))) / m
    print(cost)
    ### END CODE HERE ###
    cost = np.squeeze(cost)      # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
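
For example (numbers of my own choosing), with three examples:

Y = np.array([[1, 1, 0]])
AL = np.array([[0.8, 0.9, 0.4]])
cost = compute_cost(AL, Y)   # ≈ 0.2798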

Backward propagation module

1. Linear backward

First, assume we already know $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$; what we then want to compute is $dW^{[l]}$, $db^{[l]}$ and $dA^{[l-1]}$.
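
For reference, these are the standard gradients from the course for the linear part (stated here without derivation):

$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$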