
Andrew Ng's Coursera Deep Learning course, course4-week1: Convolutional Neural Networks & CNN Application (assignment)

P0 Preface

Course 4: Convolutional Neural Networks
Week 1: Foundations of Convolutional Neural Networks
Key topics: computer vision, edge detection, convolutional neural networks, padding, convolution, pooling, etc.

Course videos: https://mooc.study.163.com/learn/2001281004?tid=2001392030#/learn/content

Notes: to be added later

Dataset, source code, and a locally cached copy of the assignment page for download:

P1 Assignment

Part 1: Building the network from the ground up (Convolutional Neural Networks: Step by Step)

Here we implement a network with convolutional (CONV) and pooling (POOL) layers, including both forward and backward propagation.

1 - Import libraries

import numpy as np
import h5py
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

2 - Assignment outline

We will implement the building blocks of a convolutional neural network. The functions to implement are listed below.

The convolution block contains the following functions:

  • Zero padding
  • Convolution window
  • Convolution forward
  • Convolution backward (optional)

The pooling block contains the following functions:

  • Pooling forward
  • Create mask
  • Distribute value
  • Pooling backward (optional)

We will build the whole module from scratch here; later we will implement the same model with TensorFlow. The model structure is as follows:

[Figure: model.png — model structure]

Note that during forward propagation we cache some values so that the gradients can be computed during backpropagation.

3 - Convolutional neural networks

Even though programming frameworks make convolutions easy to use, they remain one of the hardest concepts to understand in deep learning. A convolution layer transforms an input volume into an output volume of a different size, as shown below.

[Figure: conv_nn — a convolution layer transforming an input volume into an output volume]

We will build the convolution layer step by step, starting with two helper functions: one for zero padding and one for computing the convolution itself.

3.1 - Zero padding

Zero padding adds pixels with value 0 around the border of an image, as shown below:

[Figure: PAD — zero-padding applied to a 3-channel (RGB) image]

The figure above shows an image (3 channels, RGB) padded with pad = 2.

Zero padding has the following benefits:

  • It helps us keep more of the information at the border of the image. Without padding, the few pixels at the edges would be touched by the filter far less often during the convolution, so information there would be lost.
  • It allows us to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as we go to deeper layers. An important special case is the "same" convolution, in which the height/width is exactly preserved after one layer (see the small sketch below).
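For a stride-1 "same" convolution, the padding that keeps the height/width unchanged follows directly from the output-size formula given further below. A minimal sketch (not part of the assignment code; the helper name is my own):

def same_padding(f):
    """Padding that preserves height/width for a stride-1 convolution with an odd filter size f."""
    return (f - 1) // 2

# e.g. a 3x3 filter needs pad = 1, a 5x5 filter needs pad = 2
print(same_padding(3), same_padding(5))  # 1 2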

We will implement a padding function that pads all the images of a batch X with zeros. np.pad makes this quick. Note that if you want to pad the second dimension of an array with a.shape = (5,5,5,5,5) using pad = 1, the fourth dimension using pad = 3, and the rest using pad = 0, you would do:

# 'constant' pads with a constant value; with constant_values=(x, y), x is used before and y after each axis. The default is constant_values=(0, 0).

a = np.pad(a,( (0,0),(1,1),(0,0),(3,3),(0,0)),'constant',constant_values = (..,..))

# For example:
import numpy as np
arr3D = np.array([[[1, 1, 2, 2, 3, 4],
             [1, 1, 2, 2, 3, 4], 
             [1, 1, 2, 2, 3, 4]], 

            [[0, 1, 2, 3, 4, 5], 
             [0, 1, 2, 3, 4, 5], 
             [0, 1, 2, 3, 4, 5]], 

            [[1, 1, 2, 2, 3, 4], 
             [1, 1, 2, 2, 3, 4], 
             [1, 1, 2, 2, 3, 4]]])

print('constant:  \n' + str(np.pad(arr3D, ((0, 0), (1, 1), (2, 2)), 'constant')))
#(1,1) pads one row before and one after along the second dimension; (2,2) pads two columns before and two after along the third dimension

"""
constant:  
[[[0 0 0 0 0 0 0 0 0 0]
  [0 0 1 1 2 2 3 4 0 0]
  [0 0 1 1 2 2 3 4 0 0]
  [0 0 1 1 2 2 3 4 0 0]
  [0 0 0 0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0]
  [0 0 0 1 2 3 4 5 0 0]
  [0 0 0 1 2 3 4 5 0 0]
  [0 0 0 1 2 3 4 5 0 0]
  [0 0 0 0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0]
  [0 0 1 1 2 2 3 4 0 0]
  [0 0 1 1 2 2 3 4 0 0]
  [0 0 1 1 2 2 3 4 0 0]
  [0 0 0 0 0 0 0 0 0 0]]]
"""

The zero-padding function:

# GRADED FUNCTION: zero_pad

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """

    ### START CODE HERE ### (≈ 1 line)
    X_pad = np.pad(X,((0,0),(pad,pad),(pad,pad),(0,0)),'constant')
    #At first I used (0,0),(0,2*pad),(0,2*pad),(0,0); in fact (x,y) pads x rows/columns before and y after along that axis (top/bottom for the second dimension, left/right for the third)
    ### END CODE HERE ###

    return X_pad

# Check the results:
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 2)
print("x.shape =", x.shape)
print("x_pad.shape =", x_pad.shape)
print("x[1,1] =", x[1, 1])
print("x_pad[1,1] =", x_pad[1, 1])

fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0, :, :, 0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0, :, :, 0])
plt.show()
# Result:
x.shape = (4, 3, 3, 2)
x_pad.shape = (4, 7, 7, 2)
x[1,1] = [[ 0.90085595 -0.68372786]
 [-0.12289023 -0.93576943]
 [-0.26788808  0.53035547]]
x_pad[1,1] = [[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]


3.2 - Single step of convolution

Here we implement a single step of convolution, in which one filter is applied to a single slice of the input. Take a look at the gif below:

[GIF: a filter sliding over the input matrix one step at a time]

In the animation the filter size is f = 2 and the stride is s = 1 (stride = the amount you move the window each time you slide).

In a computer vision application, each value in the matrix on the left corresponds to a pixel value. We convolve a 3x3 filter with the image by multiplying its values element-wise with the original matrix and then summing them up. We now implement conv_single_step(), which convolves a filter with a single slice and outputs one real number.

def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """


    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Do not add the bias yet.
    s = np.multiply(a_slice_prev,W)
    # Sum over all entries of the volume s.
    Z = np.sum(s)
    # Add bias b to Z. Cast b to a float() so that Z results in a scalar value.
    Z = Z+float(b)
    ### END CODE HERE ###

    return Z

# Check the results:
np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print("Z =", Z)
# Result:
Z = -6.999089450680221

3.3 - Convolutional Neural Networks - Forward pass

During the forward pass we convolve the input with many filters; each filter produces a 2D output, and stacking these 2D outputs gives a 3D volume.

We need a function that convolves the activations of the previous layer: it takes the activations A_prev output by the previous layer and convolves them with F filters, whose weights are stored in W and whose biases are stored in b (one bias per filter). It also takes a dictionary of hyperparameters containing the stride and the padding.

Hints:

  • To select a 2x2 slice at the upper-left corner of a matrix a_prev (shape (5,5,3)), you can do:

a_slice_prev = a_prev[0:2,0:2,:] 

  • To define a slice at an arbitrary position, first define its corners vert_start, vert_end, horiz_start and horiz_end; the figure below (drawn for a single channel) shows where each of them sits.

[Figure: vert_horiz_kiank.png — vert_start/vert_end and horiz_start/horiz_end on a single channel]

The formulas relating the output shape of the convolution to the input shape are:

n_H = \lfloor \frac{n_{H_{prev}} - f + 2 \times pad}{stride} \rfloor +1

n_W = \lfloor \frac{n_{W_{prev}} - f + 2 \times pad}{stride} \rfloor +1

n_C = \text{number of filters used in the convolution}

We will not worry about vectorization here; everything is implemented with for-loops. (A quick sanity check of the output-shape formula is sketched below.)
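As a quick sanity check of the formula, the output shape can be computed on its own before writing any loops. A small sketch (the same computation appears inside conv_forward below; the helper name here is only for illustration):

def conv_output_shape(n_H_prev, n_W_prev, f, pad, stride, n_filters):
    """Output dimensions of a convolution, following the floor formula above."""
    n_H = (n_H_prev - f + 2 * pad) // stride + 1
    n_W = (n_W_prev - f + 2 * pad) // stride + 1
    return n_H, n_W, n_filters

# Matches the test case further below: 4x4 input, f=2, pad=2, stride=2 -> 4x4 output with 8 channels
print(conv_output_shape(4, 4, f=2, pad=2, stride=2, n_filters=8))  # (4, 4, 8)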

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """

    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape  # n_C_prev is the number of filters (channels) in the previous layer

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = ( W.shape)

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters["stride"]
    pad = hparameters["pad"]

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev-f+2.*pad)*1./stride)+1
    n_W = int((n_W_prev-f+2.*pad)*1./stride)+1

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m,n_H,n_W,n_C)) # At first I used Z = np.zeros((n_H,n_W))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev,pad=pad)

    for i in range(m):  # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]  # Select ith training example's padded activation
        for h in range(n_H):  # loop over vertical axis of the output volume (at first I mistakenly used range(n_H_prev))
            for w in range(n_W):  # loop over horizontal axis of the output volume
                for c in range(n_C):  # loop over channels (= #filters) of the output volume

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = stride*h # At first I used vert_start = h
                    vert_end = vert_start+f
                    horiz_start = stride*w # At first I used horiz_start = w
                    horiz_end = horiz_start+f

                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end,horiz_start:horiz_end,:] # At first I used a_prev_pad[vert_start:vert_end,horiz_start:horiz_end,c]

                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev=a_slice_prev,W=W[:,:,:,c],b=b[:,:,:,c]) # At first I used W[:,:,c,c]; all channels of the previous layer are convolved together by one filter to produce one output channel

    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert (Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache

# Check the results:
np.random.seed(1)
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2,
               "stride": 2}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
print("Z's mean =", np.mean(Z))
print("Z[3,2,1] =", Z[3, 2, 1])
print("b=",b[0,0,0,:])
print("W =", W[0, 0, 0, :])
print("cache_conv[0][1][2][3] =", cache_conv[0][1][2][3])

# Result:
Z's mean = 0.048995203528855794
Z[3,2,1] = [-0.61490741 -6.7439236  -2.55153897  1.75698377  3.56208902  0.53036437
  5.18531798  8.75898442]
b= [ 0.37245685 -0.1484898  -0.1834002   1.1010002   0.78002714 -0.6294416
 -1.1134361  -0.06741002]
W= [ 0.5154138  -1.11487105 -0.76730983  0.67457071  1.46089238  0.5924728
  1.19783084  1.70459417]
cache_conv[0][1][2][3] = [-0.20075807  0.18656139  0.41005165]

Finally, a CONV layer should also apply an activation function, which we could do with one extra line of code:

# Get the output
Z[i, h, w, c] = ...
# Compute the activation
A[i, h, w, c] = activation(Z[i, h, w, c])

We won't need to do that here, though.
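For reference, a ReLU applied to the whole conv output volume would be a one-liner in numpy. A sketch, assuming Z is the (m, n_H, n_W, n_C) output of conv_forward:

import numpy as np

def relu(Z):
    """Element-wise ReLU that could be applied to the conv output volume Z."""
    return np.maximum(0, Z)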

4 - Pooling layer

The pooling layer reduces the height and width of the input; it cuts down computation and also makes the feature detectors more invariant to their position in the input. The two types of pooling layer are:

  • Max-pooling layer: slides an (f, f) window over the input and stores the maximum value of the window as the output.
  • Average-pooling layer: slides an (f, f) window over the input and stores the average value of the window as the output.

 

Pooling layers have no parameters to learn by backpropagation, but they do have hyperparameters such as the window size f, which specifies the height and width of the fxf window over which the max or the average is computed.

4.1 - Forward pooling

Now we implement max pooling and average pooling in a single function. Since there is no padding, the formulas relating the output shape of the pooling layer to the input shape are simpler:

 n_H = \lfloor \frac{n_{H_{prev}} - f}{stride} \rfloor +1

n_W = \lfloor \frac{n_{W_{prev}} - f}{stride} \rfloor +1

n_C = n_{C_{prev}}

def pool_forward(A_prev, hparameters, mode="max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """

    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]

    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev

    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))

    ### START CODE HERE ###
    for i in range(m):  # loop over the training examples
        for h in range(n_H):  # loop on the vertical axis of the output volume
            for w in range(n_W):  # loop on the horizontal axis of the output volume
                for c in range(n_C):  # loop over the channels of the output volume

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = stride*h
                    vert_end = vert_start+f
                    horiz_start = stride*w
                    horiz_end = horiz_start+f

                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i,vert_start:vert_end,horiz_start:horiz_end,c] # Pooling acts on one channel at a time, so we select channel c directly;
                                                                                         # convolution, by contrast, works per filter: each filter convolves all channels of the previous layer together

                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)

    ### END CODE HERE ###

    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)

    # Making sure your output shape is correct
    assert (A.shape == (m, n_H, n_W, n_C))

    return A, cache
# Check the results:
np.random.seed(1)
A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride": 2, "f": 3}

A, cache = pool_forward(A_prev, hparameters)
print("mode = max")
print("A =", A)
print()
A, cache = pool_forward(A_prev, hparameters, mode="average")
print("mode = average")
print("A =", A)

# Result:
mode = max
A = [[[[1.74481176 0.86540763 1.13376944]]]
 [[[1.13162939 1.51981682 2.18557541]]]]

mode = average
A = [[[[ 0.02105773 -0.20328806 -0.40389855]]]
 [[[-0.22154621  0.51716526  0.48155844]]]]

5 - Backpropagation in convolutional neural networks (optional)

In modern deep learning frameworks you only need to implement the forward pass; the framework takes care of the backward pass, so most deep learning engineers never have to deal with its details. The backward pass of a convolutional network is somewhat involved, but you can work through this section if you wish.

In an earlier course we implemented a simple (fully connected) network and used backpropagation to compute the gradients of the cost with respect to the parameters. In the same way, in a convolutional neural network we can differentiate the cost in order to update the parameters. The backprop equations are not trivial and Andrew Ng does not derive them in the lectures, but they are sketched briefly below.

5.1 - Convolutional layer backward pass

Let's see how to implement the backward pass for a convolutional layer.

5.1.1 - Computing dA

The relation between A_prev, W, b and Z can be written as:

Z=W*A_prev+b

So the idea for computing dA_prev is W*dZ; the exact formula is:

dA += \sum ^{n_H} _{h=0} \sum ^{n_W} _{w=0} W_{c} \times dZ_{hw} \tag{1}

where  Z_hw = W_c * A_prev_slice + b

Here W_c is a filter and dZ_hw is the gradient of the cost with respect to the output of the conv layer Z at row h, column w (a scalar).

Each value of the conv output Z depends on one slice of the input, a_slice (that is, A_prev_slice; the code shortens the name because of a few intermediate transformations), while W acts as a fixed coefficient matrix. Since every entry of W_c contributes to the value Z_hw (see the formula above), when computing dA_prev_slice each weight of the corresponding W_c (different slices pair with different positions of dZ) is multiplied by the gradient dZ_hw of the output value it helped produce.

Accumulating the dA_prev_slice contributions of all the slices then gives dA_prev.

In code, this explanation corresponds to:

da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c]*dZ[i,h,w,c]
# The left-hand side plays the role of da_prev_slice
# dZ[i, h, w, c] is the scalar computed from W_c and a_slice,
# i.e. the output of filter c of this conv layer at position (h, w)

5.1.2 - Computing dW

From the analysis above, when computing the derivative with respect to the filter W_c we treat W_c as the variable and a_slice as a constant. This gives:

dW_c += \sum^{n_H}_{h=0} \sum^{n_W}_{w=0}a_{slice} \times dZ_{hw} \tag{2}

Here a_slice is the slice that generated Z_hw (a single entry of the matrix Z); it has the same size as its filter W_c. Every value of a_slice likewise contributes to Z_hw, so when computing dW_c each value of a_slice is multiplied by dZ_hw.

Accumulating the dW_c of the different filters gives the dW of all the filters in the layer.

In code, this corresponds to:

dW[:,:,:, c] += a_slice * dZ[i , h , w , c]
# The left-hand side is the gradient of the parameters W_c of filter c of this conv layer
# dZ[i, h, w, c] is the scalar computed from W_c and a_slice,
# i.e. the output of filter c of this conv layer at position (h, w)

5.1.3 - Computing db

db is easy to compute; the formula is:

db = \sum_{h} \sum_{w}dZ_{hw}

As with the fully connected networks we built before, db is computed by accumulating dZ: we simply sum up all the gradients of the conv output Z. In code this is a single line (a vectorized equivalent is sketched after the snippet):

db[:,:,:,c] += dZ[ i, h, w, c]
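Accumulating dZ[i, h, w, c] over the whole loop is equivalent to the vectorized reduction below; a sketch assuming dZ has shape (m, n_H, n_W, n_C) and db has shape (1, 1, 1, n_C):

import numpy as np

def bias_gradient(dZ):
    """Vectorized equivalent of accumulating dZ[i, h, w, c] into db[:, :, :, c]."""
    # Sum over the batch and the two spatial axes; keep one value per filter.
    return np.sum(dZ, axis=(0, 1, 2)).reshape(1, 1, 1, -1)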

5.1.4 - Implementing the function

Now we implement the backward function conv_backward(). We loop over all the training examples, filters, heights and widths, and use the formulas above to compute the corresponding gradients.

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function

    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()

    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """

    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters["stride"]
    pad = hparameters["pad"]

    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape

    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros((A_prev.shape))
    dW = np.zeros((W.shape))
    db = np.zeros((b.shape))


    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev,pad)
    dA_prev_pad = zero_pad(dA_prev,pad)

    for i in range(m):  # loop over the training examples

        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]

        for h in range(n_H):  # loop over vertical axis of the output volume
            for w in range(n_W):  # loop over horizontal axis of the output volume
                for c in range(n_C):  # loop over the channels of the output volume
                    # At first I used n_C_prev; the loops should run over the dimensions of dZ (which are the same as Z's, so the loop structure matches the forward pass)

                    # Find the corners of the current "slice"
                    vert_start = stride*h
                    vert_end = vert_start+f
                    horiz_start = stride*w
                    horiz_end = horiz_start+f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end,horiz_start:horiz_end,:]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c]*dZ[i,h,w,c] # this plays the role of da_prev_slice
                    dW[:, :, :, c] += a_slice*dZ[i,h,w,c]
                    db[:, :, :, c] += dZ[i,h,w,c]
                    # dW's left-hand side is the gradient of the parameters W_c of filter c of this conv layer
                    # dZ[i, h, w, c] is the scalar computed from W_c and a_slice,
                    # i.e. the output of filter c of this conv layer at position (h, w)

        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad,pad:-pad,:] # This strips the padding: [pad:-pad] keeps rows from index pad up to (but not including) the pad-th row from the end
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert (dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))

    return dA_prev, dW, db

Check the results:

np.random.seed(1)
# Initialize parameters
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2, "stride": 2}
# Forward propagation
Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
# Backward propagation
print(Z.shape)
#dZ = np.random.randn(10,4,4,8)
dA, dW, db = conv_backward(Z, cache_conv)
print("dA_mean =", np.mean(dA))
print("dW_mean =", np.mean(dW))
print("db_mean =", np.mean(db))

Prof. Ng does not provide a way to compute dZ here, so Z is simply passed in as dZ; the result is:

(10, 4, 4, 8)
dA_mean = 1.45243777754
dW_mean = 1.72699145831
db_mean = 7.83923256462

5.2 - Pooling layer backward pass

Next we implement the backward pass of the pooling layer, starting with max pooling. Even though a pooling layer has no parameters to update, we still need to backpropagate the gradient through it so that the layers before it (for example, conv layers) receive their gradients.

5.2.1 - Max pooling backward pass

Before tackling the backward pass itself, we build a helper function called create_mask_from_window(). Here is what it does:

X = \begin{bmatrix} 1 && 3 \\ 4 && 2 \end{bmatrix} \quad \rightarrow \quad M = \begin{bmatrix} 0 && 0 \\ 1 && 0 \end{bmatrix} \tag{4}

As you can see, this function builds a mask that records the position of the maximum: the entry equal to 1 marks where the max is and every other entry is 0. This is for max pooling; the backward pass of average pooling is similar but uses a different kind of mask.

For now we do not handle the case where several entries tie for the maximum:

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###

    return mask

# Check the results:
np.random.seed(1)
x = np.random.randn(2,3)
mask = create_mask_from_window(x)
print('x = ', x)
print("mask = ", mask)
# Result:
x =  [[ 1.62434536 -0.61175641 -0.52817175]
 [-1.07296862  0.86540763 -2.3015387 ]]
mask =  [[ True False False]
 [False False False]]

Why do we build this mask? Think about the forward pass: the input goes through the conv layer, then max pooling slides over it and keeps only the maxima, and it is those maxima that influence the cost. Backprop computes gradients with respect to the cost, so anything that influences the cost should receive a non-zero gradient. Another way to see it: if we did not record the positions of the maxima, how could we route the gradient back to the conv layer?
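A tiny example of how the mask routes the gradient in the backward pass (illustrative values only):

import numpy as np

# A 2x2 pooling window from the forward pass and the upstream gradient
# of the single pooled output it produced.
window = np.array([[1., 3.],
                   [4., 2.]])
dA_entry = 5.0                      # gradient flowing back to this window

mask = (window == np.max(window))   # True only at the position of the max
d_window = mask * dA_entry          # gradient goes to the max, zeros elsewhere
print(d_window)
# [[0. 0.]
#  [5. 0.]]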

5.2.2 - Average pooling backward pass

In max pooling, every output value comes from the single maximum of its input window. In average pooling the output is the mean of the window, so every element of the input window contributes equally to the output. Let's see how to backpropagate through that:

For example, if we did average pooling in the forward pass with a 2x2 filter, the mask used in the backward pass would look like this:

dZ=1 \quad \rightarrow \quad dZ = \begin{bmatrix} \frac{1}{4} && \frac{1}{4} \\ \frac{1}{4} && \frac{1}{4} \end{bmatrix}

This means that each position of the window contributed equally to the output, because in the forward pass we took the average.

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = 1.*dz/(n_H*n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.ones(shape=shape)*average
    ### END CODE HERE ###

    return a

# Check the results:
dz = 2
shape = (2,2)
a = distribute_value(dz,shape)
print("a = " + str(a))

# Result:
a = [[ 0.5  0.5]
 [ 0.5  0.5]]

5.2.3 - Putting it together: pooling backward

You now have everything you need to compute the backward pass of a pooling layer, so let's implement pool_backward() in full:

def pool_backward(dA, cache, mode="max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    ### START CODE HERE ###

    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters["stride"]
    f = hparameters["f"]

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros(A_prev.shape)

    for i in range(m):  # loop over the training examples

        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]

        for h in range(n_H):  # loop on the vertical axis
            for w in range(n_W):  # loop on the horizontal axis
                for c in range(n_C):  # loop over the channels (depth)

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = stride*h
                    vert_end = vert_start+f
                    horiz_start = stride*w
                    horiz_end = horiz_start+f

                    # Compute the backward propagation in both modes.
                    if mode == "max":

                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start:vert_end,horiz_start:horiz_end,c]
                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)
                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += dA[i,h,w,c]*mask # At first I multiplied by a_prev_slice instead, again because I hadn't understood the idea

                    elif mode == "average":

                        # Get the value a from dA (≈1 line)
                        da = dA[i,h,w,c]
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f,f)
                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da,shape)

    ### END CODE ###

    # Making sure your output shape is correct
    assert (dA_prev.shape == A_prev.shape)

    return dA_prev
# Check the results:
np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride": 1, "f": 2}
A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(5, 4, 2, 2)
dA_prev = pool_backward(dA, cache, mode="max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1, 1])
print()
dA_prev = pool_backward(dA, cache, mode="average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1, 1])

# Result:
mode = max
mean of dA =  0.14571390272918056
dA_prev[1,1] =  [[ 0.          0.        ]
 [ 5.05844394 -1.68282702]
 [ 0.          0.        ]]

mode = average
mean of dA =  0.14571390272918056
dA_prev[1,1] =  [[ 0.08485462  0.2787552 ]
 [ 1.26461098 -0.25749373]
 [ 1.17975636 -0.53624893]]

Part 2: Application of the network

We have implemented a convolutional neural network from scratch in plain numpy. Now we implement it with TensorFlow and apply it to hand-sign recognition. We need to implement four functions; let's go through them.

1 - The TensorFlow model

First, import the libraries:

# -*- encoding:utf-8 -*-
import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *

#%matplotlib inline
np.random.seed(1)

We use the same hand-sign (SIGNS) dataset as in course2 week3: https://blog.csdn.net/zongza/article/details/83344053

[Figure: SIGNS.png — sample images from the SIGNS dataset]
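The data itself is loaded with the helper from cnn_utils (assuming the course's cnn_utils.py, with its load_dataset() function, sits next to this script):

# Assumes cnn_utils.load_dataset() as shipped with the course files
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()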

Let's take a look at what is inside:

index = 6
plt.imshow(X_train_orig[index])
plt.show()
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

# Result:
y = 2


We already built a network for this dataset in course2 week3, so it should look familiar. Let's check the shapes of the data; if you forgot how the one-hot encoding is done, see: https://blog.csdn.net/zongza/article/details/83344053#t6

X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
conv_layers = {}

# Result:
number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)

1.1 - Create placeholders

TensorFlow requires you to create placeholders for the input data; the actual data is fed into the model only when the session is run. We now implement a function that creates placeholders for the input X and the labels Y. Because the batch size may vary, we use None for the number of examples, so X has shape [None, n_H0, n_W0, n_C0] and Y has shape [None, n_y]. Placeholder reference: https://www.w3cschool.cn/tensorflow_python/tensorflow_python-w7yt2fwc.html

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """


    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32,shape=[None,n_H0,n_W0,n_C0],name="X")
    Y = tf.placeholder(tf.float32,shape=[None,n_y],name="Y")
    ### END CODE HERE ###

    return X, Y

Check the results:

# Check the results:
X, Y = create_placeholders(64, 64, 3, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))

# Result:
X = Tensor("X:0", shape=(?, 64, 64, 3), dtype=float32)
Y = Tensor("Y:0", shape=(?, 6), dtype=float32)

1.2 - Initialize parameters

We now initialize the weights/filters W1 and W2 with tf.contrib.layers.xavier_initializer(seed = 0). We don't need to worry about the bias b here: TensorFlow takes care of it. Also note that we only initialize the weights of the 2D convolutions; TensorFlow initializes the fully connected layers automatically.

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)  # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    '''
    At first I wrote:
    W1 = tf.constant(shape=[4,4,3,8],dtype=tf.float32)
    W2 = tf.constant(shape=[2,2,8,16],dtype=tf.float32)
    On the get_variable() function: https://blog.csdn.net/u012436149/article/details/53696970/
    '''
    ### END CODE HERE ###


    parameters = {"W1": W1,
                  "W2": W2}

    return parameters

On tf.get_variable() and tf.Variable(): https://blog.csdn.net/u012436149/article/details/53696970/

Check the results:

tf.reset_default_graph()
with tf.Session() as sess_test:
    parameters = initialize_parameters()
    init = tf.global_variables_initializer()
    sess_test.run(init)
    print("W1 = " + str(parameters["W1"].eval()[1, 1, 1]))
    print("W2 = " + str(parameters["W2"].eval()[1, 1, 1]))
    sess_test.close()
W1 = [ 0.00131723  0.1417614  -0.04434952  0.09197326  0.14984085 -0.03514394
 -0.06847463  0.05245192]
W2 = [-0.08566415  0.17750949  0.11974221  0.16773748 -0.0830943  -0.08058
 -0.00577033 -0.14643836  0.24162132 -0.05857408 -0.19055021  0.1345228
 -0.22779644 -0.1601823  -0.16117483 -0.10286498]

1.3 - Forward propagation

TensorFlow provides several built-in functions that we can use directly:

  • tf.nn.conv2d(X, W1, strides=[1,s,s,1], padding='SAME'): given an input X and a group of filters W1, convolves W1 over X; the strides argument [1,s,s,1] gives the stride for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). See the TensorFlow docs.
  • tf.nn.max_pool(A, ksize=[1,f,f,1], strides=[1,s,s,1], padding='SAME'): given an input A, slides a window of size (f,f) with stride (s,s) over it and takes the maximum of each window. See the docs.
  • tf.nn.relu(Z1): computes the element-wise ReLU of Z1. See the docs.
  • tf.contrib.layers.flatten(P): given an input P, flattens each example into a 1D vector and returns a tensor of shape (batch_size, k). See the docs.
  • tf.contrib.layers.fully_connected(F, num_outputs): given a flattened input F, returns the output of a fully connected layer. See the docs.

When you use tf.contrib.layers.fully_connected(F, num_outputs), the fully connected layer initializes its own weights and trains them along with the rest of the model, so we do not need to initialize them ourselves.

When implementing the forward propagation we first sketch the model:

CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

Concretely, we use the following steps and parameters:

  • Conv2d: stride 1, padding 'SAME'
  • ReLU
  • Max pool: window 8x8, stride 8, padding 'SAME'
  • Conv2d: stride 1, padding 'SAME'
  • ReLU
  • Max pool: window 4x4, stride 4, padding 'SAME'
  • Flatten the previous output
  • Fully connected (FC) layer: a fully connected layer without a non-linear activation function. Do not call the softmax here; this leaves 6 neurons in the output layer, which are passed to a softmax later. In TensorFlow, the softmax and the cost function are lumped into a single function, which we call when computing the cost.
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    '''
    # At first I also added these two lines, but the values of s and f are actually given in the comments below
    s = parameters["stride"]
    f = parameters["f"]
    '''
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X,W1,strides=[1,1,1,1],padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1,[1,8,8,1],[1,8,8,1],padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1,W2,[1,1,1,1],padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2,[1,4,4,1],[1,4,4,1],padding='SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2,num_outputs = 6,activation_fn =None)
    ### END CODE HERE ###


    return Z3

Check the results:

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(2, 64, 64, 3), Y: np.random.randn(2, 6)})
    print("Z3 = " + str(a))
    sess.close()

The result is:

Z3 = [[ 1.4416982  -0.24909675  5.4504995  -0.26189643 -0.2066989   1.3654672 ]
 [ 1.4070848  -0.02573231  5.0892797  -0.48669893 -0.40940714  1.2624854 ]]

# Note: this differs from the official notebook's result because of a different TensorFlow version (the official answer was produced with an older version of tf)
# Official result:
Z3 = [[-0.44670227 -1.57208765 -1.53049231 -2.31013036 -1.29104376  0.46852064]
 [-0.17601591 -1.57972014 -1.4737016  -2.61672091 -1.00810647  0.5747785 ]]

1.4 - Compute the cost

We now implement the cost computation, using the following two functions:

  • tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y): computes the softmax cross-entropy loss. It both applies the softmax activation and computes the resulting loss. See the docs.
  • tf.reduce_mean: computes the mean; we use it to average the losses over all the examples and obtain the overall cost. See the docs.

Implement the cost function:

# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3,labels=Y)) # At first I forgot reduce_mean, so cost came out as a 1x4 vector instead of a scalar
    ### END CODE HERE ###

    return cost

Check the results:

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(cost, {X: np.random.randn(4, 64, 64, 3), Y: np.random.randn(4, 6)})
    print("cost = " + str(a))

# Result:
cost = 4.6648703
# Official result (see the previous section for why they differ):
cost = 2.91034

1.5 - Build the model

Finally, with all of the functions above in place, we can put the model together.

We already implemented random_mini_batches() in course 2; it returns a list of mini-batches.

Building the model involves the following steps:

  • Create placeholders
  • Initialize parameters
  • Forward propagation
  • Compute the cost
  • Backward propagation
  • Create an optimizer

Finally, we create a session and run the model; a sketch of how the pieces fit together is given below.

Reference docs for initializing variables
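A sketch of how these pieces could be assembled into a model() function is shown below. It assumes random_mini_batches() from cnn_utils and uses an Adam optimizer; the hyperparameter values are illustrative defaults rather than the assignment's reference values.

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.009,
          num_epochs=100, minibatch_size=64, print_cost=True):
    """Trains the CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FC model."""
    ops.reset_default_graph()                       # rerun without overwriting tf variables
    tf.set_random_seed(1)
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []

    # Create placeholders, initialize parameters, forward propagation, cost
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)

    # Backpropagation: the Adam optimizer minimizes the cost
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_epochs):
            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size)
            for (minibatch_X, minibatch_Y) in minibatches:
                # Run the optimizer and the cost on the current mini-batch
                _, temp_cost = sess.run([optimizer, cost],
                                        feed_dict={X: minibatch_X, Y: minibatch_Y})
                minibatch_cost += temp_cost / num_minibatches
            if print_cost and epoch % 5 == 0:
                print("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            costs.append(minibatch_cost)

        # Evaluate the trained model
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

    return parameters

With the training and test sets prepared above, training would then be launched with something like parameters = model(X_train, Y_train, X_test, Y_test).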