
Homework for Andrew Ng's Video Course (01. Neural Networks and Deep Learning -- week2/assignment2_2)

I recently started learning deep learning from scratch and want to use this post to record the process for later review. What follows is my personal understanding; if there are mistakes, please excuse them and point them out. Thank you.

Purpose

Assignment 2_2 guides beginners through building, from scratch, a logistic regression (LR) classifier that recognizes whether an image contains a cat. In short, logistic regression is a binary classifier, and it can be viewed as a single-layer neural network with only one neuron.

Main Steps

  1. Load the dataset and preprocess it
  2. Initialize the parameters (weights w, bias b, number of iterations, learning rate, etc.)
  3. Compute the cost function and its gradients
  4. Optimize the network: update the parameters to minimize the cost function
  5. Compare the predictions with the ground truth to obtain the accuracy

During this process you need to define several functions {sigmoid_function(), initiolize_with_zeros(), propagation(), optimize(), predict()}, and finally integrate them all into model().

Files

Dataset: a training set of m samples and a test set of n samples. The training/test data have dimensions (m, 64, 64, 3)/(n, 64, 64, 3); each sample is a 3-channel (RGB) color image with a resolution of 64×64, labeled either cat (y=1) or non-cat (y=0). Each pixel is a vector of 3 values, each in the range [0, 255].

Data loading file: lr_utils.py

Calling the load_dataset() function in this file returns the input data, output labels, and classes (train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes):

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

Detailed Steps

Load the dataset and preprocess the images:

Reshape each image from [64, 64, 3] to [64*64*3, 1], so that each column of the input matrix is one flattened image and the number of columns equals the number of samples. A common preprocessing step is to normalize the data (scale it into a small interval, which turns it into dimensionless values, removes unit constraints, and makes it easy to weight and compare data of different units or magnitudes). The simplest approach here is to divide every pixel value by 255 so it falls into [0, 1]:

import numpy as np
from lr_utils import load_dataset

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
### preprocessing
### reshape each image of shape [num_px, num_px, 3] into a column of length num_px*num_px*3
### each column of the resulting array represents a flattened image
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T  ## reshape(m_train, -1) is equivalent to reshape(m_train, num_px*num_px*3)
test_set_x_flatten = test_set_x_orig.reshape(m_test, -1).T
## center and standardize the data
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.

The train_set_x and test_set_x obtained from this preprocessing serve as the network's input data.
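
As a quick sanity check (a minimal sketch, assuming the block above has been run), each flattened array should have one column per image and values in [0, 1]:

# verify the flattened, normalized inputs (shapes follow the reshape above)
print(train_set_x.shape)                     # (num_px * num_px * 3, m_train), i.e. (12288, m_train)
print(test_set_x.shape)                      # (num_px * num_px * 3, m_test)
print(train_set_x.min(), train_set_x.max())  # values now lie in [0, 1]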

Build the logistic regression classifier and define the relevant functions:

First, the definition of logistic regression: for any input $x \in \mathbb{R}^{n_x}$, we have

$$\hat{y} = P(y=1 \mid x), \qquad \hat{y} \in [0,1] \tag{1}$$

where $n_x$ is the number of features and $\hat{y}$ is the probability that $y=1$ given the input $x$. As mentioned above, logistic regression is a binary classifier, so $y$ takes values in $\{0,1\}$. $\hat{y}$ is computed as follows:

$$\hat{y} = \sigma(w^{T}x + b) \tag{2}$$

that is, it is computed with the sigmoid function, which also serves as the activation function:

$$\sigma(w^{T}x+b) = \sigma(z) = \frac{1}{1+e^{-z}} \tag{3}$$
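
For instance, $\sigma(0)=\frac{1}{1+e^{0}}=0.5$, while $\sigma(z)\to 1$ as $z\to+\infty$ and $\sigma(z)\to 0$ as $z\to-\infty$, which is why $\hat{y}$ can be interpreted as a probability in $[0,1]$.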

Given the training set $\{(x^{(1)},y^{(1)}),\ldots,(x^{(m)},y^{(m)})\}$ (where $m$ is the number of samples), we want $\hat{y} \rightarrow y$, so we need to compute the value of a loss and minimize it. Two concepts are involved here: the loss (error) function and the cost function.

The loss (error) function is defined on a single sample:

$$L(\hat{y},y) = -\left[y\log\hat{y} + (1-y)\log(1-\hat{y})\right] \tag{4}$$

The cost function is defined over the whole training set and can be viewed as the mean of the loss function:

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)},y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right] \tag{5}$$
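
As a quick numerical check of equation (4): for a cat image ($y=1$), a confident correct prediction $\hat{y}=0.9$ gives $L=-\log 0.9\approx 0.105$, while a confident wrong prediction $\hat{y}=0.1$ gives $L=-\log 0.1\approx 2.30$, so the loss heavily penalizes confident mistakes; the cost in (5) simply averages this over all $m$ samples.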

Ultimately, the task is to find the parameter values w and b that make J (the cost) as small as possible.

How do we find the best parameters? In each iteration we use forward propagation to compute the cost, then back propagation to compute the gradients with respect to w and b, and we repeat this process until the iterations stop. Updating w and b relies on gradient descent: from equation (5) we take the partial derivatives with respect to w and b and update both values in every iteration:

$$w = w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad b = b - \alpha \frac{\partial J(w,b)}{\partial b} \tag{6}$$
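
Differentiating (5) gives the standard logistic-regression gradients, which are exactly what the propagation() code below computes (here $X$ is the $(n_x, m)$ matrix whose columns are the inputs):

$$\frac{\partial J}{\partial w} = \frac{1}{m}X(\hat{y}-y)^{T}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)}-y^{(i)})$$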

Here $\alpha$ is the learning rate, which controls how quickly the parameters move, i.e. the convergence speed. In the code, $\alpha$ is initialized to 0.05 and is reduced to 99.9% of its previous value after each iteration, so that the closer the cost gets to its minimum, the smaller the steps taken by w and b. This helps prevent the cost from converging so fast that it overshoots the minimum.
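
As a rough estimate of this decay: with $\alpha_0 = 0.05$ and a factor of $0.999$ per iteration, after 2000 iterations the learning rate is about $0.05 \times 0.999^{2000} \approx 0.05 \times 0.135 \approx 0.0068$, roughly a seventh of its initial value.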

With the analysis above we can start writing the code. A few reminders: there is usually more than one input feature, so w should be vectorized so that all of its components are updated by a single line of code, which also speeds up execution. In addition, I compare the cost from every iteration and save the parameters w and b that achieve the smallest cost:

##definition of sigmoid function
def sigmoid_function(z):
    s = 1 / (1 + np.exp(-z))
    return s


##initializing parameters w&b, create a vector of zeros of shape((dim,1),type = float64)
def initiolize_with_zeros(dim):
    w = np.zeros((dim,1))
    b = 0
    return w, b


##propagation 
def propagation(w, b, x ,y):
    ##forward propagation
    y_hat = sigmoid_function(np.dot(w.T,x) + b)
    y_diff = y_hat - y
    L = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  ##Loss function
    cost =  np.sum(L) / x.shape[1]
    ##backward propagation
    dw = np.dot(x, y_diff.T) / x.shape[1]
    db = np.sum(y_diff) / x.shape[1]

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    ##save as dictionary
    grads = {"dw" : dw, "db" : db}
    return grads, cost


##optimization: learn w&b by minimizing the cost
##update parameters using gradient descent
def optimize(w, b, x, y, num_iterations, learning_rate):
    costs = []
    best_cost = np.inf   ##start at +inf so the first cost is always recorded
    best_params = {}
    decay = 0.999        ##decay factor of learning_rate
    for i in range(num_iterations):
        grads, cost = propagation(w, b, x, y)
        dw = grads["dw"]
        db = grads["db"]
        ##update params
        w = w - learning_rate * dw
        b = b - learning_rate * db
        learning_rate *= decay
        ##record cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
#            print("cost after iteration %d: %+f" % (i, cost))
#            print("learning_rate: %f" % learning_rate)
        ##save the params that give the smallest cost so far
        if cost < best_cost:
            best_cost = cost
            best_params["w"] = w
            best_params["b"] = b
    print("best cost : %f" % best_cost)

    params = {"w" : w, "b" : b, "learning_rate" : learning_rate, "best_w" : best_params["w"], \
              "best_b" : best_params["b"]}
    grads = {"dw" : dw, "db" : db}
    return params, grads, costs
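
Before moving on, the functions can be sanity-checked on a tiny hand-made example (the numbers below are made up purely for illustration, assuming numpy is imported as np and the functions above are defined; with zero-initialized parameters the cost should equal log 2 ≈ 0.693):

# tiny made-up example: 2 features, 3 samples
w, b = initiolize_with_zeros(2)
x = np.array([[1.0,  2.0, -1.0],
              [3.0,  4.0, -3.2]])
y = np.array([[1, 0, 1]])

grads, cost = propagation(w, b, x, y)
print(cost)                  # ~0.693 = log(2), since sigmoid(0) = 0.5 for every sample
print(grads["dw"].shape)     # (2, 1), same shape as w
print(grads["db"])           # scalar gradient for the bias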

We also need to define a prediction function: when $\hat{y} > 0.5$ the prediction is 1 (cat), and when $\hat{y} \leq 0.5$ the prediction is 0 (non-cat):

##prediction
##step1:calculate y_hat
##step2:1(y_hat>0.5),0(y_hat<=0.5)
def predict(w, b, x):
    y_hat = sigmoid_function(np.dot(w.T,x) + b)
    assert(y_hat.shape[1] == x.shape[1])
    y_pred = np.zeros((1,y_hat.shape[1]))
    for i in range(y_hat.shape[1]):
        if y_hat[:,i] <= 0.5:
            y_pred[:,i] = 0
        else:
            y_pred[:,i] = 1
    return y_pred
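
A quick usage example for predict(), with hypothetical weights and inputs made up for illustration (again assuming numpy is imported as np):

# hypothetical parameters and inputs: 2 features, 3 samples
w = np.array([[0.5], [-0.3]])
b = 0.1
x = np.array([[1.0, -2.0,  0.5],
              [0.3,  1.5, -1.0]])

print(predict(w, b, x))   # a (1, 3) array of 0/1 predictions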

Wrap all the functions together and evaluate the recognition performance by computing the accuracy:

I computed the accuracy twice, once with the parameters from the last iteration and once with the best parameters, and compared the two. The accuracies turned out to be exactly the same; I believe this is because the dataset is small and the task is easy, so a high recognition rate is reached even without using the best parameters:

##(4)merge all functions into a model
def model(x_train, y_train, x_test, y_test, num_iterations = 2000, \
           learning_rate = 0.05):   
    features_num = x_train.shape[0]
    w, b = initiolize_with_zeros(features_num)
    params, grads, costs = optimize(w, b, x_train, y_train, \
                        num_iterations , learning_rate)
    w, b, learning_rate = params["w"],params["b"],params["learning_rate"]  
    y_pred_train = predict(w, b, x_train)
    y_pred_test = predict(w, b, x_test)    
    accuracy_train = 100 - np.mean(np.abs(y_pred_train - y_train) * 100)
    accuracy_test = 100 - np.mean(np.abs(y_pred_test - y_test) * 100)
    
    ##predict y_hat with best_params
    best_w, best_b = params["best_w"], params["best_b"]
    best_y_pred_train = predict(best_w, best_b, x_train)
    best_y_pred_test = predict(best_w, best_b, x_test)
    best_accuracy_train = 100 - np.mean(np.abs(best_y_pred_train - y_train) * 100)
    best_accuracy_test = 100 - np.mean(np.abs(best_y_pred_test - y_test) * 100)
    
    ##comparison between last w&b and best w&b
    print("learning_rate : %f" % learning_rate)
    print("train accuracy -- %f%% : %f%%" % (accuracy_train, best_accuracy_train))
    print("test accuracy -- %f%% : %f%%" % (accuracy_test, best_accuracy_test))
    
    result = {"costs" : costs, "y_pred_test" : y_pred_test, \
              "y_pred_train" : y_pred_train, "w" : w, "b" : b, \
              "learning_rate" : learning_rate, "num_iterations" : num_iterations}
    return result
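
With everything wrapped up, training and evaluating on the preprocessed data from the beginning of this post is a single call (a usage sketch consistent with the code above):

result = model(train_set_x, train_set_y, test_set_x, test_set_y,
               num_iterations=2000, learning_rate=0.05)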

Results


Summary

  1. Preprocessing matters a great deal, and which method works best depends on the situation. In this experiment I also tried normalizing the data by its L2 norm, but the results were unsatisfactory.
  2. Define the functions and build the neural network.
  3. The choice of learning rate also affects the results. I trained the dataset with learning rates of {0.1, 0.01, 0.001} and compared the outcomes; a sketch for reproducing this comparison is given after this list.
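
A minimal sketch for running the learning-rate comparison (assuming matplotlib is installed; model() returns the costs recorded every 100 iterations):

import matplotlib.pyplot as plt

# train with several learning rates and compare the cost curves
for lr in [0.1, 0.01, 0.001]:
    result = model(train_set_x, train_set_y, test_set_x, test_set_y,
                   num_iterations=2000, learning_rate=lr)
    plt.plot(result["costs"], label="learning_rate = " + str(lr))

plt.xlabel("iterations (per hundreds)")
plt.ylabel("cost")
plt.legend()
plt.show()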