
Planar data classification with one hidden layer

In Logistic Regression with a Neural Network Mindset, we built a neural network that uses logistic regression to solve a linear classification problem. In this blog we will build a neural network with one hidden layer to solve a non-linear classification problem, as shown below:

[Figure: the planar dataset to classify]
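The original figure is not reproduced here. As a stand-in, a non-linearly separable planar dataset can be generated for experimentation; the snippet below uses scikit-learn's make_moons purely for illustration (it is a substitute for the post's own dataset, which is loaded by a helper not shown here). Note the transposition into the (features, examples) layout used by all the functions below.

import numpy as np
from sklearn.datasets import make_moons

# Stand-in planar dataset: 400 points in 2D, two interleaving half-moons (not the original dataset)
X_raw, y_raw = make_moons(n_samples=400, noise=0.2, random_state=1)
X = X_raw.T               # shape (2, 400): one column per example
Y = y_raw.reshape(1, -1)  # shape (1, 400): labels as a row vector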

In this post I will:

  • Implement a 2-class classification neural network with a single hidden layer

  • Use units with a non-linear activation function, such as tanh

  • Compute the cross entropy loss

  • Implement forward and backward propagation

Defining the neural network structure

layer_sizes()

This function will define three variables:

  • n_x: the size of the input layer

  • n_h: the size of the hidden layer (set this to 4)

  • n_y: the size of the output layer

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]  # size of input layer
    n_h = 4           # size of hidden layer
    n_y = Y.shape[0]  # size of output layer
    
    return (n_x, n_h, n_y)
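A quick sanity check, using the stand-in dataset from above (the expected output assumes its (2, 400) / (1, 400) shapes):

(n_x, n_h, n_y) = layer_sizes(X, Y)
print("n_x = " + str(n_x) + ", n_h = " + str(n_h) + ", n_y = " + str(n_y))
# n_x = 2, n_h = 4, n_y = 1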

Initialize the model’s parameters

initialize_parameters()

  • Make sure the parameters’ sizes are right. Refer to the neural network figure above if needed.

  • I will initialize the weight matrices with random values.

    • Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
  • I will initialize the bias vectors as zeros.

    • Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(2)  # we set up a seed so that the output is reproducible even though the initialization is random.
    
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters
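A quick check of the returned shapes (using the layer sizes computed above):

parameters = initialize_parameters(n_x, n_h, n_y)
print("W1: " + str(parameters["W1"].shape))  # (4, 2)
print("b1: " + str(parameters["b1"].shape))  # (4, 1)
print("W2: " + str(parameters["W2"].shape))  # (1, 4)
print("b2: " + str(parameters["b2"].shape))  # (1, 1)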

The Loop

forward_propagation()

Steps:

  1. Retrieve each parameter from the dictionary “parameters” (which is the output of initialize_parameters()) by using parameters[".."].

  2. Implement forward propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of all our predictions on all the examples in the training set).

  3. Values needed in the backpropagation are stored in “cache”. The cache will be given as an input to the backpropagation function.

Code

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    # Implement Forward Propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    assert(A2.shape == (1, X.shape[1]))
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache
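Note that forward_propagation calls a sigmoid helper that is not defined in this post (the original assignment provides it separately). A minimal NumPy version, plus a quick shape check with the arrays from the earlier snippets, might look like this:

def sigmoid(z):
    """
    Compute the sigmoid of z element-wise.
    
    Arguments:
    z -- a scalar or numpy array of any size
    
    Returns:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s

A2, cache = forward_propagation(X, parameters)
print(A2.shape)  # (1, 400)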

compute_cost()

Now that I have computed $A^{[2]}$ (in the Python variable “A2”), which contains $a^{[2](i)}$ for every example, I can compute the cost function as follows:

$$J = - \frac{1}{m} \sum\limits_{i = 0}^{m} \left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1- a^{[2](i)}\right) \right) \tag{1}$$

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (1)
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- cross-entropy cost given equation (1)
    """
    
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    logprobs = np.multiply(Y,np.log(A2))+np.multiply((1-Y),np.log(1-A2))
    cost = -np.sum(logprobs)/m
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    assert(isinstance(cost, float))
    
    return cost
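A small sanity check on made-up values (the numbers are for illustration only; note that the parameters argument is accepted but not actually used inside compute_cost):

A2_demo = np.array([[0.8, 0.2, 0.6]])  # hypothetical predictions
Y_demo = np.array([[1, 0, 1]])         # hypothetical labels
print(compute_cost(A2_demo, Y_demo, parameters))  # roughly 0.32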

backward_propagation()

Backpropagation is usually the hardest (most mathematical) part of deep learning. Here is the slide from the lecture on backpropagation. I will use the six equations on the right of this slide, since I am building a vectorized implementation.

[Figure: gradient summary slide from the lecture]

$\frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} = \frac{1}{m} (a^{[2](i)} - y^{(i)})$

$\frac{\partial \mathcal{J}}{\partial W_2} = \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} a^{[1](i)T}$

$\frac{\partial \mathcal{J}}{\partial b_2} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}}$

$\frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} * (1 - a^{[1](i)2})$

$\frac{\partial \mathcal{J}}{\partial W_1} = \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} X^T$

$\frac{\partial \mathcal{J}}{\partial b_1} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}}$
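The post stops at the equations, so to close the loop here is a minimal sketch of how they could be turned into code, following the same conventions (shapes and cache) as forward_propagation above; the function name and signature are my assumption rather than something given in the post.

def backward_propagation(parameters, cache, X, Y):
    """
    Implements backward propagation using the six equations above.
    
    Arguments:
    parameters -- python dictionary containing W1, b1, W2 and b2
    cache -- python dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing the gradients dW1, db1, dW2, db2
    """
    m = X.shape[1]
    
    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    
    # dZ2 = A2 - Y (the 1/m factor is applied in the dW/db lines below)
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # derivative of tanh: 1 - A1**2
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

# Example: gradients for the stand-in dataset and the cache from the earlier snippets
grads = backward_propagation(parameters, cache, X, Y)
print(grads["dW1"].shape, grads["db1"].shape, grads["dW2"].shape, grads["db2"].shape)  # (4, 2) (4, 1) (1, 4) (1, 1)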