
Planar data classification with one hidden layer

In Logistic Regression with a Neural Network Mindset, we built a neural network that uses logistic regression to solve a linear classification problem. In this blog we will build a neural network with one hidden layer to solve a non-linear classification problem, as shown below:

[Figure: the planar dataset to classify]
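The original figure is not reproduced here. As a stand-in, a non-linearly separable planar dataset can be generated for experimentation; the snippet below uses scikit-learn's make_moons purely for illustration (it is a substitute for the post's own dataset, which is loaded by a helper not shown here). Note the transposition into the (features, examples) layout used by all the functions below.

import numpy as np
from sklearn.datasets import make_moons

# Stand-in planar dataset: 400 points in 2D, two interleaving half-moons (not the original dataset)
X_raw, y_raw = make_moons(n_samples=400, noise=0.2, random_state=1)
X = X_raw.T               # shape (2, 400): one column per example
Y = y_raw.reshape(1, -1)  # shape (1, 400): labels as a row vector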

In this post I will:

  • Implement a 2-class classification neural network with a single hidden layer

  • Use units with a non-linear activation function, such as tanh

  • Compute the cross entropy loss

  • Implement forward and backward propagation

Defining the neural network structure

layer_sizes()

This function will define three variables:

  • n_x: the size of the input layer

  • n_h: the size of the hidden layer (set this to 4)

  • n_y: the size of the output layer

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]  # size of input layer
    n_h = 4           # size of hidden layer
    n_y = Y.shape[0]  # size of output layer
    
    return (n_x, n_h, n_y)
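A quick sanity check, using the stand-in dataset from above (the expected output assumes its (2, 400) / (1, 400) shapes):

(n_x, n_h, n_y) = layer_sizes(X, Y)
print("n_x = " + str(n_x) + ", n_h = " + str(n_h) + ", n_y = " + str(n_y))
# n_x = 2, n_h = 4, n_y = 1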

Initialize the model’s parameters

initialize_parameters()

  • Make sure the parameters’ sizes are right. Refer to the neural network figure above if needed.

  • I will initialize the weight matrices with random values.

    • Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
  • I will initialize the bias vectors as zeros.

    • Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(2)  # we set up a seed so that the output is reproducible even though the initialization is random.
    
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters
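A quick check of the returned shapes (using the layer sizes computed above):

parameters = initialize_parameters(n_x, n_h, n_y)
print("W1: " + str(parameters["W1"].shape))  # (4, 2)
print("b1: " + str(parameters["b1"].shape))  # (4, 1)
print("W2: " + str(parameters["W2"].shape))  # (1, 4)
print("b2: " + str(parameters["b2"].shape))  # (1, 1)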

The Loop

forward_propagation()

Steps:

  1. Retrieve each parameter from the dictionary “parameters” (which is the output of initialize_parameters()) by using parameters[".."].

  2. Implement forward propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of all our predictions on all the examples in the training set).

  3. Values needed in the backpropagation are stored in “cache”. The cache will be given as an input to the backpropagation function.

Code

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    # Implement Forward Propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    assert(A2.shape == (1, X.shape[1]))
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache
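Note that forward_propagation calls a sigmoid helper that is not defined in this post (the original assignment provides it separately). A minimal NumPy version, plus a quick shape check with the arrays from the earlier snippets, might look like this:

def sigmoid(z):
    """
    Compute the sigmoid of z element-wise.
    
    Arguments:
    z -- a scalar or numpy array of any size
    
    Returns:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s

A2, cache = forward_propagation(X, parameters)
print(A2.shape)  # (1, 400)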

compute_cost()

Now that I have computed $A^{[2]}$ (in the Python variable “A2”), which contains $a^{[2](i)}$ for every example, I can compute the cost function as follows:

$$J = - \frac{1}{m} \sum\limits_{i = 0}^{m} \left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1- a^{[2](i)}\right) \right) \tag{1}$$

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (1)
    
    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2
    
    Returns:
    cost -- cross-entropy cost given equation (1)
    """
    
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    logprobs = np.multiply(Y,np.log(A2))+np.multiply((1-Y),np.log(1-A2))
    cost = -np.sum(logprobs)/m
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    assert(isinstance(cost, float))
    
    return cost
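A small sanity check on made-up values (the numbers are for illustration only; note that the parameters argument is accepted but not actually used inside compute_cost):

A2_demo = np.array([[0.8, 0.2, 0.6]])  # hypothetical predictions
Y_demo = np.array([[1, 0, 1]])         # hypothetical labels
print(compute_cost(A2_demo, Y_demo, parameters))  # roughly 0.32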

backward_propagation()

Backpropagation is usually the hardest (most mathematical) part of deep learning. Here is the slide from the lecture on backpropagation. I will use the six equations on the right of this slide, since I am building a vectorized implementation.

[Figure: gradient summary slide from the lecture]

$\frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} = \frac{1}{m} (a^{[2](i)} - y^{(i)})$

$\frac{\partial \mathcal{J}}{\partial W_2} = \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} a^{[1](i)T}$

$\frac{\partial \mathcal{J}}{\partial b_2} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}}$

$\frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} * (1 - a^{[1](i)2})$

$\frac{\partial \mathcal{J}}{\partial W_1} = \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} X^T$

$\frac{\partial \mathcal{J}}{\partial b_1} = \sum_i \frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}}$
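The post stops at the equations, so to close the loop here is a minimal sketch of how they could be turned into code, following the same conventions (shapes and cache) as forward_propagation above; the function name and signature are my assumption rather than something given in the post.

def backward_propagation(parameters, cache, X, Y):
    """
    Implements backward propagation using the six equations above.
    
    Arguments:
    parameters -- python dictionary containing W1, b1, W2 and b2
    cache -- python dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing the gradients dW1, db1, dW2, db2
    """
    m = X.shape[1]
    
    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    
    # dZ2 = A2 - Y (the 1/m factor is applied in the dW/db lines below)
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # derivative of tanh: 1 - A1**2
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

# Example: gradients for the stand-in dataset and the cache from the earlier snippets
grads = backward_propagation(parameters, cache, X, Y)
print(grads["dW1"].shape, grads["db1"].shape, grads["dW2"].shape, grads["db2"].shape)  # (4, 2) (4, 1) (1, 4) (1, 1)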