Planar data classification with one hidden layer
In Logistic Regression with a Neural Network Mindset, we built a model that uses logistic regression to solve a linear classification problem. In this blog, we will build a neural network with one hidden layer to solve a non-linear (planar) classification problem.
What I will code:
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross-entropy loss
- Implement forward and backward propagation
Defining the neural network structure
layer_sizes()
This function will define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """
    n_x = X.shape[0]  # size of the input layer
    n_h = 4           # hard-coded size of the hidden layer
    n_y = Y.shape[0]  # size of the output layer

    return (n_x, n_h, n_y)
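As a quick sanity check, here is a minimal sketch of a call to this function. The shapes below (5 input features, 3 examples, 1 output label) are made up for illustration:

import numpy as np

# Hypothetical toy data: 5 features x 3 examples, 1 label per example
X_example = np.random.randn(5, 3)
Y_example = np.random.randn(1, 3)
print(layer_sizes(X_example, Y_example))  # (5, 4, 1)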
Initialize the model’s parameters
initialize_parameters()
- Make sure our parameters' sizes are right. Refer to the neural network figure above if needed.
- I will initialize the weight matrices with random values. Use np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
- I will initialize the bias vectors as zeros. Use np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
def initialize_parameters(n_x, n_h, n_y):
    """
    Arguments:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    params -- python dictionary containing your parameters:
        W1 -- weight matrix of shape (n_h, n_x)
        b1 -- bias vector of shape (n_h, 1)
        W2 -- weight matrix of shape (n_y, n_h)
        b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(2)  # set a seed so the output is reproducible even though the initialization is random

    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
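For example, using the sizes from the illustrative call to layer_sizes() above (n_x = 5, n_h = 4, n_y = 1, which are assumed values, not taken from the planar dataset), the returned shapes can be checked like this:

parameters = initialize_parameters(n_x=5, n_h=4, n_y=1)
print(parameters["W1"].shape)  # (4, 5)
print(parameters["b1"].shape)  # (4, 1)
print(parameters["W2"].shape)  # (1, 4)
print(parameters["b2"].shape)  # (1, 1)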
The Loop
forward_propagation()
Steps
- Retrieve each parameter from the dictionary "parameters" (which is the output of initialize_parameters()) by using parameters["..."].
- Implement forward propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}$ and $A^{[2]}$ (the vector of all our predictions on all the examples in the training set).
- Values needed in backpropagation are stored in "cache". The cache will be given as an input to the backpropagation function.
Code
def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A2 -- the sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    # Implement forward propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    assert (A2.shape == (1, X.shape[1]))

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache
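Note that sigmoid() is not defined in this post; in the original assignment it comes from a small helper module. A minimal sketch, assuming the standard logistic function and that numpy is imported as np:

def sigmoid(z):
    """Compute the logistic sigmoid of z element-wise."""
    return 1 / (1 + np.exp(-z))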
compute_cost()
Now that I have computed $A^{[2]}$ (in the Python variable "A2"), which contains $a^{[2](i)}$ for every example, I can compute the cross-entropy cost as follows:
$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log (1 - a^{[2](i)}) \right)$
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in the equation above

    Arguments:
    A2 -- the sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]  # number of examples

    # Compute the cross-entropy cost
    logprobs = np.multiply(Y, np.log(A2)) + np.multiply((1 - Y), np.log(1 - A2))
    cost = -np.sum(logprobs) / m

    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect, e.g. turns [[17]] into 17
    assert (isinstance(cost, float))

    return cost
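Tying the pieces so far together, here is a minimal sketch of an end-to-end check; the toy data shapes are made up for illustration, and the cost should sit near log(2) ≈ 0.69 since a freshly initialized network predicts probabilities close to 0.5:

# Hypothetical end-to-end check on toy data
np.random.seed(1)
X_example = np.random.randn(5, 3)
Y_example = (np.random.randn(1, 3) > 0).astype(float)

n_x, n_h, n_y = layer_sizes(X_example, Y_example)
parameters = initialize_parameters(n_x, n_h, n_y)
A2, cache = forward_propagation(X_example, parameters)
print(compute_cost(A2, Y_example, parameters))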
backward_propagation()
Backpropagation is usually the hardest (most mathematical) part of deep learning. The six equations below come from the lecture slide on backpropagation; I will use them since I am building a vectorized implementation.
$\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} } = W_2^T \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } * ( 1 - a^{[1] (i) 2}) $
$\frac{\partial \mathcal{J} }{ \partial W_1 } = \frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} } X^T $
- Note that $*$ denotes elementwise multiplication.
- The notation I will use is common in deep learning coding:
- dW1 = $\frac{\partial \mathcal{J}}{\partial W_1}$
- db1 = $\frac{\partial \mathcal{J}}{\partial b_1}$
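Here is a sketch of how backward_propagation() could be implemented from the formulas above, using the standard vectorized derivation for tanh hidden units and a sigmoid output, with the 1/m averaging over examples folded into the dW and db terms (the signature mirrors forward_propagation() and is my choice, not fixed by the text):

def backward_propagation(parameters, cache, X, Y):
    """
    Arguments:
    parameters -- python dictionary containing W1, b1, W2 and b2
    cache -- python dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing the gradients dW1, db1, dW2 and db2
    """
    m = X.shape[1]

    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]

    # Backward propagation: vectorized versions of the equations above
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # tanh'(Z1) = 1 - A1^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads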