
Neural networks and backpropagation explained in a simple way

Step 4: Differentiation

Obviously, we can use any optimisation technique that modifies the internal weights of the neural network in order to minimise the total loss function we previously defined. These techniques include genetic algorithms, greedy search, or even a simple brute-force search. In our simple numerical example, with only one weight parameter W to optimise, we can scan from -1000.0 to +1000.0 in steps of 0.001 and keep the W that gives the smallest sum of squared errors over the dataset.
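A minimal sketch of that brute-force scan in Python (the dataset is the article's toy y = 2x example; the scan range is narrowed to ±10 here so it runs quickly, but the idea is identical for ±1000, and the function name `total_squared_error` is just illustrative):

```python
# Toy training set for y = 2x, as used throughout this article.
inputs  = [0, 1, 2, 3, 4]
outputs = [0, 2, 4, 6, 8]

def total_squared_error(w):
    """Sum of squared errors of the one-weight model y_pred = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(inputs, outputs))

# Brute-force scan: try every W in steps of 0.001 and keep the best one.
# (Integer loop counter avoids floating-point drift from repeated += 0.001.)
best_w, best_err = None, float("inf")
for i in range(-10000, 10001):          # W from -10.000 to +10.000
    w = i / 1000
    err = total_squared_error(w)
    if err < best_err:
        best_w, best_err = w, err

print(best_w, best_err)  # the scan finds W = 2.0 with zero error
```

Even at this tiny scale the scan evaluates the loss 20,001 times for a single weight, which is why the next paragraph argues this approach cannot scale.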

This might work if the model has only very few parameters and we don't care much about precision. However, if we are training a NN over an array of 600x400 inputs (as in image processing), we can very easily reach models with millions of weights to optimise, and brute force is not even imaginable: it is a pure waste of computational resources!

Luckily for us, there is a powerful concept in mathematics that can guide us in optimising the weights: differentiation. Basically, it deals with the derivative of the loss function. In mathematics, the derivative of a function at a certain point gives the rate, or speed, at which that function is changing its value at that point.

In order to see the effect of the derivative, we can ask ourselves the following question: how much will the total error change if we change the internal weight of the neural network by a certain small value δW? For the sake of simplicity we will take δW = 0.0001 (in reality it should be much smaller!).

Let’s recalculate the sum of the squares of errors when the weight W changes very slightly:

+--------+--------+-------+-----------+------------+---------------+
| Input  | Output |  W=3  | SE (W=3)  |  W=3.0001  | SE (W=3.0001) |
+--------+--------+-------+-----------+------------+---------------+
| 0      |      0 |     0 |         0 |          0 |             0 |
| 1      |      2 |     3 |         1 |     3.0001 |        1.0002 |
| 2      |      4 |     6 |         4 |     6.0002 |        4.0008 |
| 3      |      6 |     9 |         9 |     9.0003 |        9.0018 |
| 4      |      8 |    12 |        16 |    12.0004 |       16.0032 |
| Total: |      - |     - |        30 |          - |        30.006 |
+--------+--------+-------+-----------+------------+---------------+

Now, as we can see from this table, if we increase W from 3 to 3.0001, the sum of squared errors increases from 30 to 30.006. Since we know that the best function fitting this model is y = 2x, increasing the weight from 3 to 3.0001 should obviously create a little more error (we are moving further away from the intuitively correct weight of 2: since 3.0001 > 3 > 2, the error is higher).

But what we really care about is the rate at which the error changes relative to the change in the weight. Here, that rate is an increase of 0.006 in the total error for each 0.0001 increase in the weight: a rate of 0.006/0.0001 = 60x!

It works in both directions: if we decrease the weight by 0.0001, we should be able to decrease the total error by 0.006 as well. Here is the proof: if you run the calculation again at W = 2.9999, you get an error of 29.994. We managed to decrease the total error!
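The table's numbers are easy to verify in code. This snippet nudges W by δW = 0.0001 in both directions and computes the resulting rate of change (a finite-difference sketch; `total_squared_error` is an illustrative helper name):

```python
# Toy training set for y = 2x.
inputs  = [0, 1, 2, 3, 4]
outputs = [0, 2, 4, 6, 8]

def total_squared_error(w):
    """Sum of squared errors of the one-weight model y_pred = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(inputs, outputs))

dw = 0.0001
e_base = total_squared_error(3)        # 30, matching the table
e_up   = total_squared_error(3 + dw)   # about 30.006
e_down = total_squared_error(3 - dw)   # about 29.994

# Rate of change of the error with respect to the weight, near W = 3.
rate = (e_up - e_base) / dw            # about 60
print(e_base, e_up, e_down, rate)
```

This is exactly the finite-difference approximation of the derivative, which the next paragraph replaces with the exact mathematical derivative.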

We could have obtained this rate directly by calculating the derivative of the loss function. The advantage of using the mathematical derivative is that it is much faster and more precise to compute (fewer floating-point precision problems).
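For this toy model the derivative can be worked out by hand. Since the loss is L(w) = Σ(w·x − y)² with y = 2x, it simplifies to L(w) = (w − 2)²·Σx², so dL/dw = 2·(w − 2)·Σx². A sketch of that closed form (the function name `dloss_dw` is illustrative):

```python
inputs = [0, 1, 2, 3, 4]

# Sum of x squared over the dataset: 0 + 1 + 4 + 9 + 16 = 30.
sum_x2 = sum(x * x for x in inputs)

def dloss_dw(w):
    """Exact derivative of L(w) = (w - 2)^2 * sum(x^2) with respect to w."""
    return 2 * (w - 2) * sum_x2

print(dloss_dw(3))  # 60, exactly the 60x rate measured numerically
```

At w = 3 the exact derivative is 60, confirming the finite-difference estimate, and at w = 2 it is 0, which is the minimum the bullets below describe.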

Here is what our loss function looks like:

  • If w=2, the loss is 0, since the neural network's output fits the training set perfectly.
  • If w<2, the loss is positive, but the derivative is negative, meaning that an increase in the weight will decrease the loss.
  • At w=2, the loss is 0 and the derivative is 0: we have reached a perfect model, and no update is needed.
  • If w>2, the loss becomes positive again, but the derivative is positive as well, meaning that any further increase in the weight will increase the loss even more!

If we initialise the network randomly, we land at some random point on this curve (let's say w=3). The learning process then goes like this:

- Check the derivative.
- If it is positive, meaning the error increases when we increase the weight, then we should decrease the weight.
- If it is negative, meaning the error decreases when we increase the weight, then we should increase the weight.
- If it is 0, we do nothing: we have reached our stable point.

In simple terms, we are designing a process that acts like gravity: no matter where we randomly initialise the ball on this error-function curve, a kind of force field drives it back down to the lowest energy level, at ground 0.