Improving Deep Neural Networks: Week 1, Part 2

Observations:

  • The value of λ is a hyperparameter that you can tune using a dev set (see the tuning sketch below).
  • L2 regularization makes your decision boundary smoother. If λ is too large, it is also possible to "oversmooth", resulting in a model with high bias.
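As a concrete illustration of tuning λ on a dev set, here is a minimal sketch using synthetic data and a simple logistic-regression model. The helper names (`train_logreg_l2`, `dev_accuracy`) and the data are illustrative assumptions, not the assignment's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg_l2(X, Y, lambd, lr=0.1, iters=1000):
    """Gradient descent on logistic regression with an L2 penalty (lambd/(2m)) * ||w||^2."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        A = sigmoid(X @ w + b)
        dw = (X.T @ (A - Y)) / m + (lambd / m) * w  # extra term from the L2 penalty
        db = np.sum(A - Y) / m
        w -= lr * dw
        b -= lr * db
    return w, b

def dev_accuracy(w, b, X_dev, Y_dev):
    preds = sigmoid(X_dev @ w + b) > 0.5
    return np.mean(preds == Y_dev)

# Synthetic train/dev split, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_tr, Y_tr, X_dev, Y_dev = X[:150], Y[:150], X[150:], Y[150:]

# Try several candidate values of lambda and compare dev-set accuracy.
for lambd in [0.0, 0.01, 0.1, 1.0, 10.0]:
    w, b = train_logreg_l2(X_tr, Y_tr, lambd)
    print(f"lambda = {lambd}: dev accuracy = {dev_accuracy(w, b, X_dev, Y_dev):.3f}")
```

A very large λ would drive the weights toward zero and hurt accuracy on both sets (high bias), which is the "oversmoothing" effect described above.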

What is L2-regularization actually doing?

L2-regularization relies on the assumption that a model with small weights is simpler than a model with large weights. Thus, by penalizing the squared values of the weights in the cost function, you drive all the weights toward smaller values: large weights become too costly. This leads to a smoother model in which the output changes more slowly as the input changes.
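In code, this penalty shows up as an extra term added to the usual cross-entropy cost. A minimal NumPy sketch, assuming a 3-layer network with weight matrices stored under the keys "W1", "W2", "W3" (the function name and parameter layout here are illustrative, not a graded implementation):

```python
import numpy as np

def compute_cost_with_l2(A3, Y, parameters, lambd):
    """Cross-entropy cost plus the L2 penalty (lambd/(2m)) * sum of squared weights.

    A3: network output probabilities, shape (1, m)
    Y:  true labels, shape (1, m)
    parameters: dict holding weight matrices "W1", "W2", "W3"
    """
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m
    l2_penalty = (lambd / (2 * m)) * (
        np.sum(np.square(parameters["W1"]))
        + np.sum(np.square(parameters["W2"]))
        + np.sum(np.square(parameters["W3"]))
    )
    return cross_entropy + l2_penalty
```

Because every squared weight adds to the cost, gradient descent is pushed toward parameter settings where the weights stay small, which is exactly the smoothing effect described above.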

What you should remember -- the implications of L2-regularization on:

The cost computation:
  • A regularization term is added to the cost
The backpropagation function:
  • There are extra terms in the gradients with respect to weight matrices
Weights end up smaller ("weight decay"):
  • Weights are pushed to smaller values at every update step (see the update sketch after this list).
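Both backpropagation implications can be seen in a single gradient-descent step. A sketch for one weight matrix, assuming `dW_unregularized` is the gradient of the unregularized cost (names are illustrative):

```python
def update_weights(W, dW_unregularized, lambd, m, learning_rate):
    """One gradient-descent step on W with the L2 penalty included."""
    # The regularization term (lambd/(2m)) * ||W||^2 adds (lambd/m) * W
    # to the gradient with respect to W.
    dW = dW_unregularized + (lambd / m) * W
    return W - learning_rate * dW

# Rearranging the update shows why this is called "weight decay":
#   W_new = (1 - learning_rate * lambd / m) * W - learning_rate * dW_unregularized
# W is multiplied by a factor slightly less than 1 at every step,
# so the weights steadily decay toward smaller values.
```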