
deeplearning.ai Course Notes

Neural Networks and Deep Learning

Introduction to deep learning

Neural Networks Basics

Logistic Regression as a Neural Network

Computation graph

A neural network's computation consists of a forward propagation pass, which computes the network's output, and a back propagation pass, which computes the gradients (derivatives). The computation graph explains why the computation is organized this way.


A computation graph is a convenient way to visualize the layer-by-layer computation of a neural network.
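As a concrete illustration (not taken from the course materials; the function J = 3(a + b·c) and the node names u, v are chosen only for this sketch), the snippet below runs a forward pass through a tiny computation graph and then a backward pass that applies the chain rule node by node:

```python
# A minimal computation-graph sketch (illustrative; u, v, J are arbitrary names).

def forward(a, b, c):
    u = b * c          # intermediate node
    v = a + u          # intermediate node
    J = 3 * v          # output node
    return J, (u, v)

def backward(a, b, c, cache):
    u, v = cache
    dJ_dv = 3.0                # J = 3v
    dJ_du = dJ_dv * 1.0        # v = a + u  ->  dv/du = 1
    dJ_da = dJ_dv * 1.0        # dv/da = 1
    dJ_db = dJ_du * c          # u = b*c    ->  du/db = c
    dJ_dc = dJ_du * b          # du/dc = b
    return dJ_da, dJ_db, dJ_dc

J, cache = forward(5, 3, 2)            # J = 3 * (5 + 6) = 33
print(J, backward(5, 3, 2, cache))     # gradients: (3.0, 6.0, 9.0)
```

The forward pass moves left to right through the graph to compute the output, and the backward pass moves right to left to compute the derivatives, which is exactly the structure used for neural network training.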

Shallow neural networks

Learn to build a neural network with one hidden layer, using forward propagation and backpropagation.

Learning Objectives

  • Understand hidden units and hidden layers
  • Be able to apply a variety of activation functions in a neural network.
  • Build your first forward and backward propagation with a hidden layer
  • Apply random initialization to your neural network
  • Become fluent with Deep Learning notations and Neural Network Representations
  • Build and train a neural network with one hidden layer.

Activation functions

Pros and cons of activation functions


One drawback of the sigmoid and tanh functions is that when z is very large or very small, the gradient (slope) of the function becomes very small and approaches zero, which slows down gradient descent.

ReLU is currently the most widely used activation function, although people sometimes still use tanh. One drawback of ReLU is that its derivative is 0 when z is negative, but in practice this is not a problem. A shared advantage of ReLU and Leaky ReLU is that over much of the range of z the slope of the activation function stays well away from 0, so in practice a network using ReLU usually learns faster than one using tanh or sigmoid: the regions where the slope approaches 0 and slows learning down are less of an issue. It is true that for half of the range of z the slope of ReLU is 0, but in practice enough hidden units have z greater than 0, so learning can still proceed quickly.


Pros and cons of the different activation functions (a short code sketch follows this list):

  • Don't use the sigmoid activation function, except in the output layer when you are solving a binary classification problem
  • The tanh function is almost always a better choice than sigmoid
  • ReLU is the default and most commonly used activation function
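A minimal NumPy sketch of these activation functions and their derivatives (the 0.01 slope for Leaky ReLU is just a commonly used illustrative value):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)            # ~0 when |z| is large -> gradient descent slows down

def tanh_grad(z):
    return 1 - np.tanh(z) ** 2    # also saturates for large |z|

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    return (z > 0).astype(float)  # 0 for z < 0, 1 for z > 0

def leaky_relu(z, alpha=0.01):    # alpha: small slope for negative z
    return np.where(z > 0, z, alpha * z)

z = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(z))  # near-zero gradients at the extremes
print(relu_grad(z))     # stays 1 for all positive z
```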

Why do you need non-linear activation functions?

If you use a linear activation function (also called the identity activation function), the output of the neural network is just a linear function of its input.

If you use a linear activation function, or equivalently no activation function at all, then no matter how many layers your network has, all it computes is a linear function of the input; you might as well remove all of the hidden layers. Linear hidden layers are useless, because the composition of two linear functions is still a linear function: unless you introduce a non-linearity, the model cannot compute anything more interesting, no matter how many hidden layers it has. The one place a linear activation function, g(z) = z, is commonly used is the output layer when the network is solving a regression problem.
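A quick numerical check of this point (a sketch with arbitrarily chosen layer sizes): stacking two layers whose activation is the identity g(z) = z collapses into a single linear map.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(3, 1)           # one input example with 3 features

# Two "hidden layers" with the identity activation g(z) = z.
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(2, 4), np.random.randn(2, 1)

a1 = W1 @ x + b1                    # g(z) = z, so a1 = z1
a2 = W2 @ a1 + b2

# The same output from a single equivalent linear layer:
W = W2 @ W1
b = W2 @ b1 + b2
print(np.allclose(a2, W @ x + b))   # True: two linear layers are just one linear map
```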

Deep Neural Networks

Improving Deep Neural Networks

About this Course
This course will teach you the “magic” of getting deep learning to work well. Rather than the deep learning process being a black box, you will understand what drives performance, and be able to more systematically get good results. You will also learn TensorFlow.

After 3 weeks, you will:

  • Understand industry best-practices for building deep learning applications.
  • Be able to effectively use the common neural network “tricks”, including initialization, L2 and dropout regularization, Batch normalization, and gradient checking
  • Be able to implement and apply a variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam, and check for their convergence.
  • Understand new best-practices for the deep learning era of how to set up train/dev/test sets and analyze bias/variance
  • Be able to implement a neural network in TensorFlow.

This is the second course of the Deep Learning Specialization.

Practical aspects of Deep Learning

Learning Objectives

  • Recall that different types of initializations lead to different results
  • Recognize the importance of initialization in complex neural networks.
  • Recognize the difference between train/dev/test sets
  • Diagnose the bias and variance issues in your model
  • Learn when and how to use regularization methods such as dropout or L2 regularization.
  • Understand experimental issues in deep learning such as Vanishing or Exploding gradients and learn how to deal with them
  • Use gradient checking to verify the correctness of your backpropagation implementation

Regularizing your neural network


What we want you to remember from this module:
- Regularization will help you reduce overfitting.
- Regularization will drive your weights to lower values.
- L2 regularization and Dropout are two very effective regularization techniques.

Regularization

Deep Learning models have so much flexibility and capacity that overfitting can be a serious problem if the training dataset is not big enough: the model may do well on the training set, but the learned network doesn’t generalize to new examples that it has never seen!

The standard way to avoid overfitting is called L2 regularization. It consists of appropriately modifying your cost function, from:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right) \tag{1}$$

To:

$$J_{regularized} = \underbrace{-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right)}_\text{cross-entropy cost} + \underbrace{\frac{1}{m} \frac{\lambda}{2} \sum_l\sum_k\sum_j W_{k,j}^{[l]\,2}}_\text{L2 regularization cost} \tag{2}$$

Why do we regularize only the parameters w? Why not also add a term for b? You could, but it is usually omitted: w is typically a very high-dimensional parameter vector, especially when the model has a high-variance problem, so almost all of the parameters live in w, whereas b is just a single number. Adding the b term makes little practical difference, because b is only one parameter among a great many, so in practice people usually don't bother to include it, although you can if you want.
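A sketch of how the regularized cost in equation (2) could be computed with NumPy; the function name and the `parameters` dictionary layout (keys `W1`…`WL`) are assumptions for this example, not the course's reference implementation.

```python
import numpy as np

def compute_cost_with_l2(AL, Y, parameters, lambd, L):
    """Cross-entropy cost plus the L2 term from equation (2).
    AL: output activations, shape (1, m); Y: labels, shape (1, m);
    parameters: dict holding W1..WL (the biases b1..bL are not penalized)."""
    m = Y.shape[1]

    cross_entropy = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m

    # Sum of squared weights over every layer; b is left out, as discussed above.
    l2_term = sum(np.sum(np.square(parameters["W" + str(l)]))
                  for l in range(1, L + 1))
    l2_cost = (lambd / (2 * m)) * l2_term

    return cross_entropy + l2_cost
```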

L1 regularization vs. L2 regularization

In L1 regularization, instead of the L2 norm (the Euclidean norm: the square root of the sum of the squared entries of a vector, commonly used to measure vector length), you add lambda/m times the L1 norm of the parameter vector w, i.e. the sum of the absolute values of its entries (hence the subscript 1). Whether you use m or 2m in the denominator doesn't matter; it is just a scaling constant. If you use L1 regularization, w ends up sparse, meaning the vector w contains many zeros. Some people argue that this helps compress the model, since with some parameters equal to zero less memory is needed to store it. In practice, however, making the model sparse through L1 regularization brings little benefit, so at least for the goal of compressing the model it does not help much.
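For comparison, a small sketch of the two penalty terms applied to a single weight matrix (whether the scaling constant is λ/m or λ/(2m) is immaterial, as noted above):

```python
import numpy as np

def l1_penalty(W, lambd, m):
    # L1 norm: sum of absolute values (tends to push weights to exactly 0 -> sparsity)
    return (lambd / m) * np.sum(np.abs(W))

def l2_penalty(W, lambd, m):
    # Squared L2 (Frobenius) norm: sum of squared entries (shrinks weights smoothly)
    return (lambd / (2 * m)) * np.sum(np.square(W))

W = np.array([[0.0, -0.5],
              [2.0,  0.0]])
print(l1_penalty(W, lambd=0.7, m=10))   # 0.175
print(l2_penalty(W, lambd=0.7, m=10))   # 0.14875
```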

Why regularization reduces overfitting?

L2-regularization relies on the assumption that a model with small weights is simpler than a model with large weights. Thus, by penalizing the square values of the weights in the cost function you drive all the weights to smaller values. It becomes too costly for the cost to have large weights! This leads to a smoother model in which the output changes more slowly as the input changes.

Dropout Regularization

Dropout is a widely used regularization technique that is specific to deep learning.
It randomly shuts down some neurons in each iteration.



Figure 2: Dropout on the second hidden layer.
At each iteration, you shut down (= set to zero) each neuron of a layer with probability 1 − keep_prob, and keep it with probability keep_prob.
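A sketch of (inverted) dropout applied to one layer's activations; `dropout_forward` and its arguments are illustrative names, assuming a keep probability `keep_prob` as described above.

```python
import numpy as np

def dropout_forward(A, keep_prob):
    """Randomly shut down neurons in the activation matrix A (inverted dropout).
    Each unit is kept with probability keep_prob, i.e. dropped with 1 - keep_prob."""
    D = np.random.rand(*A.shape) < keep_prob   # mask: True keeps the unit, False drops it
    A = A * D                                  # shut down the dropped neurons
    A = A / keep_prob                          # scale up so expected activations are unchanged
    return A, D                                # the mask D is reused in backpropagation

# Example: drop roughly 20% of the units of a (4, 5) activation matrix.
np.random.seed(1)
A1 = np.random.randn(4, 5)
A1_dropped, D1 = dropout_forward(A1, keep_prob=0.8)
```

During backpropagation the same mask D and the same 1/keep_prob scaling are applied to the corresponding dA, and dropout is turned off entirely at test time.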