
DL-1: Tips for Training Deep Neural Network

Different problems call for different approaches.

E.g., dropout is aimed at improving results on testing data, not training data.

Choosing proper loss

  • Square Error: $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
  • Cross Entropy: $-\sum_{i=1}^{n}\hat{y}_i \ln y_i$
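A minimal Keras sketch of this choice (the 784→500→10 architecture is only a placeholder): the loss is picked at compile time, and for a softmax classifier cross entropy usually trains much better than square error.

```python
# Minimal sketch (TensorFlow/Keras API; layer sizes are illustrative).
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(500, activation="sigmoid", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",   # vs. "mean_squared_error"
              optimizer="sgd",
              metrics=["accuracy"])
```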

Mini-batch

We do not really minimize the total loss: each mini-batch update only minimizes the loss on that batch.

batch_size: the number of training examples processed per batch;
nb_epoch: the number of passes over the whole training set.
The total number of training examples seen stays the same.

Mini-batch is faster: it performs many more parameter updates in the same amount of training time. (A smaller batch is not always proportionally cheaper per update with parallel computing.)

Mini-batch has better performance!

Shuffle the training examples for each epoch. This is the default of Keras.
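A minimal sketch of mini-batch training with Keras, reusing the `model`, `x_train`, and `y_train` names from the sketch above (the modern API calls the epoch argument `epochs` rather than the older `nb_epoch`):

```python
# batch_size: training examples processed per gradient update.
# epochs: passes over the whole training set.
# shuffle=True (the Keras default) reshuffles the examples every epoch.
model.fit(x_train, y_train,
          batch_size=100,
          epochs=20,
          shuffle=True)
```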

New activation function

Q: Vanishing Gradient Problem

  • Layers closer to the input have smaller gradients
    • They learn very slowly
    • Their weights stay almost random
  • Layers closer to the output have larger gradients
    • They learn very fast
    • They have already converged (based on the nearly random lower layers)

2006: RBM pre-training → 2015: ReLU

ReLU: Rectified Linear Unit
1. Fast to compute
2. Biological reason
3. Equivalent to an infinite number of sigmoids with different biases
4. Can handle the vanishing gradient problem

With ReLU, the active neurons form a thinner, linear network, so the gradients do not become smaller as they propagate backward.
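A small NumPy sketch of ReLU and its gradient: the gradient is either 1 (the neuron is active and behaves linearly) or 0 (the neuron is removed), so the remaining thinner network does not shrink the gradient.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """ReLU gradient: 1 for active neurons, 0 for dropped ones."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```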

(Figures: ReLU)

Adaptive Learning Rate

Set the learning rate η carefully.

  • If learning rate is too large, total loss may not decrease after each update.
  • If learning rate is too small, training would be too slow.

Solution:

  • Popular & Simple Idea: Reduce the learning rate by some factor every few epochs.
    • At the beginning, use larger learning rate
    • After several epochs, reduce the learning rate, e.g. 1/t decay: $\eta^t = \eta/\sqrt{t+1}$
  • Learning rate cannot be one-size-fits-all.
    • Giving different parameters different learning rates
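A quick sketch of the 1/t decay mentioned above (plain Python; the starting rate 0.1 is only an example):

```python
import math

def decayed_lr(eta0, t):
    """1/t decay: eta_t = eta0 / sqrt(t + 1), where t is the epoch index."""
    return eta0 / math.sqrt(t + 1)

print([round(decayed_lr(0.1, t), 4) for t in range(5)])
# [0.1, 0.0707, 0.0577, 0.05, 0.0447]
```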

Adagrad: $w \leftarrow w - \eta_w \dfrac{\partial L}{\partial w}$
$\eta_w$: parameter-dependent learning rate

$\eta_w = \dfrac{\eta}{\sqrt{\sum_{i=0}^{t}(g^i)^2}}$

$\eta$: constant
$g^i$: the value of $\partial L/\partial w$ obtained at the i-th update

Summation of the square of the previous derivatives.

Observation:
1. Learning rate is smaller and smaller for all parameters.
2. Smaller derivatives, larger learning rate, and vice versa.
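A NumPy sketch of the Adagrad update described above; `grad_fn` is a hypothetical function returning ∂L/∂w, and the quadratic example is illustrative:

```python
import numpy as np

def adagrad_update(w, grad_fn, eta=1.0, steps=200, eps=1e-8):
    """Adagrad: w <- w - eta / sqrt(sum of past g^2) * g."""
    g_sq_sum = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)                 # dL/dw at this update
        g_sq_sum += g ** 2             # accumulate squared past derivatives
        w = w - eta / np.sqrt(g_sq_sum + eps) * g
    return w

# Example: minimize L(w) = sum(w^2), whose gradient is 2w.
w = adagrad_update(np.array([3.0, -2.0]), grad_fn=lambda w: 2 * w)
print(w)  # both components have shrunk toward 0
```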

  • Adagrad [John Duchi, JMLR’11]
  • RMSprop
    https://www.youtube.com/watch?v=O3sxAc4hxZU
  • Adadelta [Matthew D. Zeiler, arXiv’12]
  • “No more pesky learning rates” [Tom Schaul, arXiv’12]
  • AdaSecant [Caglar Gulcehre, arXiv’14]
  • Adam [Diederik P. Kingma, ICLR’15]
  • Nadam
    http://cs229.stanford.edu/proj2015/054_report.pdf

Momentum

(Figures: Momentum)
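The figures illustrate momentum; as a hedged sketch of the classical update (not necessarily the exact formulation in the slides), the "movement" combines the previous movement with the current gradient, which helps push through plateaus and small local minima:

```python
import numpy as np

def momentum_update(w, grad_fn, eta=0.01, lam=0.9, steps=200):
    """Classical momentum: movement = lam * previous movement - eta * gradient."""
    v = np.zeros_like(w)               # accumulated movement
    for _ in range(steps):
        g = grad_fn(w)
        v = lam * v - eta * g          # keep part of the previous direction
        w = w + v
    return w

# Example: minimize L(w) = sum(w^2), whose gradient is 2w.
print(momentum_update(np.array([3.0, -2.0]), grad_fn=lambda w: 2 * w))
```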

Overfitting

  • Learning target is defined by the training data.

  • Training data and testing data can be different.

  • The parameters achieving the learning target do not necessarily give good results on the testing data.

  • Panacea for Overfitting

    • Have more training data
    • Create more training data (e.g., by shifting or rotating existing images)

Early Stopping
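Early stopping halts training when performance on a held-out validation set stops improving, instead of driving the training loss all the way down. A minimal Keras sketch (the `patience` value and validation split are illustrative), reusing `model`, `x_train`, and `y_train` from above:

```python
from tensorflow import keras

# Stop when the validation loss has not improved for 3 consecutive epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)

model.fit(x_train, y_train,
          validation_split=0.1,   # hold out 10% of the training data
          batch_size=100,
          epochs=100,
          callbacks=[early_stop])
```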

Regularization

Weight decay is one kind of regularization.
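A minimal sketch of weight decay as L2 regularization on a Keras layer (the coefficient 0.01 is illustrative):

```python
from tensorflow import keras

# L2 weight decay: adds 0.01 * sum(w^2) to the loss for this layer,
# which keeps the weights small (closer to zero) during training.
layer = keras.layers.Dense(
    500,
    activation="relu",
    kernel_regularizer=keras.regularizers.l2(0.01),
)
```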

Dropout

Training

  • Each time before updating the parameters
    1. Each neuron has a p% chance to drop out
      The structure of the network is changed.
    2. Using the new network for training
      For each mini-batch, we resample the dropout neurons.

Testing

**No dropout**
  • If the dropout rate at training is p%, multiply all the weights by (1 − p)% for testing.
  • Assume that the dropout rate is 50%.
    If a weight is w = 1 after training, set w = 0.5 for testing.
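A NumPy sketch of the scheme above: drop each neuron with probability p during training, and at testing keep every neuron but scale by (1 − p) (scaling the activations is equivalent to scaling the weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(a, p=0.5):
    """Training: each activation is dropped (set to 0) with probability p."""
    keep = rng.random(a.shape) >= p
    return a * keep

def dropout_test(a, p=0.5):
    """Testing: no dropout; multiply by (1 - p) instead.
    With p = 0.5, a unit that outputs 1 effectively contributes 0.5."""
    return a * (1.0 - p)

a = np.ones(10)
print(dropout_train(a))  # roughly half the entries are zeroed
print(dropout_test(a))   # every entry is scaled to 0.5
```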

Dropout - Intuitive Reason

  • When working in a team, if everyone expects their partners to do the work, nothing gets done in the end.
  • However, if you know your partner will drop out, you will do better.
  • When testing, no one actually drops out, so the results end up being even better.

Dropout is a kind of ensemble

(Figures: dropout as an ensemble)

Network Structure

CNN is a very good example!
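A minimal Keras sketch of the idea (layer choices are illustrative): a CNN bakes prior knowledge about images (local connectivity, weight sharing) into the network structure itself.

```python
from tensorflow import keras

cnn = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
```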
