
Training Very Deep Networks: Paper Notes

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.

Abstract (translation)
Theoretical and empirical evidence indicates that the depth of neural networks is crucial to their performance. However, training becomes more difficult as depth increases, and training very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. We call it the highway network; it allows information to flow unimpeded across many layers along information highways. The network is inspired by LSTM and uses adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly with simple gradient descent. This makes it possible to study extremely deep and efficient architectures.

2 Highway Networks
Notation We use boldface letters for vectors and matrices, and italicized capital letters to denote transformation functions. $\mathbf{0}$ and $\mathbf{1}$ denote vectors of zeros and ones respectively, and $I$ denotes an identity matrix. The function $\sigma(x)$ is defined as $\sigma(x) = \frac{1}{1+e^{-x}}$, $x \in \mathbb{R}$. The dot operator ($\cdot$) is used to denote element-wise multiplication.
A plain feedforward neural network typically consists of $L$ layers where the $l^{th}$ layer ($l \in \{1, 2, \ldots, L\}$) applies a non-linear transformation $H$ (parameterized by $W_{H,l}$) on its input $x_l$ to produce its output $y_l$. Thus, $x_1$ is the input to the network and $y_L$ is the network's output. Omitting the layer index and biases for clarity,
$$y = H(x, W_H) \tag{1}$$
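As a minimal sketch of Equation (1), assuming (as noted just below) that $H$ is an affine transform followed by a simple nonlinearity such as ReLU; the names `plain_layer`, `W_H`, and `b_H` are illustrative, not from the paper:

```python
import numpy as np

def plain_layer(x, W_H, b_H):
    """Plain layer of Eq. (1): y = H(x, W_H), here an affine map followed by ReLU."""
    return np.maximum(0.0, W_H @ x + b_H)

# Example: a 4-dimensional input passed through one plain layer.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W_H = rng.standard_normal((4, 4))
b_H = np.zeros(4)
y = plain_layer(x, W_H, b_H)
```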
$H$ is usually an affine transform followed by a non-linear activation function, but in general it may take other forms, possibly convolutional or recurrent. For a highway network, we additionally define two non-linear transforms $T(x, W_T)$ and $C(x, W_C)$ such that
$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) \tag{2}$$
We refer to $T$ as the transform gate and $C$ as the carry gate, since they express how much of the output is produced by transforming the input and carrying it, respectively. For simplicity, in this paper we set $C = 1 - T$, giving
$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot \left(1 - T(x, W_T)\right) \tag{3}$$
The dimensionality of $x$, $y$, $H(x, W_H)$ and $T(x, W_T)$ must be the same for Equation (3) to be valid.
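For concreteness, here is a minimal NumPy sketch of the coupled highway layer in Equation (3), assuming the same affine-plus-ReLU form for $H$ and an affine-plus-sigmoid transform gate $T$; the function and parameter names (`highway_layer`, `W_T`, `b_T`) are illustrative rather than taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """Highway layer of Eq. (3): y = H(x) * T(x) + x * (1 - T(x)), element-wise."""
    H = np.maximum(0.0, W_H @ x + b_H)   # block transform H(x, W_H)
    T = sigmoid(W_T @ x + b_T)           # transform gate T(x, W_T), values in (0, 1)
    return H * T + x * (1.0 - T)         # carry gate C = 1 - T

# Stack several layers; x, y, H and T must all share one dimensionality (see above).
rng = np.random.default_rng(0)
dim, depth = 8, 5
x = rng.standard_normal(dim)
for _ in range(depth):
    W_H = 0.1 * rng.standard_normal((dim, dim))
    W_T = 0.1 * rng.standard_normal((dim, dim))
    x = highway_layer(x, W_H, np.zeros(dim), W_T, np.full(dim, -2.0))
```

The negative transform-gate bias used in the loop keeps $T$ small at first, so each layer initially favors carrying its input, consistent with the negative $b_T$ initialization the paper recommends.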
Note that this layer transformation is much more flexible than Equation (1). In particular, observe that for particular values of $T$,

$$y = \begin{cases} x, & \text{if } T(x, W_T) = 0 \\ H(x, W_H), & \text{if } T(x, W_T) = 1 \end{cases} \tag{4}$$

Similarly, for the Jacobian of the layer transform,
$$\frac{dy}{dx} = \begin{cases} I, & \text{if } T(x, W_T) = 0 \\ H'(x, W_H), & \text{if } T(x, W_T) = 1 \end{cases} \tag{5}$$
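A short, self-contained numeric check of the two limiting cases in Equations (4) and (5): saturating the gate near 0 makes the layer copy its input (so its Jacobian is the identity), while saturating it near 1 reduces the layer to $H(x, W_H)$. Here $H$ is again assumed to be affine plus ReLU with zero bias; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 6
x = rng.standard_normal(dim)
W_H = rng.standard_normal((dim, dim))
H = np.maximum(0.0, W_H @ x)       # H(x, W_H): affine + ReLU, zero bias

# Gate saturated near 0: output is x itself (Eq. 4) and dy/dx is I (Eq. 5).
T = 1.0 / (1.0 + np.exp(50.0))     # sigma(-50) is approximately 0
y = H * T + x * (1.0 - T)
assert np.allclose(y, x)

# Gate saturated near 1: output is H(x, W_H) and dy/dx is H'(x, W_H).
T = 1.0 / (1.0 + np.exp(-50.0))    # sigma(50) is approximately 1
y = H * T + x * (1.0 - T)
assert np.allclose(y, H)
```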
Thus, depending on the output of the transform gates, a highway layer can smoothly vary its behavior between that of $H$ and that of a layer which simply passes its inputs through. Just as a plain layer consists of multiple computing units such that the $i^{th}$ unit computes $y_i = H_i(x)$, a highway network consists of multiple blocks such that the $i^{th}$