Machine Learning Notes (17): TensorFlow in Practice, Part 9 (Classic Convolutional Neural Networks: ResNet)

阿新 • Published: 2018-11-29

1 - Introduction

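Classic CNN models have evolved from LeNet-5 through AlexNet and VGG to Inception, with depth and complexity increasing markedly along the way; some networks exceed 100 layers. Once a network becomes too deep, however, it also becomes much harder to train.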

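One major problem is vanishing gradients during training: in a very deep network the gradient signal decays quickly as it propagates backward, which makes gradient descent very slow. More specifically, as backpropagation moves from the last layer back to the first, each step multiplies by a weight matrix, so the gradient can shrink exponentially toward 0 (or, in rarer cases, grow exponentially and cause exploding gradients).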
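
The exponential effect of repeated multiplication can be illustrated with a toy calculation. This is only a sketch under a strong simplifying assumption: every layer scales the gradient by the same scalar factor, whereas a real network multiplies by weight matrices and activation derivatives. The function name `gradient_scale` is made up for this illustration.

```python
def gradient_scale(per_layer_factor, depth):
    """Scale applied to a gradient after passing back through `depth` layers,
    assuming each layer multiplies it by the same scalar factor."""
    scale = 1.0
    for _ in range(depth):
        scale *= per_layer_factor
    return scale


# A factor below 1 vanishes with depth; a factor above 1 explodes.
for depth in (5, 20, 50):
    print(depth, gradient_scale(0.5, depth), gradient_scale(1.5, depth))
```

With a factor of 0.5 the scale falls below 1e-15 after 50 layers, which is why the earliest layers of a very deep plain network barely receive any update.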

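During training you may see the magnitude (norm) of the gradients in the first few layers drop rapidly toward 0:
[Figure: gradient norm of the first few layers dropping toward 0 during training]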

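A deeper network can even end up with a higher training error than a shallower one.
[Figure: training error of a deeper network compared with a shallower one]
ResNet was proposed to solve this problem.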

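ResNet (Residual Neural Network) was proposed by Kaiming He and three fellow Chinese researchers at Microsoft Research. Using the ResNet unit, they successfully trained a 152-layer network and won the ILSVRC 2015 competition with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. The ResNet structure greatly speeds up the training of deep networks and also brings a considerable gain in accuracy. It generalizes well, too, and can even be used directly inside Inception networks.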


The sections below look at the ResNet architecture in detail.

2 - ResNet Architecture

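In a residual network, a "shortcut" (also called a "skip connection") lets the gradient backpropagate directly to shallower layers:
[Figure: a plain main path (left) and the same main path with a shortcut added (right)]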
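The left side of the figure shows the main path of the network; the right side shows the same main path with a shortcut added. Stacking such residual blocks yields a very deep network.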
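
To see why the shortcut helps the gradient reach shallow layers, consider a scalar toy model (an illustration only, not how the real network is differentiated). A plain layer y = f(x) has derivative f'(x), while a residual block y = f(x) + x has derivative f'(x) + 1. The helper names below are hypothetical, and every layer is assumed to share the same local derivative.

```python
def plain_gradient(local_derivative, depth):
    """d(output)/d(input) for `depth` stacked plain layers y = f(x)."""
    return local_derivative ** depth


def residual_gradient(local_derivative, depth):
    """Same stack, but each block computes y = f(x) + x, so dy/dx = f'(x) + 1."""
    return (local_derivative + 1.0) ** depth


# With a small local derivative the plain stack loses the gradient entirely,
# while the "+ 1" from the shortcut keeps it alive.
print(plain_gradient(0.01, 50))
print(residual_gradient(0.01, 50))
```

Even when the local derivative is exactly 0, the residual stack still passes the gradient through unchanged, because each block contributes a factor of 1.
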
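The overall structure of a residual network is simple: groups of residual blocks are chained one after another. The figure below shows the structure of ResNet-50; different ResNet variants use different numbers of residual blocks in each group.

[Figure: ResNet-50 architecture]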
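
As a rough check on how the per-group block counts add up to the advertised depth, the sketch below uses the block counts listed in the ResNet paper (He et al., 2015). It assumes the usual counting convention: two convolutions per block in the shallower variants, three per block in the deeper bottleneck variants discussed later in this post, plus the initial convolution and the final fully connected layer. Convolutions on the shortcut path are not counted. The names `RESNET_CONFIGS` and `count_layers` are made up for this example.

```python
# depth -> (residual blocks in each of the four groups, convolutions per block)
RESNET_CONFIGS = {
    18: ([2, 2, 2, 2], 2),
    34: ([3, 4, 6, 3], 2),
    50: ([3, 4, 6, 3], 3),
    101: ([3, 4, 23, 3], 3),
    152: ([3, 8, 36, 3], 3),
}


def count_layers(blocks_per_group, convs_per_block):
    """Weighted layers: all block convolutions + first conv + final dense layer."""
    return sum(blocks_per_group) * convs_per_block + 2


for depth, (blocks, convs) in RESNET_CONFIGS.items():
    print(depth, count_layers(blocks, convs))
```

For ResNet-50 this gives (3 + 4 + 6 + 3) * 3 + 2 = 50.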

2.1 - Identity Block

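The identity block is the standard block used in residual networks. It applies when the input activation (say $a^{[l]}$) and the output activation (say $a^{[l+1]}$) have the same dimensions. To make the individual steps of a residual block concrete, consider the figure below.

[Figure: two stacked convolutional layers (left) and the same layers with a direct connection added (right)]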

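On the left are two ordinary convolutional layers; on the right, a direct connection is added around the two layers. This direct connection is what makes the block "residual": the plain stack on the left outputs H(x) = F(x), whereas with the direct connection the output becomes H(x) = F(x) + x. It is a very simple change, but it gives excellent results.
Why does the direct connection span two convolutional layers rather than one? This was determined experimentally: a direct connection around a single convolutional layer brings little improvement.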
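
The relation H(x) = F(x) + x can be sketched in a few lines of plain Python. This is a minimal illustration on a flat list of activations, not the TensorFlow implementation: `residual_fn` is a hypothetical stand-in for the two convolutional layers, and the final ReLU follows the order used in the block code in this post (add the shortcut, then apply ReLU).

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, residual_fn):
    """Compute relu(F(x) + x), where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        # An identity shortcut only works when input and output sizes match.
        raise ValueError("identity shortcut needs matching dimensions")
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
# If the residual branch outputs zeros, the block simply passes x through.
print(residual_block(x, lambda v: [0.0] * len(v)))  # [1.0, 2.0, 3.0]
```

The pass-through case shows why adding more residual blocks should not hurt: a block can fall back to the identity mapping by driving F(x) toward zero.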

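Identity block implementation (`batch_norm` and `conv2d_fixed_padding` are helper functions from the repository linked at the end of this post):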

def _building_block_v1(inputs, filters, training, projection_shortcut, strides,
                       data_format):
  """A single block for ResNet v1, without a bottleneck.
  Convolution then batch normalization then ReLU as described by:
    Deep Residual Learning for Image Recognition
    https://arxiv.org/pdf/1512.03385.pdf
    by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the convolutions.
    training: A Boolean for whether the model is in training or inference
      mode. Needed for batch normalization.
    projection_shortcut: The function to use for projection shortcuts
      (typically a 1x1 convolution when downsampling the input).
    strides: The block's stride. If greater than 1, this block will ultimately
      downsample the input.
    data_format: The input format ('channels_last' or 'channels_first').
  Returns:
    The output tensor of the block; shape should match inputs.
  """
  shortcut = inputs

  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)
    shortcut = batch_norm(inputs=shortcut, training=training,
                          data_format=data_format)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs += shortcut
  inputs = tf.nn.relu(inputs)

  return inputs

2.2 - Convolutional Block

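The convolutional block is the other type of residual block. It is used when the input and output dimensions do not match. Unlike the identity block, its shortcut path contains a CONV2D layer:
[Figure: convolutional block with a CONV2D layer on the shortcut path]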
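
The role of the CONV2D layer on the shortcut can be sketched for a single pixel. At one spatial position a 1x1 convolution is just a linear map across channels, so it can change the channel count of the shortcut to match the main path. The sketch below is an illustration in plain Python with made-up helper names and weights; it leaves out batch normalization and striding, both of which the TensorFlow code in this post applies.

```python
def project_channels(pixel, weights):
    """1x1 convolution at one pixel: weights[o][i] links input channel i
    to output channel o."""
    return [sum(w * c for w, c in zip(row, pixel)) for row in weights]


def conv_block_output(pixel, main_path_out, shortcut_weights):
    """Add the projected shortcut to the main-path output, then apply ReLU."""
    shortcut = project_channels(pixel, shortcut_weights)
    if len(shortcut) != len(main_path_out):
        raise ValueError("projection must match the main path's channel count")
    return [max(0.0, m + s) for m, s in zip(main_path_out, shortcut)]


pixel = [1.0, 2.0]                     # 2 input channels
shortcut_weights = [[1.0, 0.0],        # maps 2 channels to 4 channels
                    [0.0, 1.0],
                    [1.0, 1.0],
                    [0.5, -0.5]]
main_path_out = [0.0, 0.0, 0.0, 0.0]   # pretend the main path produced zeros
print(conv_block_output(pixel, main_path_out, shortcut_weights))
```

Without the projection, a 2-channel input could not be added to a 4-channel main-path output at all, which is exactly the situation the convolutional block is designed for.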

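[Figure: basic residual block (left) and bottleneck residual block (right)]

The figure above illustrates the difference between the two designs. The block on the left works on 64 channels (it typically appears in residual networks with fewer than 50 layers); the block on the right works on 256 channels (it typically appears in networks with 50 or more layers). As the right side shows, the bottleneck residual block replaces the two 3x3 convolutions with a 1x1, 3x3, 1x1 sequence: the first 1x1 reduces the number of channels, the 3x3 convolves over the reduced feature maps, and the second 1x1 restores the number of channels. The parameter savings come from the channel reduction in the first 1x1. A quick example verifies this: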

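Suppose both the basic residual block and the bottleneck residual block have 256 input and output channels. Counting convolution weights only:

Number of parameters in the basic residual block:
$3 \times 3 \times 256 \times 256 + 3 \times 3 \times 256 \times 256 = 1179648$
Number of parameters in the bottleneck residual block:
$1 \times 1 \times 256 \times 64 + 3 \times 3 \times 64 \times 64 + 1 \times 1 \times 64 \times 256 = 69632$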
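
The two counts can be reproduced with a short calculation. This sketch counts convolution weights only (no biases or batch-normalization parameters); the helper name `conv_params` is made up for this example.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Number of weights in a square 2D convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


# Basic block: two 3x3 convolutions at 256 channels.
basic = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Bottleneck block: 1x1 down to 64, 3x3 at 64, 1x1 back up to 256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(basic, bottleneck, basic / bottleneck)
```

Under these assumptions the bottleneck block needs roughly 17 times fewer weights than the basic block at the same input and output width.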

Bottleneck block implementation:

def _bottleneck_block_v1(inputs, filters, training, projection_shortcut,
                         strides, data_format):
  """A single block for ResNet v1, with a bottleneck.
  Similar to _building_block_v1(), except using the "bottleneck" blocks
  described in:
    Convolution then batch normalization then ReLU as described by:
      Deep Residual Learning for Image Recognition
      https://arxiv.org/pdf/1512.03385.pdf
      by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
  Args:
    inputs: A tensor of size [batch, channels, height_in, width_in] or
      [batch, height_in, width_in, channels] depending on data_format.
    filters: The number of filters for the convolutions.
    training: A Boolean for whether the model is in training or inference
      mode. Needed for batch normalization.
    projection_shortcut: The function to use for projection shortcuts
      (typically a 1x1 convolution when downsampling the input).
    strides: The block's stride. If greater than 1, this block will ultimately
      downsample the input.
    data_format: The input format ('channels_last' or 'channels_first').
  Returns:
    The output tensor of the block; shape should match inputs.
  """
  shortcut = inputs

  if projection_shortcut is not None:
    shortcut = projection_shortcut(inputs)
    shortcut = batch_norm(inputs=shortcut, training=training,
                          data_format=data_format)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=1, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=filters, kernel_size=3, strides=strides,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs = tf.nn.relu(inputs)

  inputs = conv2d_fixed_padding(
      inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
      data_format=data_format)
  inputs = batch_norm(inputs, training, data_format)
  inputs += shortcut
  inputs = tf.nn.relu(inputs)

  return inputs
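
The channel widths inside `_bottleneck_block_v1` follow directly from the `filters` argument: the first two convolutions use `filters` channels and the last uses `4 * filters`. The sketch below traces those widths for the four filter settings commonly used in ResNet-50 (64, 128, 256, 512); the helper `bottleneck_channels` is hypothetical and only mirrors the arithmetic in the code above.

```python
def bottleneck_channels(filters):
    """Output channels of the 1x1, 3x3, and 1x1 convolutions, in order."""
    return [filters, filters, 4 * filters]


for filters in (64, 128, 256, 512):
    print(filters, bottleneck_channels(filters))
```

Because the block ends with `4 * filters` channels, the shortcut must carry the same number of channels before the addition, which is where the projection shortcut comes in when the input width differs.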

3 - Complete ResNet (TensorFlow Version)

GitHub repository: https://github.com/tensorflow/models/tree/master/official/resnet
