
[TensorFlow in Practice] Implementing Classic Convolutional Neural Networks in TensorFlow: ResNet


ResNet

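  ResNet (Residual Neural Network) uses Residual Units to successfully train neural networks as deep as 152 layers. It won the ILSVRC 2015 competition with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. The ResNet structure greatly speeds up the training of very deep networks and also brings a large gain in accuracy. It generalizes well as an architecture and can even be applied directly to Inception Net.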

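  Before ResNet, Professor Schmidhuber proposed the Highway Network, which works on a very similar principle. Network depth is generally considered crucial to performance, but deeper networks are harder to train, and the Highway Network was designed to make very deep networks trainable. In effect it modifies the activation of each layer. Previously, a layer simply applied a nonlinear transformation to its input, $y = H(x, W_H)$. A Highway Network instead lets a fraction of the original input $x$ pass through unchanged: $y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C)$, where $T$ is the transform coefficient and $C$ is the carry coefficient; the paper sets $C = 1 - T$. A fraction of the previous layer's information can therefore reach the next layer without passing through the matrix multiplication and nonlinearity, as if on an information highway, hence the name. The network uses gating units to learn how to control the information flow, that is, how much of the original information to keep. In theory this design allows networks of arbitrary depth to be trained, with an optimization procedure that is largely independent of depth, whereas traditional architectures are very sensitive to depth and become sharply harder to train as it grows.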
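To make the gating concrete, here is a minimal sketch of one highway layer on a plain Python list, assuming C = 1 - T as described above. The function name `highway_layer`, the tanh transform, and the per-unit scalar weights are illustrative assumptions rather than the paper's implementation; a real layer would use full weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(x, w_h, b_h, w_t, b_t):
    """Element-wise highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    Hypothetical simplification: each unit has its own scalar weight
    instead of a full weight matrix.
    """
    y = []
    for xi, wh, bh, wt, bt in zip(x, w_h, b_h, w_t, b_t):
        h = math.tanh(wh * xi + bh)       # transformed input H(x)
        t = sigmoid(wt * xi + bt)         # transform gate T(x), in (0, 1)
        y.append(h * t + xi * (1.0 - t))  # carry gate C = 1 - T
    return y

x = [0.5, -1.0, 2.0]
ones, zeros = [1.0] * 3, [0.0] * 3
# A very negative gate bias drives T toward 0, so the layer passes x through.
carried = highway_layer(x, ones, zeros, zeros, [-20.0] * 3)
# A very positive gate bias drives T toward 1, so the layer outputs H(x).
transformed = highway_layer(x, ones, zeros, zeros, [20.0] * 3)
```

With the gate nearly closed the output is numerically indistinguishable from the input, which is the "highway" behaviour described above.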

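  ResNet likewise lets the input travel directly to later layers. Its original inspiration was the degradation problem: as a network is made deeper, accuracy first rises, then saturates, and then falls as more depth is added. Suppose a relatively shallow network has reached saturated accuracy. Appending several identity-mapping layers (y = x) should at least not increase the error, so a deeper network should not produce a higher error on the training set. This idea of using identity mappings to pass the previous layer's output straight through is what inspired ResNet. Suppose a segment of the network has input x and desired output H(x). If we pass x directly to the output as the initial result, what remains to be learned is F(x) = H(x) - x. ResNet thus changes the learning target: instead of learning the complete output H(x), it learns only the difference between output and input, H(x) - x, that is, the residual. The figure below shows a ResNet residual learning unit (Residual Unit):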

[Figure: a ResNet residual learning unit]
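The reformulation H(x) = F(x) + x can be sketched in a few lines of plain Python. The names `residual_unit`, `zero_branch`, and `small_branch` are hypothetical, and the branches are toy element-wise functions standing in for the stacked convolution layers.

```python
def residual_unit(x, residual_branch):
    """Return H(x) = F(x) + x, where F is the learned residual branch."""
    fx = residual_branch(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_branch(x):
    # A branch whose weights are all zero outputs F(x) = 0.
    return [0.0 for _ in x]

def small_branch(x):
    # A toy residual: a small correction on top of the input.
    return [0.1 * xi for xi in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_unit(x, zero_branch)   # equals x: identity mapping
adjusted_out = residual_unit(x, small_branch)  # x plus a small residual
```

With F(x) = 0 the unit reduces to the identity mapping, which is why adding such units to a saturated shallow network should not, in principle, raise the training error.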

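  The biggest difference between a plain feed-forward convolutional network and ResNet is that ResNet has many bypass branches that connect the input directly to later layers, so those layers can learn the residual directly. These connections are also called shortcuts or skip connections. The figure below compares VGGNet-19, a 34-layer plain convolutional network, and a 34-layer ResNet: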

[Figure: VGGNet-19, a 34-layer plain network, and a 34-layer ResNet side by side]

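  Traditional convolutional or fully connected layers lose or degrade some information as they pass it along. ResNet mitigates this to some extent: by routing the input directly around to the output it protects the integrity of the information, and the network only has to learn the difference between input and output, which simplifies the learning target and reduces its difficulty.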

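  The ResNet paper proposes a two-layer and a three-layer residual learning unit. The two-layer unit contains two 3×3 convolutions with the same number of output channels (the residual equals the target output minus the input, H(x) - x, so input and output dimensions must match). The three-layer unit uses the 1×1 convolutions found in Network In Network and Inception Net, placing one before and one after the middle 3×3 convolution, so the dimension is first reduced and then restored. When the input and output dimensions differ, we can apply a linear projection to x to change its dimension before connecting it to the later layer. See the figure below: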

[Figure: two-layer and three-layer (bottleneck) residual learning units]
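The saving from reducing and then restoring the dimension can be checked with a quick weight count. The sketch below ignores biases and Batch Normalization parameters; the channel sizes (64 for the two-layer unit, 256 to 64 to 256 for the bottleneck) follow the example in the ResNet paper, and the helper name `conv_weights` is an assumption made for illustration.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (no bias)."""
    return kernel * kernel * in_channels * out_channels

# Two-layer unit: two 3x3 convolutions on 64 channels.
two_layer_64 = 2 * conv_weights(3, 64, 64)

# Bottleneck unit on a 256-channel input: 1x1 reduce, 3x3, 1x1 restore.
bottleneck_256 = (conv_weights(1, 256, 64)
                  + conv_weights(3, 64, 64)
                  + conv_weights(1, 64, 256))

# For comparison: a two-layer unit applied directly to 256 channels.
two_layer_256 = 2 * conv_weights(3, 256, 256)

print(two_layer_64, bottleneck_256, two_layer_256)
# prints: 73728 69632 1179648
```

The bottleneck handles a 256-channel input with roughly the same number of weights as the two-layer unit needs for 64 channels, and far fewer than two 3×3 convolutions applied at 256 channels.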

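  The figure below shows the ResNet configurations at different depths. The basic structure is very similar across them: all are stacks of the two-layer and three-layer residual units described above.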

[Figure: ResNet network configurations at different depths]
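As a sanity check on the bottleneck configurations, the nominal depth can be recovered from the per-block unit counts: each bottleneck unit contributes three convolutions, to which the initial 7×7 convolution and the final classification layer are added (projection shortcuts are not counted). The unit counts below match the block definitions in the code listing later in this article; the dictionary and function names are illustrative.

```python
# Units per block for the bottleneck-based variants.
UNITS_PER_BLOCK = {
    50: [3, 4, 6, 3],
    101: [3, 4, 23, 3],
    152: [3, 8, 36, 3],
    200: [3, 24, 36, 3],
}

def nominal_depth(units_per_block, convs_per_unit=3):
    """Weighted layers: first conv + unit convs + final classifier."""
    return 1 + convs_per_unit * sum(units_per_block) + 1

depths = {name: nominal_depth(units) for name, units in UNITS_PER_BLOCK.items()}
```

Each computed depth equals the number in the model's name, for example 1 + 3 × (3 + 4 + 6 + 3) + 1 = 50.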

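  With the ResNet structure, the rise in training error caused by ever-deeper networks disappears: ResNet's training error decreases steadily as layers are added, and test-set performance improves as well.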

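  ResNet V2 differs from ResNet V1 in that the authors, by studying the propagation equations of the residual learning unit, found that forward and backward signals can be passed through directly. The nonlinear activation (such as ReLU) on the skip connection is therefore replaced with an identity mapping (y = x). ResNet V2 also uses Batch Normalization in every layer. With these changes the new residual learning unit is easier to train and generalizes better than before.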
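The effect of moving the activation off the skip path can be seen in a toy comparison. This sketch omits Batch Normalization and uses an all-zero residual branch, so it only illustrates where ReLU sits in each variant; the function names are hypothetical.

```python
def relu(x):
    return [max(0.0, xi) for xi in x]

def zero_residual(x):
    # Stand-in for the convolution branch with all-zero weights.
    return [0.0 for _ in x]

def unit_v1(x, branch):
    """V1 (post-activation): ReLU is applied after the addition."""
    return relu([xi + fi for xi, fi in zip(x, branch(x))])

def unit_v2(x, branch):
    """V2 (pre-activation): ReLU is applied inside the branch only."""
    return [xi + fi for xi, fi in zip(x, branch(relu(x)))]

x = [-1.0, 2.0]
out_v1 = unit_v1(x, zero_residual)  # [0.0, 2.0]: the negative value is clipped
out_v2 = unit_v2(x, zero_residual)  # [-1.0, 2.0]: the input passes unchanged
```

In the V2 arrangement the skip path is a pure identity, so a signal can travel through any number of stacked units without being altered by an activation.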

The following implements ResNet V2 in TensorFlow (the code relies on tf.contrib.slim, so it requires TensorFlow 1.x):

"""

Typical use:

   from tensorflow.contrib.slim.nets import resnet_v2

ResNet-101 for image classification into 1000 classes:

   # inputs has shape [batch, 224, 224, 3]
   with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
      net, end_points = resnet_v2.resnet_v2_101(inputs, 1000)

ResNet-101 for semantic segmentation into 21 classes:

   # inputs has shape [batch, 513, 513, 3]
   with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
      net, end_points = resnet_v2.resnet_v2_101(inputs,
                                                21,
                                                global_pool=False,
                                                output_stride=16)
"""
import collections
import tensorflow as tf
slim = tf.contrib.slim



class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
  """A named tuple describing a ResNet block.

  Its parts are:
    scope: The scope of the `Block`.
    unit_fn: The ResNet unit function which takes as input a `Tensor` and
      returns another `Tensor` with the output of the ResNet unit.
    args: A list of length equal to the number of units in the `Block`. The list
      contains one (depth, depth_bottleneck, stride) tuple for each unit in the
      block to serve as argument to unit_fn.
  """


# Downsampling helper. inputs: the input tensor; factor: the subsampling factor.
def subsample(inputs, factor, scope=None):
  """Subsamples the input along the spatial dimensions.

  Args:
    inputs: A `Tensor` of size [batch, height_in, width_in, channels].
    factor: The subsampling factor.
    scope: Optional variable_scope.

  Returns:
    output: A `Tensor` of size [batch, height_out, width_out, channels] with the
      input, either intact (if factor == 1) or subsampled (if factor > 1).
  """
  if factor == 1:
    return inputs
  else:
    return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)


# Convolution layer that pads explicitly when stride > 1.
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
  """Strided 2-D convolution with 'SAME' padding.

  When stride > 1, then we do explicit zero-padding, followed by conv2d with
  'VALID' padding.

  Note that

     net = conv2d_same(inputs, num_outputs, 3, stride=stride)

  is equivalent to

     net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
     net = subsample(net, factor=stride)

  whereas

     net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')

  is different when the input's height or width is even, which is why we add the
  current function. For more details, see ResnetUtilsTest.testConv2DSameEven().

  Args:
    inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
    num_outputs: An integer, the number of output filters.
    kernel_size: An int with the kernel_size of the filters.
    stride: An integer, the output stride.
    scope: Scope.

  Returns:
    output: A 4-D tensor of size [batch, height_out, width_out, channels] with
      the convolution output.
  """
  if stride == 1:
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                       padding='SAME', scope=scope)
  else:
    # Pad explicitly so the result does not depend on the input size parity.
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    inputs = tf.pad(inputs,
                    [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                       padding='VALID', scope=scope)


# Stacks the Blocks using two nested loops, one Residual Unit at a time.
@slim.add_arg_scope
def stack_blocks_dense(net, blocks,
                       outputs_collections=None):
  """Stacks ResNet `Blocks` and controls output feature density.

  First, this function creates scopes for the ResNet in the form of
  'block_name/unit_1', 'block_name/unit_2', etc.


  Args:
    net: A `Tensor` of size [batch, height, width, channels].
    blocks: A list of length equal to the number of ResNet `Blocks`. Each
      element is a ResNet `Block` object describing the units in the `Block`.
    outputs_collections: Collection to add the ResNet block outputs.

  Returns:
    net: Output tensor 

  """
  for block in blocks:
    with tf.variable_scope(block.scope, 'block', [net]) as sc:
      for i, unit in enumerate(block.args):
        with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
          unit_depth, unit_depth_bottleneck, unit_stride = unit
          net = block.unit_fn(net,
                              depth=unit_depth,
                              depth_bottleneck=unit_depth_bottleneck,
                              stride=unit_stride)
      net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
      
  return net


# Defines default argument values for the layer functions used below.
def resnet_arg_scope(is_training=True,
                     weight_decay=0.0001,
                     batch_norm_decay=0.997,
                     batch_norm_epsilon=1e-5,
                     batch_norm_scale=True):
  """Defines the default ResNet arg scope.

  TODO(gpapan): The batch-normalization related default values above are
    appropriate for use in conjunction with the reference ResNet models
    released at https://github.com/KaimingHe/deep-residual-networks. When
    training ResNets from scratch, they might need to be tuned.

  Args:
    is_training: Whether or not we are training the parameters in the batch
      normalization layers of the model.
    weight_decay: The weight decay to use for regularizing the model.
    batch_norm_decay: The moving average decay when estimating layer activation
      statistics in batch normalization.
    batch_norm_epsilon: Small constant to prevent division by zero when
      normalizing activations by their variance in batch normalization.
    batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
      activations in the batch normalization layer.

  Returns:
    An `arg_scope` to use for the resnet models.
  """
  batch_norm_params = {
      'is_training': is_training,
      'decay': batch_norm_decay,
      'epsilon': batch_norm_epsilon,
      'scale': batch_norm_scale,
      'updates_collections': tf.GraphKeys.UPDATE_OPS,
  }

  with slim.arg_scope(
      [slim.conv2d],
      weights_regularizer=slim.l2_regularizer(weight_decay),
      weights_initializer=slim.variance_scaling_initializer(),
      activation_fn=tf.nn.relu,
      normalizer_fn=slim.batch_norm,
      normalizer_params=batch_norm_params):
    with slim.arg_scope([slim.batch_norm], **batch_norm_params):
      # The following implies padding='SAME' for pool1, which makes feature
      # alignment easier for dense prediction tasks. This is also used in
      # https://github.com/facebook/fb.resnet.torch. However the accompanying
      # code of 'Deep Residual Learning for Image Recognition' uses
      # padding='VALID' for pool1. You can switch to that choice by setting
      # slim.arg_scope([slim.max_pool2d], padding='VALID').
      with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
        return arg_sc




# The core bottleneck residual learning unit.
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections=None, scope=None):
  """Bottleneck residual unit variant with BN before convolutions.

  This is the full preactivation residual unit variant proposed in [2]. See
  Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck
  variant which has an extra bottleneck layer.

  When putting together two consecutive ResNet blocks that use this unit, one
  should use stride = 2 in the last unit of the first block.

  Args:
    inputs: A tensor of size [batch, height, width, channels].
    depth: The depth of the ResNet unit output.
    depth_bottleneck: The depth of the bottleneck layers.
    stride: The ResNet unit's stride. Determines the amount of downsampling of
      the units output compared to its input.
    outputs_collections: Collection to add the ResNet unit output.
    scope: Optional variable_scope.

  Returns:
    The ResNet unit's output.
  """
  with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
    depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
    # Pre-activation: Batch Normalization and ReLU come before the convolutions.
    preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')
    if depth == depth_in:
      # Same depth: the shortcut is the (possibly subsampled) input itself.
      shortcut = subsample(inputs, stride, 'shortcut')
    else:
      # Different depth: project with a 1x1 convolution to match dimensions.
      shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                             normalizer_fn=None, activation_fn=None,
                             scope='shortcut')

    residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                           scope='conv1')
    residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                           scope='conv2')
    residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                           normalizer_fn=None, activation_fn=None,
                           scope='conv3')

    output = shortcut + residual

    return slim.utils.collect_named_outputs(outputs_collections,
                                            sc.name,
                                            output)


# Main function that builds a ResNet V2 model.
def resnet_v2(inputs,
              blocks,
              num_classes=None,
              global_pool=True,
              include_root_block=True,
              reuse=None,
              scope=None):
  """Generator for v2 (preactivation) ResNet models.

  This function generates a family of ResNet v2 models. See the resnet_v2_*()
  methods for specific model instantiations, obtained by selecting different
  block instantiations that produce ResNets of various depths.


  Args:
    inputs: A tensor of size [batch, height_in, width_in, channels].
    blocks: A list of length equal to the number of ResNet blocks. Each element
      is a resnet_utils.Block object describing the units in the block.
    num_classes: Number of predicted classes for classification tasks. If None
      we return the features before the logit layer.
    global_pool: If True, we perform global average pooling before computing
      the logits.
    include_root_block: If True, include the initial convolution followed by
      max-pooling, if False excludes it. If excluded, `inputs` should be the
      results of an activation-less convolution.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.


  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
      If global_pool is True, both height_out and width_out equal one. If
      num_classes is None, then net is the output of the last ResNet block,
      potentially after global average pooling. If num_classes is not None,
      net contains the pre-softmax activations.
    end_points: A dictionary from components of the network to the corresponding
      activation.
  """
  with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
    end_points_collection = sc.original_name_scope + '_end_points'
    with slim.arg_scope([slim.conv2d, bottleneck,
                         stack_blocks_dense],
                        outputs_collections=end_points_collection):
      net = inputs
      if include_root_block:
        # We do not include batch normalization or activation functions in conv1
        # because the first ResNet unit will perform these. Cf. Appendix of [2].
        with slim.arg_scope([slim.conv2d],
                            activation_fn=None, normalizer_fn=None):
          net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
        net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
      net = stack_blocks_dense(net, blocks)
      # This is needed because the pre-activation variant does not have batch
      # normalization or activation functions in the residual unit output. See
      # Appendix of [2].
      net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
      if global_pool:
        # Global average pooling.
        net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
      if num_classes is not None:
        net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                          normalizer_fn=None, scope='logits')
      # Convert end_points_collection into a dictionary of end_points.
      end_points = slim.utils.convert_collection_to_dict(end_points_collection)
      if num_classes is not None:
        end_points['predictions'] = slim.softmax(net, scope='predictions')
      return net, end_points



def resnet_v2_50(inputs,
                 num_classes=None,
                 global_pool=True,
                 reuse=None,
                 scope='resnet_v2_50'):
  """ResNet-50 model of [1]. See resnet_v2() for arg and return description."""
  blocks = [
      Block(
          'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
      Block(
          'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
      Block(
          'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
      Block(
          'block4', bottleneck, [(2048, 512, 1)] * 3)]
  return resnet_v2(inputs, blocks, num_classes, global_pool,
                   include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_101(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_101'):
  """ResNet-101 model of [1]. See resnet_v2() for arg and return description."""
  blocks = [
      Block(
          'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
      Block(
          'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
      Block(
          'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
      Block(
          'block4', bottleneck, [(2048, 512, 1)] * 3)]
  return resnet_v2(inputs, blocks, num_classes, global_pool,
                   include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_152(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_152'):
  """ResNet-152 model of [1]. See resnet_v2() for arg and return description."""
  blocks = [
      Block(
          'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
      Block(
          'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
      Block(
          'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
      Block(
          'block4', bottleneck, [(2048, 512, 1)] * 3)]
  return resnet_v2(inputs, blocks, num_classes, global_pool,
                   include_root_block=True, reuse=reuse, scope=scope)


def resnet_v2_200(inputs,
                  num_classes=None,
                  global_pool=True,
                  reuse=None,
                  scope='resnet_v2_200'):
  """ResNet-200 model of [2]. See resnet_v2() for arg and return description."""
  blocks = [
      Block(
          'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
      Block(
          'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
      Block(
          'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
      Block(
          'block4', bottleneck, [(2048, 512, 1)] * 3)]
  return resnet_v2(inputs, blocks, num_classes, global_pool,
                   include_root_block=True, reuse=reuse, scope=scope)

  
from datetime import datetime
import math
import time


# Benchmark helper: times repeated runs of `target` after a warm-up phase.
# Note: num_batches is a module-level variable that must be set before calling.
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        # Skip the warm-up iterations when accumulating statistics.
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    # Mean and standard deviation of the per-batch duration.
    mn = total_duration / num_batches
    vr = total_duration_squared / num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, num_batches, mn, sd))
    
batch_size = 32
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training=False)):
   net, end_points = resnet_v2_152(inputs, 1000)
  
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)  
num_batches=100
time_tensorflow_run(sess, net, "Forward") 
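
The docstring of conv2d_same notes that a strided 'SAME' convolution behaves differently when the input height or width is even. The arithmetic behind that remark can be sketched without TensorFlow, assuming TensorFlow's documented 'SAME' rule (the total padding is split in two, with any extra pixel going at the end). The helper names below are illustrative and are not part of the listing above.

```python
def same_padding(in_size, kernel, stride):
    """Padding (begin, end) that 'SAME' applies along one dimension."""
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg

def explicit_padding(kernel):
    """Padding (begin, end) used by conv2d_same before a 'VALID' conv."""
    pad_total = kernel - 1
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg

def valid_output_size(in_size, kernel, stride):
    return (in_size - kernel) // stride + 1

# 3x3 kernel, stride 2, even input size 224.
same_pad = same_padding(224, 3, 2)   # (0, 1): asymmetric padding
explicit_pad = explicit_padding(3)   # (1, 1): symmetric padding
out_size = valid_output_size(224 + sum(explicit_pad), 3, 2)  # 112
```

Both approaches produce a 112-pixel output, but the strided 'SAME' convolution pads only at the end for an even input, so its windows are shifted by one pixel relative to the explicitly padded version. For an odd input such as 225 the two paddings coincide.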

  ResNet is a milestone in deep learning: it made training extremely deep neural networks practical.
