Machine Learning Notes (16): TensorFlow in Practice, Part 8 (Classic Convolutional Neural Networks: GoogLeNet)

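1 - Introduction

GoogLeNet won the ILSVRC 2014 challenge, bringing the top-5 error rate down to 6.67%. It is a 22-layer deep network.

Paper: http://arxiv.org/pdf/1409.4842v1.pdf

Title: Going deeper with convolutions.

The name GoogLeNet is itself a nice touch: the authors chose it as a tribute to LeNet, the pioneering network.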

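The network models introduced in earlier notes were all built by chaining convolutional and pooling layers in series.

The structural innovation of GoogLeNet is this:

The Inception structure in the GoogLeNet model combines different convolutional layers in parallel.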

The Inception structure is shown in the figure below:
[Figure: structure of the Inception module]
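
The difference between series and parallel wiring can be sketched with simple shape bookkeeping. The snippet below is a toy illustration, not part of the original implementation: `conv_same`, `pool_same`, `serial`, and `parallel` are hypothetical helpers that only track (height, width, channels) tuples and perform no real convolution. The branch widths are borrowed from the first Inception module in the code in section 4.

```python
# Toy shape bookkeeping: a "layer" is a function from a
# (height, width, channels) tuple to a new tuple.

def conv_same(out_channels):
    """Hypothetical stand-in for a stride-1, SAME-padded convolution."""
    def layer(shape):
        height, width, _ = shape
        return (height, width, out_channels)
    return layer

def pool_same():
    """Hypothetical stand-in for a stride-1, SAME-padded pooling layer."""
    def layer(shape):
        return shape  # pooling keeps the channel count
    return layer

def serial(*layers):
    """Chain layers one after another (series wiring)."""
    def block(shape):
        for layer in layers:
            shape = layer(shape)
        return shape
    return block

def parallel(*branches):
    """Feed the same input to every branch, then concatenate along channels."""
    def block(shape):
        outputs = [branch(shape) for branch in branches]
        height, width, _ = outputs[0]
        # Concatenation requires every branch to keep the same spatial size.
        assert all(o[:2] == (height, width) for o in outputs)
        return (height, width, sum(o[2] for o in outputs))
    return block

inception_like = parallel(
    conv_same(64),                                        # 1x1 branch
    serial(conv_same(48), conv_same(64)),                 # 1x1 then 5x5
    serial(conv_same(64), conv_same(96), conv_same(96)),  # 1x1 then two 3x3
    serial(pool_same(), conv_same(32)),                   # pooling then 1x1
)

print(inception_like((35, 35, 192)))  # (35, 35, 256)
```

In series wiring only the last layer determines the channel count, whereas in parallel wiring the branch outputs add up (64 + 64 + 96 + 32 = 256).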

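Before GoogLeNet, mainstream architectural advances mostly came from making networks deeper (more layers) or wider (more neurons), but this approach has some drawbacks:

  • With a limited training set, too many parameters cause the model to overfit;
  • The larger the network, the higher the computational cost and the harder it is to design;
  • As the number of layers grows, gradients vanish more easily as they propagate backward through the network.

Given these drawbacks, one reason networks kept growing in the pursuit of accuracy is that their feature-extraction modules were not extracting image features well. If the basic feature-extraction unit can be improved and the network is then built from the improved unit, recognition accuracy may benefit. This is how the Inception model came about.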

Below we focus on the Inception-v3 model, a later revision of the Inception architecture that GoogLeNet introduced.

2 - The Inception-v3 model

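The Inception-v3 model has 46 layers in total and is made up of 11 Inception modules. It contains 96 convolutional layers. Written in the plain TensorFlow style used in earlier notes, where one convolutional layer takes 5 lines of code, the convolutional layers alone would need 96 × 5 = 480 lines, which makes the code very hard to read.

To make such complex convolutional networks practical to implement, TensorFlow provides a tool called TensorFlow-Slim, which expresses a convolutional network much more concisely.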

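3 - The TensorFlow-Slim tool

slim is a library that simplifies building, training, and evaluating neural networks. It removes much of the repetitive boilerplate of native TensorFlow, making code more compact and more readable. slim also ships well-known computer vision models (VGG, AlexNet, and others), which can be used directly or extended in various ways.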

4 - Implementing the Inception-v3 convolutional layers

Because Inception-v3 is a fairly complex network, only the code for its convolutional part is given here:

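# Requires TensorFlow 1.x (tf.contrib was removed in TensorFlow 2.x)
import tensorflow as tf
slim = tf.contrib.slim

######## Define a function that builds the convolutional part of the Inception V3 network ########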
def inception_v3_base(inputs, scope=None):
  '''
  Args:
  inputs: the input tensor
  scope: optional variable scope name (defaults to 'InceptionV3')
  '''
  end_points = {} # Dictionary that stores key nodes for later use

  with tf.variable_scope(scope, 'InceptionV3', [inputs]):
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], # Set default stride and padding for these three layer functions
                        stride=1, padding='VALID'):
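      # Define the Inception V3 structure. First come the convolutional layers that are not part of any Inception Module
      # 299 x 299 x 3
      # slim.conv2d arguments: input tensor, number of output channels, kernel size, stride, padding mode
      net = slim.conv2d(inputs, 32, [3, 3], stride=2, scope='Conv2d_1a_3x3') # Create the convolutional layer directly with slim.conv2d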
      # 149 x 149 x 32
      '''
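      With slim and slim.arg_scope, a convolutional layer can be defined in a single line of code.
      This is more convenient than the several lines per layer used for AlexNet, or the dedicated helper function written for VGGNet.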
      '''
      net = slim.conv2d(net, 32, [3, 3], scope='Conv2d_2a_3x3')
      # 147 x 147 x 32
      net = slim.conv2d(net, 64, [3, 3], padding='SAME', scope='Conv2d_2b_3x3')
      # 147 x 147 x 64
      net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_3a_3x3')
      # 73 x 73 x 64
      net = slim.conv2d(net, 80, [1, 1], scope='Conv2d_3b_1x1')
      # 73 x 73 x 80.
      net = slim.conv2d(net, 192, [3, 3], scope='Conv2d_4a_3x3')
      # 71 x 71 x 192.
      net = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_5a_3x3')
      # 35 x 35 x 192.

      # The code above has 5 convolutional layers and 2 pooling layers; it shrinks the spatial size of the image and abstracts its features

    '''
    Next come three consecutive groups of Inception modules, each containing several Inception Modules. This part is the
    core of Inception V3. The Inception Modules within a group are very similar in structure but differ in some details.
    '''
    # Inception blocks
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], # Default arguments for all module groups
                        stride=1, padding='SAME'): # Stride 1 and SAME padding for every convolution, max pooling, and average pooling layer
      # mixed: 35 x 35 x 256.
      # The first module group contains three structurally similar Inception Modules
      with tf.variable_scope('Mixed_5b'): # Name of the first Inception Module, which has four branches
        with tf.variable_scope('Branch_0'): # Branch 1: a 64-channel 1x1 convolution
          branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'): # Branch 2: a 48-channel 1x1 convolution followed by a 64-channel 5x5 convolution
          branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'): # Branch 3: a 64-channel 1x1 convolution followed by two 96-channel 3x3 convolutions
          branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'): # Branch 4: 3x3 average pooling followed by a 32-channel 1x1 convolution
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3) # Concatenate the four branch outputs along axis 3, the channel dimension

      '''
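      Every layer here has stride 1 and SAME padding, so the spatial size is unchanged while the channel count grows. The four branches add up to
      64+64+96+32=256 channels, so the output tensor is 35*35*256.
      Every Inception Module in the first group outputs 35*35 feature maps, but the last two have a different channel count (288).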
      '''

      # mixed_1: 35 x 35 x 288.
      with tf.variable_scope('Mixed_5c'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0b_1x1')
          branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv_1_0c_5x5')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)

      # mixed_2: 35 x 35 x 288.
      with tf.variable_scope('Mixed_5d'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 64, [5, 5], scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)

      # The second Inception module group contains five Inception Modules; the second through fifth are structurally similar.
      # mixed_3: 17 x 17 x 768.
      with tf.variable_scope('Mixed_6a'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 384, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_1x1') # Stride 2 shrinks the spatial size
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_0b_3x3')
          branch_1 = slim.conv2d(branch_1, 96, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_1x1') # Stride 2 shrinks the spatial size
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
                                     scope='MaxPool_1a_3x3')
        net = tf.concat([branch_0, branch_1, branch_2], 3) # Output size is 17 x 17 x 768

      # mixed4: 17 x 17 x 768.
      with tf.variable_scope('Mixed_6b'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 128, [1, 7], scope='Conv2d_0b_1x7') # Chaining a 1x7 and a 7x1 convolution approximates a 7x7 convolution with fewer parameters, which reduces overfitting
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1') # The 7x7 convolution is factorized repeatedly in this branch
          branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0b_7x1') 
          branch_2 = slim.conv2d(branch_2, 128, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 128, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)

      # mixed_5: 17 x 17 x 768.
      with tf.variable_scope('Mixed_6c'):
        with tf.variable_scope('Branch_0'):
          '''
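          Each Inception Module the network passes through refines the features again, even when the output size stays the same;
          the rich mix of convolutions and nonlinearities does a lot to improve network performance.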
          '''
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
      # mixed_6: 17 x 17 x 768.
      with tf.variable_scope('Mixed_6d'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 160, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 160, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 160, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)

      # mixed_7: 17 x 17 x 768.
      with tf.variable_scope('Mixed_6e'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 192, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0b_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0c_1x7')
          branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0d_7x1')
          branch_2 = slim.conv2d(branch_2, 192, [1, 7], scope='Conv2d_0e_1x7')
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
      end_points['Mixed_6e'] = net # Store Mixed_6e in end_points; it is the input to the Auxiliary Classifier

      # The third Inception module group contains three Inception Modules
      # mixed_8: 8 x 8 x 1280.
      with tf.variable_scope('Mixed_7a'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_0 = slim.conv2d(branch_0, 320, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_3x3') # Stride 2 shrinks the spatial size
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = slim.conv2d(branch_1, 192, [1, 7], scope='Conv2d_0b_1x7')
          branch_1 = slim.conv2d(branch_1, 192, [7, 1], scope='Conv2d_0c_7x1')
          branch_1 = slim.conv2d(branch_1, 192, [3, 3], stride=2,
                                 padding='VALID', scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_2'): # Pooling does not change the number of channels
          branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
                                     scope='MaxPool_1a_3x3')
        net = tf.concat([branch_0, branch_1, branch_2], 3) # Spatial size shrinks and channel count grows, while the total tensor size keeps decreasing
      # mixed_9: 8 x 8 x 2048.
      with tf.variable_scope('Mixed_7b'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = tf.concat([
              slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
              slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0b_3x1')], 3)
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(
              branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = tf.concat([
              slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
              slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(
              branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3) # Output channel count rises to 2048

      # mixed_10: 8 x 8 x 2048.
      with tf.variable_scope('Mixed_7c'):
        with tf.variable_scope('Branch_0'):
          branch_0 = slim.conv2d(net, 320, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
          branch_1 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
          branch_1 = tf.concat([
              slim.conv2d(branch_1, 384, [1, 3], scope='Conv2d_0b_1x3'),
              slim.conv2d(branch_1, 384, [3, 1], scope='Conv2d_0c_3x1')], 3)
        with tf.variable_scope('Branch_2'):
          branch_2 = slim.conv2d(net, 448, [1, 1], scope='Conv2d_0a_1x1')
          branch_2 = slim.conv2d(
              branch_2, 384, [3, 3], scope='Conv2d_0b_3x3')
          branch_2 = tf.concat([
              slim.conv2d(branch_2, 384, [1, 3], scope='Conv2d_0c_1x3'),
              slim.conv2d(branch_2, 384, [3, 1], scope='Conv2d_0d_3x1')], 3)
        with tf.variable_scope('Branch_3'):
          branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
          branch_3 = slim.conv2d(
              branch_3, 192, [1, 1], scope='Conv2d_0b_1x1')
        net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
      return net, end_points
      # This completes the convolutional part, the core of the Inception V3 network
      '''
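      A key design principle of Inception Net is that the spatial size keeps shrinking: each Inception module group simplifies the spatial
      structure while converting spatial information into higher-level abstract features, that is, moving information from the spatial
      dimensions into the channel dimension, which also lowers the computational cost. An Inception Module combines a relatively simple
      feature abstraction (branch 1), more complex feature abstractions (branches 2 and 3), and a structure-simplifying pooling layer (branch 4).
      These four kinds of feature abstraction and transformation selectively preserve high-level features at different levels, which enriches the network's expressive power.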
      '''
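
The shape comments in the code can be checked against TensorFlow's padding rules: with VALID padding the output size is ceil((n - k + 1) / s), and with SAME padding it is ceil(n / s), where n is the input size, k the kernel size, and s the stride. The sketch below assumes a square 299 x 299 input, as in the comments. `out_size` and `trace` are hypothetical helper names, and the layer table is copied by hand from the listing above (the stride-1 SAME modules are omitted because they leave the size unchanged).

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    if padding == 'VALID':
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError('unknown padding: %r' % padding)

# (scope name, kernel size, stride, padding) for every layer that can change
# the spatial size.
LAYERS = [
    ('Conv2d_1a_3x3', 3, 2, 'VALID'),
    ('Conv2d_2a_3x3', 3, 1, 'VALID'),
    ('Conv2d_2b_3x3', 3, 1, 'SAME'),
    ('MaxPool_3a_3x3', 3, 2, 'VALID'),
    ('Conv2d_3b_1x1', 1, 1, 'VALID'),
    ('Conv2d_4a_3x3', 3, 1, 'VALID'),
    ('MaxPool_5a_3x3', 3, 2, 'VALID'),
    ('Mixed_6a (stride-2 branches)', 3, 2, 'VALID'),
    ('Mixed_7a (stride-2 branches)', 3, 2, 'VALID'),
]

def trace(size, layers):
    """Return the spatial size after each layer, in order."""
    sizes = []
    for _name, kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes

sizes = trace(299, LAYERS)
for layer, size in zip(LAYERS, sizes):
    print('%-30s %3d x %3d' % (layer[0], size, size))
```

The trace gives 149, 147, 147, 73, 73, 71, 35, 17, 8, matching the 35 x 35, 17 x 17, and 8 x 8 sizes noted for the three module groups.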

Full implementation of the Inception-v3 model

GitHub: https://github.com/tensorflow/tensorflow/blob/fc1567c78b3746b44aa50373489a767afbb95d2b/tensorflow/contrib/slim/python/slim/nets/inception_v3.py
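
As a rough check on the comment in Mixed_6b that chaining a 1x7 and a 7x1 convolution needs fewer parameters than a full 7x7 convolution, the weight counts can be compared directly. This is a back-of-the-envelope sketch: `conv_weights` is a hypothetical helper, biases and normalization parameters are ignored, and the single 7x7 layer is an assumed alternative that does not appear in the actual network.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution kernel (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels

# Mixed_6b, Branch_1 after its 1x1 layer:
# 128 -> 128 channels via 1x7, then 128 -> 192 channels via 7x1.
factorized = conv_weights(1, 7, 128, 128) + conv_weights(7, 1, 128, 192)

# Hypothetical single 7x7 layer with the same input and output channels.
full_7x7 = conv_weights(7, 7, 128, 192)

print(factorized, full_7x7, full_7x7 / factorized)
```

With these channel counts the factorized pair uses 286,720 weights against 1,204,224 for the single 7x7 layer, roughly a 4.2x reduction.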