Deep Learning Fundamentals Series (10) | Can Global Average Pooling Replace Fully Connected Layers?

阿新 • Published: 2019-01-13

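Global Average Pooling (GAP) was first proposed in the Network In Network paper (Section 3.2), where it is presented as a new technique that can replace fully connected layers. Among the classic models that ship with Keras, quite a few drop the fully connected layers altogether in favor of GAP, and for transfer learning almost every model supports both Global Average Pooling and Global Max Pooling (GMP). But can GAP really replace fully connected layers, and what is the reasoning behind it? This article takes a closer look.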

1. What Is GAP?

First, the definition from the original paper:

In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Futhermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.

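Put simply, GAP replaces the FC (fully connected) layers that would otherwise follow the convolutional layers. It has two advantages: first, the mapping from feature maps to final classes becomes simpler and more natural; second, unlike FC layers, GAP has no parameters to train, and the smaller parameter count makes the model more robust and more resistant to overfitting.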

Here is a more intuitive picture of how GAP works:

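Suppose the final output of the convolutional layers is a three-dimensional feature map of size h × w × d, concretely 6 × 6 × 3. After GAP it becomes an output of size 1 × 1 × 3; that is, each h × w slice is averaged down to a single value.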
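The 6 × 6 × 3 example can be reproduced in a few lines of NumPy. This is only a minimal sketch of the arithmetic, not the Keras layer itself; the values in `feature_map` are arbitrary, and global max pooling is included just for comparison.

```python
import numpy as np

# A toy 6 x 6 x 3 feature map (height, width, channels); the values are arbitrary.
feature_map = np.arange(6 * 6 * 3, dtype=np.float32).reshape(6, 6, 3)

# Global average pooling: average over both spatial axes, one value per channel.
gap = feature_map.mean(axis=(0, 1))

# Global max pooling, for comparison: keep only the largest value per channel.
gmp = feature_map.max(axis=(0, 1))

print(gap.shape)  # (3,)
print(gap)        # [52.5 53.5 54.5]
print(gmp)        # [105. 106. 107.]
```

Passing `keepdims=True` to `mean` would keep the result as 1 × 1 × 3 instead of a flat vector of length 3.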

2. How GAP Is Defined in Keras

GAP is typically placed after the convolutional layers and before the output layer:

x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)  # last layer of the convolutional base
x = layers.GlobalAveragePooling2D()(x)  # GAP layer
prediction = layers.Dense(10, activation='softmax')(x)  # output layer

Now let's look at how GAP is actually implemented:

@tf_export('keras.layers.GlobalAveragePooling2D',
           'keras.layers.GlobalAvgPool2D')
class GlobalAveragePooling2D(GlobalPooling2D):
  """Global average pooling operation for spatial data.
  Arguments:
      data_format: A string,
          one of `channels_last` (default) or `channels_first`.
          The ordering of the dimensions in the inputs.
          `channels_last` corresponds to inputs with shape
          `(batch, height, width, channels)` while `channels_first`
          corresponds to inputs with shape
          `(batch, channels, height, width)`.
          It defaults to the `image_data_format` value found in your
          Keras config file at `~/.keras/keras.json`.
          If you never set it, then it will be "channels_last".
  Input shape:
      - If `data_format='channels_last'`:
          4D tensor with shape:
          `(batch_size, rows, cols, channels)`
      - If `data_format='channels_first'`:
          4D tensor with shape:
          `(batch_size, channels, rows, cols)`
  Output shape:
      2D tensor with shape:
      `(batch_size, channels)`
  """

  def call(self, inputs):
    if self.data_format == 'channels_last':
      return backend.mean(inputs, axis=[1, 2])
    else:
      return backend.mean(inputs, axis=[2, 3])

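The implementation is simple: it averages the feature data over the height and width dimensions. For the NHWC layout (batch size, height, width, channels), axis=[1, 2]; for the NCHW layout (batch size, channels, height, width), axis=[2, 3].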
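As a quick sanity check of the axis choice, the sketch below builds the same random batch in both layouts with NumPy and confirms that averaging over axes (1, 2) of the channels-last tensor matches averaging over axes (2, 3) of the channels-first tensor. The array names are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 2 feature maps in channels_last layout: (batch, height, width, channels).
x_last = rng.random((2, 6, 6, 3))
# The same data rearranged to channels_first layout: (batch, channels, height, width).
x_first = x_last.transpose(0, 3, 1, 2)

gap_last = x_last.mean(axis=(1, 2))    # average over height and width
gap_first = x_first.mean(axis=(2, 3))  # average over height and width

print(gap_last.shape, gap_first.shape)   # (2, 3) (2, 3)
print(np.allclose(gap_last, gap_first))  # True
```

Either way the output has shape (batch_size, channels), which matches the docstring above.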

3. GAP vs. GMP vs. FC

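Before testing how well GAP works, we need training and test data. I found a dataset of 17 flower categories on the University of Oxford website: http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html . It has 80 images per category, 1,360 images in total. I sorted the images into per-class folders and set aside part of them as test data, so that the final ratio of training to test data is 7:1.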
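How the split was actually made is not described here, so the following is only one possible way to get a 7:1 ratio: shuffle each class's files with a fixed seed and hold out one eighth of them. The function name `split_per_class` and the file names are hypothetical.

```python
import random


def split_per_class(filenames, test_fraction=1 / 8, seed=7):
    """Return (train, test) lists for the image files of a single class."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)  # fixed seed keeps the split reproducible
    n_test = round(len(files) * test_fraction)
    return files[n_test:], files[:n_test]


# Hypothetical file names standing in for one class of 80 images.
one_class = [f"image_{i:04d}.jpg" for i in range(80)]
train, test = split_per_class(one_class)
print(len(train), len(test))  # 70 10
```

Applied to all 17 classes, this would give 1,190 training images and 170 test images.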

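The classic Keras models that support transfer learning offer not only GAP but also GMP, while by default they build their own FC layers. A typical implementation looks like this: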

    if include_top:
        # Classification block
        x = layers.Flatten(name='flatten')(x)
        x = layers.Dense(4096, activation='relu', name='fc1')(x)
        x = layers.Dense(4096, activation='relu', name='fc2')(x)
        x = layers.Dense(classes, activation='softmax', name='predictions')(x)
    else:
        if pooling == 'avg':
            x = layers.GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = layers.GlobalMaxPooling2D()(x)

This article compares GAP, GMP, and FC layers on the same dataset, using transfer-learning versions of two models, VGG19 and InceptionV3.

First, with VGG19, we compare validation accuracy and loss for GAP, GMP, and FC after 50 epochs each. The code is as follows:

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.applications.vgg19 import VGG19
from keras.layers import Dense, Flatten
from matplotlib import pyplot as plt
import numpy as np

# For a fair comparison, use the same random seed
np.random.seed(7)
batch_size = 32
# Train for 50 epochs
epochs = 50
# Images are resized to 224 x 224, VGG19's default input size
IMAGE_SIZE = 224
# 17 flower classes
NUM_CLASSES = 17
TRAIN_PATH = '/home/yourname/Documents/tensorflow/images/17flowerclasses/train'
TEST_PATH = '/home/yourname/Documents/tensorflow/images/17flowerclasses/test'
FLOWER_CLASSES = ['Bluebell', 'ButterCup', 'ColtsFoot', 'Cowslip', 'Crocus', 'Daffodil', 'Daisy',
                  'Dandelion', 'Fritillary', 'Iris', 'LilyValley', 'Pansy', 'Snowdrop', 'Sunflower',
                  'Tigerlily', 'tulip', 'WindFlower']


def model(mode='fc'):
    if mode == 'fc':
        # FC head: one hidden layer with 512 units
        base_model = VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling=None)
        x = base_model.output
        x = Flatten()(x)
        x = Dense(512, activation='relu')(x)
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    elif mode == 'avg':
        # GAP is enabled by passing pooling='avg'
        base_model = VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='avg')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    else:
        # GMP is enabled by passing pooling='max'
        base_model = VGG19(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='max')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)

    # Keras 2 uses the keyword arguments `inputs` and `outputs`
    model = Model(inputs=base_model.input, outputs=prediction)
    model.summary()
    opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])

    # Read images from the class subdirectories (no augmentation options are set here)
    train_datagen = ImageDataGenerator()
    train_generator = train_datagen.flow_from_directory(directory=TRAIN_PATH,
                                                        target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                        classes=FLOWER_CLASSES,
                                                        batch_size=batch_size)
    test_datagen = ImageDataGenerator()
    test_generator = test_datagen.flow_from_directory(directory=TEST_PATH,
                                                      target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                      classes=FLOWER_CLASSES,
                                                      batch_size=batch_size)
    # Train the model
    history = model.fit_generator(train_generator, epochs=epochs, validation_data=test_generator)
    return history


fc_history = model('fc')
avg_history = model('avg')
max_history = model('max')


# Compare validation accuracy across the three variants
plt.plot(fc_history.history['val_acc'])
plt.plot(avg_history.history['val_acc'])
plt.plot(max_history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='lower right')
plt.grid(True)
plt.show()

# Compare validation loss across the three variants
plt.plot(fc_history.history['val_loss'])
plt.plot(avg_history.history['val_loss'])
plt.plot(max_history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='upper right')
plt.grid(True)
plt.show()

After 50 epochs for each variant, here is the accuracy comparison:

And the loss comparison:

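First, GMP performs too poorly on this model to be worth discussing. FC does reasonably well for the first 40 epochs, but then changes sharply and overfits (the model at around epoch 20 is comparatively better, but with accuracy under 70% it is still a poor model). GAP performs best of the three: both its accuracy and its loss are fairly stable, and its resistance to overfitting is clear (although with a final accuracy of 70%, the model is still not good enough).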

Now we turn to another model, InceptionV3, with the code slightly modified:

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.layers import Dense, Flatten
from matplotlib import pyplot as plt
import numpy as np

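# For a fair comparison, use the same random seed
np.random.seed(7)
batch_size = 32
# Train for 50 epochs
epochs = 50
# Images are resized to 224 x 224 (InceptionV3's default is 299 x 299,
# but other sizes are accepted when include_top=False)
IMAGE_SIZE = 224
# 17 flower classes
NUM_CLASSES = 17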
TRAIN_PATH = '/home/hutao/Documents/tensorflow/images/17flowerclasses/train'
TEST_PATH = '/home/hutao/Documents/tensorflow/images/17flowerclasses/test'
FLOWER_CLASSES = ['Bluebell', 'ButterCup', 'ColtsFoot', 'Cowslip', 'Crocus', 'Daffodil', 'Daisy',
                  'Dandelion', 'Fritillary', 'Iris', 'LilyValley', 'Pansy', 'Snowdrop', 'Sunflower',
                  'Tigerlily', 'tulip', 'WindFlower']


def model(mode='fc'):
    if mode == 'fc':
        # FC head: one hidden layer with 512 units
        base_model = InceptionV3(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling=None)
        x = base_model.output
        x = Flatten()(x)
        x = Dense(512, activation='relu')(x)
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    elif mode == 'avg':
        # GAP is enabled by passing pooling='avg'
        base_model = InceptionV3(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='avg')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)
    else:
        # GMP is enabled by passing pooling='max'
        base_model = InceptionV3(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), include_top=False, pooling='max')
        x = base_model.output
        prediction = Dense(NUM_CLASSES, activation='softmax')(x)

    # Keras 2 uses the keyword arguments `inputs` and `outputs`
    model = Model(inputs=base_model.input, outputs=prediction)
    model.summary()
    opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])

    # Read images from the class subdirectories (no augmentation options are set here)
    train_datagen = ImageDataGenerator()
    train_generator = train_datagen.flow_from_directory(directory=TRAIN_PATH,
                                                        target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                        classes=FLOWER_CLASSES,
                                                        batch_size=batch_size)
    test_datagen = ImageDataGenerator()
    test_generator = test_datagen.flow_from_directory(directory=TEST_PATH,
                                                      target_size=(IMAGE_SIZE, IMAGE_SIZE),
                                                      classes=FLOWER_CLASSES,
                                                      batch_size=batch_size)
    # Train the model
    history = model.fit_generator(train_generator, epochs=epochs, validation_data=test_generator)
    return history


fc_history = model('fc')
avg_history = model('avg')
max_history = model('max')


# Compare validation accuracy across the three variants
plt.plot(fc_history.history['val_acc'])
plt.plot(avg_history.history['val_acc'])
plt.plot(max_history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='lower right')
plt.grid(True)
plt.show()

# Compare validation loss across the three variants
plt.plot(fc_history.history['val_loss'])
plt.plot(avg_history.history['val_loss'])
plt.plot(max_history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['FC', 'AVG', 'MAX'], loc='upper right')
plt.grid(True)
plt.show()

First, the accuracy comparison:

And the loss comparison:

Clearly, with InceptionV3, FC, GAP, and GMP all perform well, but GAP is still the best: its accuracy is generally above 90%, while the other two fall between 80% and 90%.

4. Conclusion

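This experiment shows that, with a limited dataset and transfer learning from a classic model, GMP is rather unstable, the FC layers are more prone to overfitting because they have so many trainable parameters, and GAP is stable and outperforms the FC layers. Of course, each case needs its own analysis: once we have a dataset, we can train and test with each of these options to find the best solution.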
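To put a rough number on "so many trainable parameters", the sketch below counts the weights in the classification head only, for the FC variant (Flatten, a 512-unit hidden layer, then a 17-way softmax) and for the pooled variants (a 17-way softmax on top of the pooled vector). The feature-map shapes (7, 7, 512) for VGG19 and (5, 5, 2048) for InceptionV3 are what I would expect for 224 × 224 inputs; treat them as assumptions and confirm them against `model.summary()`. The helper names are invented for this example.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(feature_shape, mode, hidden=512, num_classes=17):
    """Trainable parameters added on top of the convolutional base."""
    h, w, c = feature_shape
    if mode == 'fc':
        # Flatten -> Dense(hidden) -> Dense(num_classes)
        return dense_params(h * w * c, hidden) + dense_params(hidden, num_classes)
    # 'avg' and 'max' pooling have no weights; only the softmax layer does.
    return dense_params(c, num_classes)


# Assumed output shapes of the convolutional bases for 224 x 224 inputs.
for name, shape in [('VGG19', (7, 7, 512)), ('InceptionV3', (5, 5, 2048))]:
    print(name, head_params(shape, 'fc'), head_params(shape, 'avg'))
# VGG19 12854289 8721
# InceptionV3 26223633 34833
```

Under these assumptions the FC head adds millions of parameters on top of the pretrained base, while the GAP and GMP heads add only a few thousand, which fits the overfitting behavior seen with roughly 1,200 training images.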
