
Bilinear VGG16+ResNet50 for fine-grained image classification

Fine-grained visual recognition with bilinear CNN models

[1] Lin T-Y, RoyChowdhury A, Maji S. Bilinear CNN Models for Fine-Grained Visual Recognition // Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2015: 1449-1457.
[2] Lin T-Y, RoyChowdhury A, Maji S. Bilinear CNNs for Fine-Grained Visual Recognition // arXiv. 2017.

Abstract

  • Definition
    : A bilinear CNN model consists of two feature extractors whose outputs are combined with an outer product (outer product WiKi) and pooled to obtain an image descriptor.
  • Advantages
    • The architecture models localized pairwise feature interactions in a translation-invariant way, which suits fine-grained classification.
    • It generalizes several orderless feature descriptors such as Fisher vectors, VLAD and O2P. The experiments use bilinear models with convolutional neural networks as the feature extractors.
    • The bilinear form simplifies gradient computation, so both networks can be trained end to end using image labels only.
  • Experimental results
    • After domain-specific fine-tuning of networks pretrained on ImageNet, the model reaches 84.1% accuracy on the CUB-200-2011 dataset.
    • The authors run experiments and visualizations to analyze the effect of fine-tuning, and choose the two-stream networks with both speed and accuracy in mind.
    • The architecture matches prior methods on most fine-grained datasets while being simpler and easier to train. Moreover, the most accurate model runs efficiently at 8 frames/s on an NVIDIA Tesla K40 GPU. Code: http://vis-www.cs.umass.edu/bcnn
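The definition above reduces to a few lines of linear algebra. Here is a minimal NumPy sketch for a single image, with hypothetical feature maps standing in for the two CNN extractors (the sizes are illustrative):

```python
import numpy as np

# Hypothetical feature maps from two extractors over the same H x W grid:
# stream A has C_a channels, stream B has C_b channels.
H, W, C_a, C_b = 12, 16, 512, 2048
f_a = np.random.rand(H * W, C_a)   # stream A, flattened over locations
f_b = np.random.rand(H * W, C_b)   # stream B, flattened over locations

# Outer product at every location, sum-pooled over locations:
# this is exactly the single matrix product f_a^T f_b.
bilinear = f_a.T @ f_b             # shape (C_a, C_b)

# Signed square root and L2 normalization, as used in the paper.
z = np.sign(bilinear) * np.sqrt(np.abs(bilinear))
descriptor = z.ravel() / np.linalg.norm(z.ravel())
print(descriptor.shape)            # (1048576,) = C_a * C_b
```

The resulting fixed-length vector is the image descriptor fed to the classifier.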

Introduction

  • Fine-grained recognition classifies objects within a subcategory, which usually requires recognizing highly localized features that are independent of object pose and position in the image. For example, distinguishing a "California gull" from a "ring-billed gull" requires picking up subtle differences in body color and texture, or in plumage color. Common techniques fall into two camps:

    • Part-based models: localize parts first, then extract their features to build the image description. Drawback: appearance changes with position, pose and viewpoint.
    • Holistic models: build a feature representation of the whole image directly, including classic image representations such as Bag-of-Visual-Words and its many variants for texture analysis. CNN-based part models require part annotations on the training images, which is expensive, and some classes, such as textures and scenes, have no clearly defined parts.
  • The authors' reasoning

    • Why part-based models work: the authors argue that part-based reasoning is effective because it is independent of object position and pose. Texture representations achieve translation invariance by design, combining image features in an orderless fashion.
    • Why texture representations underperform: texture representations based on SIFT and CNNs have proved effective for fine-grained recognition, yet still trail part-based methods. A likely reason is that the key features of the texture representation are not learned end to end, and are therefore suboptimal for the recognition task.
    • Key insight: several widely used texture representations can be written as the outputs of two suitable feature extractors, combined with an outer product and then pooled.
    • Pipeline: the image first passes through CNN units that extract features, then through a bilinear layer and a pooling layer; the output is a fixed-length, high-dimensional representation that can be combined with a fully connected layer to predict the class label. The simplest bilinear layer combines two independent features with an outer product, similar to second-order pooling in image semantic segmentation.
  • Experimental results: the authors evaluate the model on fine-grained datasets of birds, aircraft and cars. B-CNN outperforms existing models on most fine-grained datasets, even part-supervised models, while remaining quite efficient.
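The translation-invariance claim, that orderless pooling makes the descriptor independent of where features occur, can be checked directly: permuting the spatial locations of both streams leaves the bilinear descriptor unchanged. A small NumPy sketch with hypothetical channel counts:

```python
import numpy as np

rng = np.random.default_rng(0)
L, C_a, C_b = 192, 8, 16              # locations, hypothetical channel counts
f_a = rng.random((L, C_a))            # stream A features per location
f_b = rng.random((L, C_b))            # stream B features per location

bilinear = f_a.T @ f_b                # sum of per-location outer products

# Shuffle the locations consistently in both streams: a crude stand-in
# for the object appearing elsewhere in the image.
perm = rng.permutation(L)
bilinear_shuffled = f_a[perm].T @ f_b[perm]

print(np.allclose(bilinear, bilinear_shuffled))   # True: pooling is orderless
```

Any part-location information is discarded by the sum, which is exactly why no part annotations are needed.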

# -*- coding: utf-8 -*-
"""
Created on Tue Sep 18 00:28:01 2018

@author: Administrator
"""

from keras.applications.resnet50 import ResNet50
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout, Input, Reshape, Lambda
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras import backend as K
from keras.utils import plot_model
import numpy as np
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

def sign_sqrt(x):
    # Signed square root, applied element-wise to the bilinear features.
    return K.sign(x) * K.sqrt(K.abs(x) + 1e-10)

def l2_norm(x):
    # L2-normalize along the last axis.
    return K.l2_normalize(x, axis=-1)

def batch_dot(cnn_ab):
    # Contract the shared spatial axis (axis 1) of the two feature maps:
    # (batch, H*W, C_a) x (batch, H*W, C_b) -> (batch, C_a, C_b).
    return K.batch_dot(cnn_ab[0], cnn_ab[1], axes=[1, 1])

def bilinearnet():
    # Shared input feeding both streams.
    input_tensor = Input(shape=(384, 512, 3))
    vgg16 = VGG16(weights='imagenet', include_top=False, input_tensor=input_tensor)
    resnet50 = ResNet50(weights='imagenet', include_top=False, input_tensor=input_tensor)

    # VGG16 stream: block5_pool output, (12, 16, 512) for a 384x512 input.
    vgg_shape = vgg16.layers[-1].output_shape
    vgg16_x = Reshape([vgg_shape[1] * vgg_shape[2], vgg_shape[3]])(vgg16.output)
    # ResNet50 stream: layers[-6] output, (12, 16, 2048) for a 384x512 input
    # (see the layer-shape listing below).
    res_shape = resnet50.layers[-6].output_shape
    resnet50_x = Reshape([res_shape[1] * res_shape[2], res_shape[3]])(resnet50.layers[-6].output)

    # Bilinear pooling: outer product over the shared spatial axis,
    # followed by signed square root and L2 normalization.
    cnn_dot_out = Lambda(batch_dot)([vgg16_x, resnet50_x])
    sign_sqrt_out = Lambda(sign_sqrt)(cnn_dot_out)
    l2_norm_out = Lambda(l2_norm)(sign_sqrt_out)

    flatten = Flatten()(l2_norm_out)
    dropout = Dropout(0.5)(flatten)
    output = Dense(12, activation='softmax')(dropout)   # 12 defect classes

    model = Model(input_tensor, output)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.SGD(lr=1e-4, momentum=0.9, decay=1e-6),
                  metrics=['accuracy'])
    print(model.summary())
    plot_model(model, to_file='vgg_resnet_bilinear_model.png')
    return model
# Reference layer shapes for a 384x512 input (from the model summaries):
#
# vgg16:
#   block5_conv1 (Conv2D)        (None, 24, 32, 512)
#   block5_conv2 (Conv2D)        (None, 24, 32, 512)
#   block5_conv3 (Conv2D)        (None, 24, 32, 512)
#   block5_pool (MaxPooling2D)   (None, 12, 16, 512)
#
# resnet50:
#   bn5c_branch2c (BatchNormalization)  (None, 12, 16, 2048)
#   add_112 (Add)                       (None, 12, 16, 2048)
#   activation_343 (Activation)         (None, 12, 16, 2048)


train_data_gen = ImageDataGenerator(rescale=1/255.,
                 samplewise_center=True,
                 samplewise_std_normalization=True,
#                 zca_whitening=True,
#                 zca_epsilon=1e-6,
                
                 width_shift_range=0.05,
                 height_shift_range=0.05,
                 fill_mode='reflect',
                 horizontal_flip=True,
                 vertical_flip=True)     
         
test_data_gen = ImageDataGenerator(rescale=1/255.)
train_gen = train_data_gen.flow_from_directory(directory='D:\\xkkAI\\ZZN\\guangdong\\train',
                            target_size=(384, 512), color_mode='rgb',
                            class_mode='categorical',
                            batch_size=5, shuffle=True, seed=222
                            )

val_gen = test_data_gen.flow_from_directory(directory='D:\\xkkAI\\ZZN\\guangdong\\val',
                            target_size=(384, 512), color_mode='rgb',
                            class_mode='categorical',
                            batch_size=5, shuffle=True, seed=222
                            )
test_gen = test_data_gen.flow_from_directory(directory='D:\\xkkAI\\ZZN\\guangdong\\test',
                            target_size=(384, 512), color_mode='rgb',
                            class_mode='categorical',
                            batch_size=5, shuffle=False  # keep file order for the csv below
                            )
cp = ModelCheckpoint('guangdong_best_vgg16.h5', monitor='val_loss', verbose=1,
                 save_best_only=True, save_weights_only=False,
                 mode='auto', period=1)
es = EarlyStopping(monitor='val_loss',
                  patience=8, verbose=1, mode='auto') 
lr_reduce = ReduceLROnPlateau(monitor='val_loss', factor=0.1, epsilon=1e-5,
                              patience=2, verbose=1, min_lr=1e-8)
callbackslist = [cp,es,lr_reduce]

ftvggmodel = bilinearnet()
ftvggmodel.fit_generator(train_gen,
                          epochs=1111,
                          verbose=1,
                          callbacks=callbackslist,
                          validation_data=val_gen,
                          shuffle=True)

#ftvggmodel.load_weights('guangdong_best_vgg16.h5')
pred = ftvggmodel.predict_generator(test_gen)

# Map class indices to the competition's defect labels.
defectlist = ['norm', 'defect1', 'defect2', 'defect3', 'defect4', 'defect5',
              'defect6', 'defect7', 'defect8', 'defect9', 'defect10', 'defect11']
import csv
with open('lvcai_result.csv', 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    w = csv.writer(f)
    for i in range(len(pred)):
        w.writerow([str(i) + '.jpg', defectlist[np.argmax(pred[i])]])
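As a sanity check on the three Lambda helpers used in bilinearnet, here is a plain NumPy re-implementation (an equivalence sketch, not part of the original script; K.batch_dot with axes=[1, 1] contracts the shared location axis):

```python
import numpy as np

def np_sign_sqrt(x):
    # Signed square root, matching sign_sqrt above.
    return np.sign(x) * np.sqrt(np.abs(x) + 1e-10)

def np_l2_norm(x):
    # L2-normalize along the last axis, matching l2_norm above.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def np_batch_dot(a, b):
    # Contract the shared location axis (axis 1), like K.batch_dot(axes=[1, 1]):
    # (batch, L, C_a) x (batch, L, C_b) -> (batch, C_a, C_b).
    return np.einsum('nlc,nld->ncd', a, b)

a = np.random.rand(2, 192, 512)    # VGG16-like stream, 12*16 locations
b = np.random.rand(2, 192, 2048)   # ResNet50-like stream
out = np_l2_norm(np_sign_sqrt(np_batch_dot(a, b)))
print(out.shape)                   # (2, 512, 2048)
```

Flattening this output gives the 512*2048-dimensional descriptor that the Dense(12) classifier consumes.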