1. 程式人生 > >【R-FCN】《R-FCN: Object Detection via Region-based Fully Convolutional Networks》

【R-FCN】《R-FCN: Object Detection via Region-based Fully Convolutional Networks》

這裡寫圖片描述

NIPS-2016

目錄


1 Motivation

Faster RCNN two stage :fully convolutional subnetwork + RoI-wise subnetwork apply a costly per-region subnetwork hundreds of times.

為了進一步提速,作者用 fully convolutional with almost all computation shared on the entire image. 減少頭部的計算時間,

但是呢,如果簡單的把 fully connected(Alex、VGG)層換成 fully convolutional (ResNet、GoogleNet)會導致 inferior detection accuracy that does not match the network’s superior classification accuracy,這是因為 image-level classification task favors translation invariance

但是 object detection task needs localization representations that are translation-variant to an extent,會有衝突。

ResNet的做法是 creates a deeper RoI-wise subnetwork ,雖然提高了精度,但是速度有所下降 due to the unshared per-RoI computation.

本文的目的就要想辦法解決這種衝突

2 Innovation

在faster RCNN 的基礎上 propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection.

3 Advantages

  • region-based detector is fully convolutional with almost all computation shared on the entire image
  • 2.5-20× faster than the Faster R-CNN counterpart
  • 83.6% mAP on the PASCAL VOC 2007,82.0% the 2012

4 Methods

4.1 Structures

  • Backbone:ResNet101 = convolutional layers(100) + average pooling + 1000-class fc layer,作者只用100個convolutional layer來提取 feature map

  • Position-sensitive score maps & Position-sensitive RoI pooling

最後一層 convolutional layer 的 channel 設計為 k 2 C + 1 ,C是類別數量

我們把score maps 上的 RoI 分成 k*k 個bins

這裡寫圖片描述

ROI pooling 操作如下,

  • r c ( i , j ) is the pooled response in the ( i , j ) t h bin for the c t h category
  • Θ denotes all learnable parameters of the network
  • z i , j , c is one of score map out of the k 2 ( C + 1 ) score maps
  • ( x 0 , y 0 ) denotes the top-left corner of an RoI
  • n is the number of pixels in the bin

需要注意的是,pooling 後的第 i,j,c-th bin 是由 第 i,j,c-th score map對應的第 i.j bin 累加得到 ,也即是各類中每個 score maps 只 pooling 某一個bins,比如白色的 score maps 只對右下角部分的畫素進行操作。

這裡寫圖片描述

作者 address bbox regression in a similar way, feature map 通過 4 k 2 channel 的 conv 產生 4 k 2 channel 的 score maps,然後 producing a 4 k 2 -d vector for each RoI,接著 vote 成 4-d vector,表示 t x t y t w t h ,這種情況是 class-agnostic,對於 class-specific 來說, 4 k 2 C 即可。

4.2 train

4.3 inference

  • 300 RoI
  • non-maximum suppression:0.3

4.4 À trous and stride

ResNet 101 中, stage 1-4 unchanged,stage 5 中,把第一個 stride = 2 的conv 變成1,然後用 À trous to compensate for the reduced stride

這樣 ResNet stride from 32 to 16,提高了 score maps 的解析度

提升效果如下,2.6 個 點

這裡寫圖片描述

4.5 Visualization

如果每個bins 的響應都很高,vote 後的 score 會很高,反之,則 low

這裡寫圖片描述

5 Experiments

5.1 Experiments on PASCAL VOC

5.1.1 驗證結構的效果

  • training :07+12
  • test:07

這裡寫圖片描述

  • Standard Faster RCNN:76.4% mAP,RoI pooling 插入 stage4 和 stage5之間(見文章最後的 Faster RCNN-RES101 結構圖)
  • naive Faster RCNN:RoI pooling after conv5,an inexpensive 21-class fc layer is evaluated on each RoI,為了對比的公平,用了 À trous trick
  • class-specific RPN:RPN 的二分類(object or not)換成了 21 類,也用了À trous trick
  • R-FCN(without position-sensitivity)k = 1,相當於 global pooling within each RoI

5.1.2 與Faster RCNN pk,速度和精度,k*k = 7*7

  • 單技能釋放PK
    mining from a larger set of candidates has no benefit,所以作者採用 RoI = 300 (both training and inference)

    這裡寫圖片描述

  • 組合拳PK

    這裡寫圖片描述

Faster R-CNN +++ 就是 ResNet 論文裡面的 Box refinement + Global context + Multi-scale testing

5.1.3 Impact of Depth

這裡寫圖片描述

5.1.4 Impact of Region Proposals

這裡寫圖片描述

5.2 Experiments on MS COCO

R-FCN vs Faster RCNN 不打組合拳是 R-FCN厲害,組合後 Faster RCNN 厲害,但是 R-FCN 會快很多

這裡寫圖片描述


Faster RCNN-RES101 結構圖

這裡寫圖片描述

train.prototxt

# Enter your network definition here.
# Use Shift+Enter to update the visualization.
name: "ResNet-50"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}

# ------------------------ conv1 -----------------------------
layer {
    bottom: "data"
    top: "conv1"
    name: "conv1"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 7
        pad: 3
        stride: 2
    }
    param {
        lr_mult: 0.0
    }
    param {
        lr_mult: 0.0
    }

}

layer {
    bottom: "conv1"
    top: "conv1"
    name: "bn_conv1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "conv1"
    top: "conv1"
    name: "scale_conv1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "conv1"
    top: "conv1"
    name: "conv1_relu"
    type: "ReLU"
}

layer {
    bottom: "conv1"
    top: "pool1"
    name: "pool1"
    type: "Pooling"
    pooling_param {
        kernel_size: 3
        stride: 2
        pool: MAX
    }
}

layer {
    bottom: "pool1"
    top: "res2a_branch1"
    name: "res2a_branch1"
    type: "Convolution"
    convolution_param {
        num_output: 256
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "pool1"
    top: "res2a_branch2a"
    name: "res2a_branch2a"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2a"
    top: "res2a_branch2a"
    name: "bn2a_branch2a"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2a"
    top: "res2a_branch2a"
    name: "scale2a_branch2a"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2a"
    top: "res2a_branch2a"
    name: "res2a_branch2a_relu"
    type: "ReLU"
}

layer {
    bottom: "res2a_branch2a"
    top: "res2a_branch2b"
    name: "res2a_branch2b"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 3
        pad: 1
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2b"
    top: "res2a_branch2b"
    name: "bn2a_branch2b"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2b"
    top: "res2a_branch2b"
    name: "scale2a_branch2b"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2b"
    top: "res2a_branch2b"
    name: "res2a_branch2b_relu"
    type: "ReLU"
}

layer {
    bottom: "res2a_branch2b"
    top: "res2a_branch2c"
    name: "res2a_branch2c"
    type: "Convolution"
    convolution_param {
        num_output: 256
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2c"
    top: "res2a_branch2c"
    name: "bn2a_branch2c"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch2c"
    top: "res2a_branch2c"
    name: "scale2a_branch2c"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a_branch1"
    bottom: "res2a_branch2c"
    top: "res2a"
    name: "res2a"
    type: "Eltwise"
}

layer {
    bottom: "res2a"
    top: "res2a"
    name: "res2a_relu"
    type: "ReLU"
}

layer {
    bottom: "res2a"
    top: "res2b_branch2a"
    name: "res2b_branch2a"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2a"
    top: "res2b_branch2a"
    name: "bn2b_branch2a"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2a"
    top: "res2b_branch2a"
    name: "scale2b_branch2a"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2a"
    top: "res2b_branch2a"
    name: "res2b_branch2a_relu"
    type: "ReLU"
}

layer {
    bottom: "res2b_branch2a"
    top: "res2b_branch2b"
    name: "res2b_branch2b"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 3
        pad: 1
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2b"
    top: "res2b_branch2b"
    name: "bn2b_branch2b"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2b"
    top: "res2b_branch2b"
    name: "scale2b_branch2b"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2b"
    top: "res2b_branch2b"
    name: "res2b_branch2b_relu"
    type: "ReLU"
}

layer {
    bottom: "res2b_branch2b"
    top: "res2b_branch2c"
    name: "res2b_branch2c"
    type: "Convolution"
    convolution_param {
        num_output: 256
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2c"
    top: "res2b_branch2c"
    name: "bn2b_branch2c"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2b_branch2c"
    top: "res2b_branch2c"
    name: "scale2b_branch2c"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2a"
    bottom: "res2b_branch2c"
    top: "res2b"
    name: "res2b"
    type: "Eltwise"
}

layer {
    bottom: "res2b"
    top: "res2b"
    name: "res2b_relu"
    type: "ReLU"
}

layer {
    bottom: "res2b"
    top: "res2c_branch2a"
    name: "res2c_branch2a"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2a"
    top: "res2c_branch2a"
    name: "bn2c_branch2a"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2a"
    top: "res2c_branch2a"
    name: "scale2c_branch2a"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2a"
    top: "res2c_branch2a"
    name: "res2c_branch2a_relu"
    type: "ReLU"
}

layer {
    bottom: "res2c_branch2a"
    top: "res2c_branch2b"
    name: "res2c_branch2b"
    type: "Convolution"
    convolution_param {
        num_output: 64
        kernel_size: 3
        pad: 1
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2b"
    top: "res2c_branch2b"
    name: "bn2c_branch2b"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2b"
    top: "res2c_branch2b"
    name: "scale2c_branch2b"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2b"
    top: "res2c_branch2b"
    name: "res2c_branch2b_relu"
    type: "ReLU"
}

layer {
    bottom: "res2c_branch2b"
    top: "res2c_branch2c"
    name: "res2c_branch2c"
    type: "Convolution"
    convolution_param {
        num_output: 256
        kernel_size: 1
        pad: 0
        stride: 1
        bias_term: false
    }
    param {
        lr_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2c"
    top: "res2c_branch2c"
    name: "bn2c_branch2c"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {
    bottom: "res2c_branch2c"
    top: "res2c_branch2c"
    name: "scale2c_branch2c"
    type: "Scale"
    scale_param {
        bias_term: true
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
    param {
        lr_mult: 0.0
        decay_mult: 0.0
    }
}

layer {