【Mask RCNN】《Mask R-CNN》

阿新 • Published: 2018-10-31

ICCV-2017

Extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. A single model serves three tasks: instance segmentation, bounding-box object detection, and person keypoint detection.
Proposes RoIAlign to supply the pixel-to-pixel alignment that Faster R-CNN lacks and that instance segmentation needs.

3 Advantages

Instance segmentation, bounding-box object detection, and person keypoint detection are handled by one model, with results better than the respective single-task winners (COCO 2016).

4 Methods

4.1 Head Architecture

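The structure on the left (the C4 head) is the weaker design. The R-FCN paper says so at the outset: "this creates a deeper RoI-wise subnetwork that improves accuracy, at the cost of lower speed due to the unshared per-RoI computation." This resembles R-CNN, where every proposal is processed separately once it has been generated. The authors also recommend the structure on the right ("we do not recommend using the C4 variant in practice").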

Faster R-CNN has two outputs for each candidate object: a class label and a bounding-box offset. The authors add a third branch that outputs the object mask. This third branch, however, requires extracting a much finer spatial layout of the object.

Mask R-CNN also outputs a binary mask for each RoI

4.2 RoI Align

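Segmentation is a pixel-level task, but RoI pooling in Faster R-CNN quantizes twice, which causes misalignment: the first quantization happens when the RoI is mapped onto the feature map, the second during RoI pooling itself.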

Quantization works as follows:
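To make the two rounding steps concrete, here is a small one-dimensional sketch. The stride of 16, the 7-bin output, the RoI coordinates, and the helper names `quantize_roi` and `exact_roi` are illustrative assumptions, not values taken from the post; floor is used for the first rounding for simplicity, and implementations may round differently.

```python
import math

def quantize_roi(x1, x2, stride=16, bins=7):
    """Mimic RoIPool's two rounding steps along one axis (illustrative only)."""
    # Step 1: map image coordinates onto the feature map and round to integers.
    fx1, fx2 = math.floor(x1 / stride), math.floor(x2 / stride)
    # Step 2: split the rounded span into integer-sized bins.
    bin_size = max((fx2 - fx1) // bins, 1)
    return fx1, fx2, bin_size

def exact_roi(x1, x2, stride=16, bins=7):
    """Keep every quantity in floating point, as RoIAlign does."""
    fx1, fx2 = x1 / stride, x2 / stride
    return fx1, fx2, (fx2 - fx1) / bins

q = quantize_roi(100.0, 365.0)
e = exact_roi(100.0, 365.0)
# Shift of the left edge, in image pixels, caused by the first rounding alone.
shift_px = (e[0] - q[0]) * 16
print(q, e, shift_px)
```

With these numbers the left edge moves by 4 image pixels after the first rounding, and the bin width drops from about 2.37 feature cells to 2. Errors of this size matter little for classification but are visible in a pixel-level mask.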

The differences between RoI pooling, RoIWarp, and RoIAlign are as follows:

RoIAlign is detailed in the middle part of the figure below.

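Take four sampling points inside each bin of the unquantized RoI, compute the value at each point by bilinear interpolation from the four surrounding feature-map pixels, then average the four values.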
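The procedure just described can be sketched in NumPy. This is a minimal single-channel version written for readability, not the paper's implementation; the function names, the convention that pixel centers sit at integer coordinates, and the 2×2 output size are assumptions made for the example.

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bottom = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feat, roi, out_size=2, samples=2):
    """Average samples x samples bilinear samples per bin.

    roi = (y1, x1, y2, x2) in feature-map coordinates, never rounded.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for a in range(samples):
                for b in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    sy = y1 + (i + (a + 0.5) / samples) * bin_h
                    sx = x1 + (j + (b + 0.5) / samples) * bin_w
                    vals.append(bilinear(feat, sy, sx))
            out[i, j] = np.mean(vals)  # four samples per bin when samples=2
    return out

feat = np.arange(36, dtype=float).reshape(6, 6)
pooled = roi_align(feat, roi=(0.5, 0.5, 4.5, 4.5))
print(pooled)
```

Because the toy feature map is linear in both coordinates, each output equals the feature value at the bin center, which makes the result easy to check by hand even though the RoI edges fall between pixels.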

4.3 Train

Multi-task loss: $L = L_{cls} + L_{box} + L_{mask}$, where $L_{mask}$ is defined only on positive RoIs.
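As a rough illustration of the mask term, the sketch below computes an average per-pixel binary cross-entropy over positive RoIs. It assumes, following the paper's description, that the mask branch emits K masks of m×m per RoI and that only the channel of the ground-truth class contributes to the loss; the function name `mask_loss` and the array layout are hypothetical.

```python
import numpy as np

def mask_loss(mask_logits, gt_masks, gt_classes, is_positive):
    """Average per-pixel sigmoid cross-entropy over positive RoIs only.

    mask_logits: (R, K, m, m) raw outputs; gt_masks: (R, m, m) in {0, 1};
    gt_classes: (R,) integer labels; is_positive: (R,) booleans.
    """
    pos = np.flatnonzero(is_positive)
    if pos.size == 0:
        return 0.0
    # Keep only the mask channel of each positive RoI's ground-truth class.
    logits = mask_logits[pos, gt_classes[pos]]  # shape (P, m, m)
    target = gt_masks[pos].astype(float)
    # Numerically stable form: max(z, 0) - z * t + log(1 + exp(-|z|)).
    bce = (np.maximum(logits, 0) - logits * target
           + np.log1p(np.exp(-np.abs(logits))))
    return float(bce.mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3, 28, 28))
masks = rng.integers(0, 2, size=(4, 28, 28))
classes = np.array([0, 2, 1, 1])
positive = np.array([True, False, True, False])
loss = mask_loss(logits, masks, classes, positive)
print(loss)
```

Negative RoIs and the other K-1 channels of a positive RoI receive no gradient from this term, so the mask branch never has to decide which class an object belongs to.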

The mask branch has a $Km^2$-dimensional output for each RoI: K classes at m×m resolution.
An RoI is positive if its IoU with a ground-truth box is at least 0.5, and negative otherwise.
As in Fast R-CNN, training uses image-centric sampling rather than RoI-centric sampling.
- RoI-centric sampling: sample uniformly from all RoIs of all images, so each SGD mini-batch contains samples from different images (used by SPPnet).
- Image-centric sampling: the mini-batch is sampled hierarchically, first images and then RoIs, so RoIs from the same image share computation and memory.
Each mini-batch has 2 images per GPU, and each image has N sampled RoIs with positive:negative = 1:3; N = 64 for the C4 backbone and 512 for FPN (see Figure 3).
RPN anchors span 5 scales and 3 aspect ratios.
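The per-image RoI sampling described above can be sketched as follows. The function name `sample_rois`, the fixed seed, and the choice to fill the remaining slots with negatives when positives run short are assumptions for the example; the post only states the 0.5 IoU threshold, the 1:3 ratio, and the value of N.

```python
import numpy as np

def sample_rois(ious, n=64, pos_fraction=0.25, pos_thresh=0.5, seed=0):
    """Pick up to n RoIs from one image, at most a quarter of them positive.

    ious: (R,) maximum IoU of each RoI with any ground-truth box.
    Returns (positive_indices, negative_indices).
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(ious >= pos_thresh)
    neg = np.flatnonzero(ious < pos_thresh)
    n_pos = min(int(n * pos_fraction), pos.size)
    # Assumption: negatives fill whatever the positives leave unused.
    n_neg = min(n - n_pos, neg.size)
    return rng.permutation(pos)[:n_pos], rng.permutation(neg)[:n_neg]

ious = np.linspace(0.0, 1.0, 200)
pos_idx, neg_idx = sample_rois(ious, n=64)
print(len(pos_idx), len(neg_idx))
```

With N = 64 this yields 16 positives and 48 negatives whenever the image has enough RoIs of each kind. Sampling inside one image at a time is what allows the backbone features to be computed once and shared by all of that image's RoIs.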

4.4 Inference

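The number of proposals is 300 for C4 and 1000 for FPN; they are fed to the box prediction branch, followed by NMS.
The mask branch is applied only to the 100 highest-scoring detection boxes. This differs from training, but it speeds up inference.
The mask branch can predict K masks per RoI, but only the k-th mask is used, where k is the class predicted by the classification branch.
The mask is resized to the size of the RoI and binarized at a threshold of 0.5.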
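The last two steps (pick the k-th mask, resize it to the box, threshold at 0.5) can be sketched as below. The function name `paste_mask`, the integer box format, and the nearest-neighbour resize are assumptions chosen to keep the example dependency-free; a real pipeline would typically use an interpolating resize.

```python
import numpy as np

def paste_mask(mask_probs, class_id, box, image_shape, thresh=0.5):
    """Select the predicted class's mask, resize it to the box, and binarize.

    mask_probs: (K, m, m) sigmoid outputs for one detection.
    box: (y1, x1, y2, x2) integer pixel coordinates inside the image.
    """
    m = mask_probs.shape[-1]
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    prob = mask_probs[class_id]  # keep only the k-th mask
    # Nearest-neighbour resize from m x m to h x w (assumption, see lead-in).
    rows = np.minimum((np.arange(h) + 0.5) * m / h, m - 1).astype(int)
    cols = np.minimum((np.arange(w) + 0.5) * m / w, m - 1).astype(int)
    resized = prob[rows][:, cols]
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized >= thresh
    return full

probs = np.zeros((2, 2, 2))
probs[1] = [[0.9, 0.1], [0.2, 0.8]]
out = paste_mask(probs, class_id=1, box=(1, 1, 5, 5), image_shape=(6, 6))
print(out.astype(int))
```

Here the 2×2 mask of class 1 is stretched over a 4×4 box, so the two cells above the threshold become two 2×2 foreground blocks in the full-image mask.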

5 Experiments: Instance Segmentation

evaluate using mask IoU
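Mask IoU is the intersection-over-union computed on pixels rather than on boxes. A minimal sketch for two binary masks of the same shape follows; the function name `mask_iou` and the convention of returning 0.0 for two empty masks are assumptions.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    # Assumption: two empty masks are given an IoU of 0.
    return float(inter) / float(union) if union else 0.0

pred = np.zeros((4, 4), dtype=bool)
gt = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True  # 4 pixels
gt[1:3, 1:3] = True    # 4 pixels, 1 of them shared with pred
iou = mask_iou(pred, gt)
print(iou)
```

The two 2×2 squares share one pixel out of seven covered in total, so the IoU is 1/7.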

5.1 Main Results

Outperforms the winners of the COCO 2015 and 2016 instance segmentation challenges.

Mask R-CNN vs. FCIS: FCIS exhibits systematic artifacts on overlapping instances, whereas Mask R-CNN does not.

5.2 Ablation Experiments

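Backbone: benefits from depth (50 vs. 101), FPN, and ResNeXt (Table 2a).
Multinomial vs. Independent Masks: put simply, sigmoid vs. softmax. With sigmoid, each class gets its own binary mask and every pixel is a two-way decision for that class alone, so classes do not compete. With softmax, the K classes compete at every pixel under a multinomial logistic loss, which couples mask prediction with classification (Table 2b).
RoIAlign: the result is insensitive to max vs. average pooling, so the authors use average pooling throughout. RoIAlign improves clearly over RoI pooling (Table 2c, d). In (c) the backbone is ResNet-50-C4 with stride 16; in (d) it is ResNet-50-C5 with stride 32. (d) does better than (c), AP 30.9 vs. 30.3, and using FPN improves the result further.
Mask branch: an FCN is better than an MLP (FC).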
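The sigmoid-versus-softmax distinction in the ablation above can be seen at a single pixel. The logits below are made-up numbers for K = 3 classes, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Independent per-class probability: each class is its own yes/no question."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multinomial probability: classes compete and the outputs sum to 1."""
    shifted = np.exp(z - np.max(z))
    return shifted / shifted.sum()

logits = np.array([2.0, 1.5, -3.0])  # made-up logits of K = 3 classes at one pixel
independent = sigmoid(logits)
competing = softmax(logits)
print(independent, competing)
```

Under sigmoid, both of the first two classes mark this pixel as foreground (each is above 0.5), and the classification branch decides which mask is actually used. Under softmax, the second class is pushed below 0.5 only because the first class scored higher, which is the cross-class competition that independent masks avoid.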

5.3 Bounding Box Detection Results

Note that the gap between the model trained without the mask branch and the one trained with it is "solely due to the benefits of multi-task training".

In Table 1, the instance segmentation (mask) AP is 37.1, against a box AP of 39.8 in the paper's Table 3, a gap of only 2.7 points.
This indicates that our approach largely closes the gap between object detection and the more challenging instance segmentation task.

5.4 Timing

our design is not optimized for speed

Mask R-CNN for Human Pose Estimation and Experiments on Cityscapes (instance segmentation) are not covered in this post; interested readers can consult the original paper.
