
Semantic Segmentation Notes (Personal)


Under construction!


Deep Learning Methods

Semantic Segmentation

FCN ★★★

[Paper] Fully Convolutional Networks for Semantic Segmentation

[Year] CVPR 2015

[Authors] Jonathan Long, Evan Shelhamer, Trevor Darrell


https://github.com/shelhamer/fcn.berkeleyvision.org (official)

https://github.com/MarvinTeichmann/tensorflow-fcn (tensorflow)

https://github.com/wkentaro/pytorch-fcn (pytorch)


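1) The first(?) work to perform semantic segmentation with an end-to-end CNN. The paper points out that an FCN is equivalent to classifying each pixel from an extracted patch, but in an FCN the computation of overlapping neighboring patches is shared, which greatly improves efficiency.
2) Treats fully connected layers as convolutions.
3) Feature maps are upsampled back to the original resolution by deconvolution (initialized as bilinear interpolation).
4) Skip connections are used to refine the coarse segmentation maps.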
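To make item 3 concrete, here is a minimal sketch of bilinear initialization for a stride-2 deconvolution, assuming the common bilinear-kernel formula and a kernel size of 4. The function names are hypothetical, and the 1D transposed convolution is written out by hand rather than taken from any framework. Away from the borders, the output reproduces linear interpolation of the input.

```python
def bilinear_1d(size):
    """1D bilinear interpolation weights for a deconvolution kernel of this size."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]


def bilinear_kernel(size):
    """2D bilinear kernel: outer product of the 1D weights."""
    w = bilinear_1d(size)
    return [[a * b for b in w] for a in w]


def transposed_conv1d(x, kernel, stride):
    """Naive 1D transposed convolution (no padding, no cropping)."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += v * w
    return out


# Upsample by 2 with a size-4 kernel.
kernel = bilinear_1d(4)                        # [0.25, 0.75, 0.75, 0.25]
up = transposed_conv1d([1.0, 3.0], kernel, 2)
print(up)  # [0.25, 0.75, 1.5, 2.5, 2.25, 0.75]
```

The interior values 1.5 and 2.5 are the linear interpolation between 1 and 3. The border entries only see one input and would be cropped away.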

U-Net ★

[Paper] U-Net: Convolutional Networks for Biomedical Image Segmentation

[Year] MICCAI 2015

[Authors] Olaf Ronneberger, Philipp Fischer, Thomas Brox





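1) Encoder-decoder architecture. The encoder design follows FCN. In the decoding stage, the corresponding feature map from the encoding stage is concatenated with the output of the up-convolution.
2) Designed for biomedical image segmentation, where datasets are small. Heavy data augmentation is therefore used, and the network architecture is relatively simple.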
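A small sketch of the skip connection in item 1. In the U-Net paper, the unpadded convolutions shrink the feature maps, so the larger encoder map is center-cropped before concatenation. Feature maps are represented here as hypothetical nested lists (channels × H × W).

```python
def center_crop(fmap, out_h, out_w):
    """Center-crop every channel of fmap (a list of H x W nested lists)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return [[row[left:left + out_w] for row in ch[top:top + out_h]] for ch in fmap]


def skip_concat(encoder_fmap, decoder_fmap):
    """Crop the encoder map to the decoder's spatial size, then concatenate channels."""
    out_h, out_w = len(decoder_fmap[0]), len(decoder_fmap[0][0])
    return center_crop(encoder_fmap, out_h, out_w) + decoder_fmap


# Encoder: 2 channels of 6 x 6. Decoder (after up-conv): 3 channels of 4 x 4.
enc = [[[r * 6 + c for c in range(6)] for r in range(6)] for _ in range(2)]
dec = [[[0] * 4 for _ in range(4)] for _ in range(3)]
merged = skip_concat(enc, dec)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 5 4 4
print(merged[0][0])                                    # [7, 8, 9, 10]
```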

zoom-out ★

[Paper] Feedforward semantic segmentation with zoom-out features

[Year] CVPR 2015

[Authors] Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich

[Pages] https://bitbucket.org/m_mostajabi/zoom-out-release


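1) Uses superpixels as the smallest unit and progressively zooms out to capture information at larger scales. The zoom-out features are built from features extracted at different CNN layers.
2) Features are average-pooled within each superpixel, and the features from different levels are concatenated to form the superpixel's final feature vector. The loss is weighted by the inverse frequency of each class in the training set.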
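A simplified sketch of item 2. It pools all levels over the same superpixel, so the larger zoom-out regions of the paper are not modeled. The class weight is computed as total / count, which is proportional to the inverse class frequency. All names are hypothetical.

```python
from collections import Counter


def superpixel_avg_pool(features, sp_labels):
    """Average the per-pixel feature vectors inside each superpixel.

    features:  H x W nested list of equal-length feature vectors
    sp_labels: H x W nested list of superpixel ids
    """
    sums, counts = {}, Counter()
    for frow, lrow in zip(features, sp_labels):
        for vec, sp in zip(frow, lrow):
            acc = sums.setdefault(sp, [0.0] * len(vec))
            for k, v in enumerate(vec):
                acc[k] += v
            counts[sp] += 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}


def superpixel_descriptor(level_features, sp_labels, sp):
    """Concatenate one superpixel's pooled features from several CNN levels."""
    desc = []
    for feats in level_features:
        desc.extend(superpixel_avg_pool(feats, sp_labels)[sp])
    return desc


def inverse_frequency_weights(class_labels):
    """Loss weight per class = 1 / (relative frequency of that class)."""
    counts = Counter(class_labels)
    total = len(class_labels)
    return {c: total / n for c, n in counts.items()}


sp = [[0, 0, 1, 1]]                                           # one row, two superpixels
low = [[[1.0], [3.0], [5.0], [7.0]]]                          # 1-D features, shallow layer
high = [[[10.0, 0.0], [10.0, 0.0], [0.0, 2.0], [0.0, 4.0]]]   # 2-D features, deep layer
print(superpixel_descriptor([low, high], sp, 1))              # [6.0, 0.0, 3.0]
print(inverse_frequency_weights(["road", "road", "road", "car"]))
```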

Dilated Convolution★

[Paper] Multi-Scale Context Aggregation By Dilated Convolutions

[Year] ICLR 2016

[Authors] Fisher Yu, Vladlen Koltun

[Pages] https://github.com/fyu/dilation


1) Systematically uses dilated convolution. The implementation has been merged into Caffe.
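The following 1D sketch shows what dilation does and why stacking dilated convolutions pays off. When the dilation doubles per layer, the receptive field grows exponentially while the number of weights grows only linearly. The helper names are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """Valid (unpadded) 1D dilated convolution: taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation
    return [sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
            for i in range(len(x) - span)]


def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


print(dilated_conv1d(list(range(10)), [1, 1, 1], 2))   # [6, 9, 12, 15, 18, 21]
# 3-wide kernels with dilations 1, 2, 4, ...
print([receptive_field([3] * n, [2 ** i for i in range(n)]) for n in (1, 2, 3)])  # [3, 7, 15]
```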

DeepLab ★★

[Paper] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

[Year] ICLR 2015

[Authors] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille

[Pages] https://bitbucket.org/deeplab/deeplab-public


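1) Produces dense features while preserving the receptive field size. This is done by setting the stride of the last two pooling layers of VGG16 to 1 and using the hole algorithm (i.e., dilated convolution) to control the receptive field.
2) The output is post-processed with a fully connected CRF. The unary term comes from the pixel's predicted probability (its negative log). The pairwise term measures the similarity between the current pixel and every other pixel in the image, based on color and position, using Gaussian kernels. The fully connected CRF follows Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials.
3) Like FCN, it also uses multi-scale prediction.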
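A sketch of the CRF terms in item 2. It uses Gaussian edge potentials in the form of the fully connected CRF that DeepLab adopts: an appearance kernel over position and color, and a smoothness kernel over position only, combined with a Potts compatibility. The weights `w1`, `w2` and bandwidths `theta_*` are placeholder values, not tuned settings.

```python
import math


def unary_energy(prob):
    """Unary term: negative log of the CNN's probability for the label."""
    return -math.log(prob)


def dense_crf_kernel(p_i, p_j, rgb_i, rgb_j, w1=1.0, w2=1.0,
                     theta_alpha=80.0, theta_beta=13.0, theta_gamma=3.0):
    """Appearance (position + color) kernel plus smoothness (position-only) kernel.
    The weight and bandwidth defaults are placeholders."""
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))
    appearance = w1 * math.exp(-d_pos / (2 * theta_alpha ** 2) - d_rgb / (2 * theta_beta ** 2))
    smoothness = w2 * math.exp(-d_pos / (2 * theta_gamma ** 2))
    return appearance + smoothness


def pairwise_energy(label_i, label_j, kernel_value):
    """Potts compatibility: only pixels with different labels pay the kernel cost."""
    return kernel_value if label_i != label_j else 0.0


# Two nearby pixels: similar colors are coupled much more strongly than different colors.
near_same = dense_crf_kernel((10, 10), (12, 10), (200, 30, 30), (198, 32, 30))
near_diff = dense_crf_kernel((10, 10), (12, 10), (200, 30, 30), (20, 20, 220))
print(near_same > near_diff)  # True
```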

[Paper] Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

[Year] ICCV 2015

[Authors] George Papandreou, Liang-Chieh Chen, Kevin Murphy, Alan L. Yuille

DeepLab-V2 ★

[Paper] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

[Year] arXiv 2016

[Authors] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille



https://github.com/DrSleep/tensorflow-deeplab-resnet (tensorflow)

https://github.com/isht7/pytorch-deeplab-resnet (pytorch)


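1) Differences from V1: a different learning-rate policy, atrous spatial pyramid pooling (ASPP), a deeper network, and multi-scale processing. ASPP applies dilated convolutions with different dilation rates to the same feature map in parallel.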
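A 1D sketch of the ASPP idea. Parallel dilated-conv branches with different rates all read the same input, and zero padding keeps every branch at the input length so the outputs can be fused. This sketch fuses them by element-wise summation. The rates and kernels below are illustrative only.

```python
def dilated_conv1d_same(x, kernel, rate):
    """1D dilated convolution with zero padding so the output length equals the input length.
    Assumes an odd kernel size."""
    half = (len(kernel) // 2) * rate
    padded = [0.0] * half + [float(v) for v in x] + [0.0] * half
    return [sum(w * padded[i + j * rate] for j, w in enumerate(kernel)) for i in range(len(x))]


def aspp(x, kernels_by_rate):
    """Apply one dilated-conv branch per rate to the same input and sum the branches."""
    branches = [dilated_conv1d_same(x, k, r) for r, k in kernels_by_rate.items()]
    return [sum(vals) for vals in zip(*branches)]


x = [1, 2, 3, 4, 5]
print(aspp(x, {1: [1, 1, 1], 2: [1, 1, 1]}))  # [7.0, 12.0, 18.0, 18.0, 17.0]
```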

DeepLab-V3 ☆

[Paper] Rethinking Atrous Convolution for Semantic Image Segmentation

[Year] arXiv 1706

[Authors] Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

[Pages] https://github.com/tensorflow/models/tree/master/research/deeplab


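1) Uses atrous convolution both in cascade and in parallel, adds batch normalization, and refines the architecture, reaching state-of-the-art accuracy (as of 080116).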

DeepLab-V3+ ★☆

[Paper] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

[Year] arXiv 1802

[Authors] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam

[Pages] https://github.com/tensorflow/models/tree/master/research/deeplab


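1) Uses DeepLab-V3 as the encoder and adds a simple decoder instead of upsampling the encoder output directly. Adopts Xception as the backbone.
2) Reached state-of-the-art segmentation results on VOC (as of 0800314), and the results are good.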
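Xception is built from depthwise separable convolutions, and the paper's title refers to atrous separable convolution. The parameter arithmetic below shows why this factorization is cheap (bias ignored; the helper names are hypothetical).

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv.
    An atrous depthwise conv has exactly the same count: dilation spaces the taps out
    but adds no weights."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


print(standard_conv_params(3, 256, 256))   # 589824
print(separable_conv_params(3, 256, 256))  # 67840
```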


[Paper] Conditional Random Fields as Recurrent Neural Networks

[Year] ICCV 2015

[Authors] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr

[Pages] http://www.robots.ox.ac.uk/~szheng/CRFasRNN.html


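1) Replaces the CRF inference steps with differentiable modules such as convolution and softmax and iterates them recurrently as in an RNN, so the CRF is approximated by an RNN-like structure. The whole model can be optimized end to end.
2) The fully connected CRF and its inference are designed on the basis of Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. I should reread this paper carefully after studying CRFs in more depth.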
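A naive sketch of the mean-field iteration that CRF-RNN unrolls. Each step initializes from the unaries, does message passing, applies a Potts compatibility transform, adds the unary, and normalizes. It runs over a small explicit affinity matrix in O(N²) per iteration. The real layer replaces the explicit sum with fast Gaussian filtering and learns the kernel weights and compatibility. All names and numbers are illustrative.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def mean_field(unary, affinity, weight=1.0, iters=5):
    """Mean-field inference for a fully connected CRF with Potts compatibility.

    unary:    N x L unary energies (lower = more likely), e.g. -log CNN probabilities
    affinity: N x N kernel values k(i, j); the diagonal is ignored
    """
    n, num_labels = len(unary), len(unary[0])
    q = [softmax([-u for u in row]) for row in unary]        # initialise from unaries
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # Message passing: affinity-weighted sum of the other pixels' marginals.
            msg = [sum(affinity[i][j] * q[j][lab] for j in range(n) if j != i)
                   for lab in range(num_labels)]
            # Potts compatibility: label `lab` pays for the mass on every other label.
            total = sum(msg)
            pairwise = [weight * (total - msg[lab]) for lab in range(num_labels)]
            # Add the unary term and renormalise (all pixels updated in parallel).
            new_q.append(softmax([-(unary[i][lab] + pairwise[lab]) for lab in range(num_labels)]))
        q = new_q
    return q


# Pixel 1 slightly prefers label 1 on its own, but its strongly connected neighbours prefer label 0.
unary = [[0.0, 3.0], [0.2, 0.0], [0.0, 3.0]]
affinity = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
q = mean_field(unary, affinity)
print(q[1][0] > q[1][1])  # True
```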

DeconvNet ★

SegNet ★★

[Paper] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

[Year] arXiv 2015

[Authors] Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla

[Pages] http://mi.eng.cam.ac.uk/projects/segnet/

[Paper] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

[Year] PAMI 2017

[Authors] Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla


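1) One of the representative encoder-decoder models. Its distinguishing feature is that the max-pooling indices from the encoder are stored. During upsampling, the decoder uses these indices to produce a sparse feature map, which trainable convolutions then turn into a dense feature map.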
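A sketch of index-based unpooling as described above. The encoder's 2 × 2 max pooling records where each maximum came from. The decoder writes each pooled value back to that position and zeros everywhere else, which yields the sparse map. The helper names are hypothetical.

```python
def max_pool_2x2_with_indices(x):
    """2 x 2 max pooling (stride 2) that also records where each max came from.
    Assumes even height and width; indices are flattened as row * W + col."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for r in range(0, h, 2):
        prow, irow = [], []
        for c in range(0, w, 2):
            window = [(x[rr][cc], rr * w + cc) for rr in (r, r + 1) for cc in (c, c + 1)]
            val, flat = max(window, key=lambda t: t[0])
            prow.append(val)
            irow.append(flat)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(pooled, indices, h, w):
    """Place each pooled value back at its recorded position; everything else is zero."""
    out = [[0.0] * w for _ in range(h)]
    for prow, irow in zip(pooled, indices):
        for val, flat in zip(prow, irow):
            out[flat // w][flat % w] = val
    return out


x = [[1, 5, 2, 0],
     [3, 4, 8, 1],
     [0, 0, 1, 1],
     [9, 2, 3, 7]]
pooled, idx = max_pool_2x2_with_indices(x)
print(pooled, idx)  # [[5, 8], [9, 7]] [[1, 6], [12, 15]]
print(max_unpool(pooled, idx, 4, 4))
```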

Piecewise CRF

[Paper] Efficient piecewise training of deep structured models for semantic segmentation

[Year] CVPR 2016

[Authors] Guosheng Lin, Chunhua Shen, Anton van den Hengel, Ian Reid



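1) Skimmed only. I did not really understand the CRF part.
2) FeatMap-Net takes multi-scale inputs and produces feature maps. The CRF's unary and pairwise potentials are built on these feature maps, and the pairwise potentials model two kinds of context: surrounding and above/below.
3) Proposes a piecewise-learning method for CRF training.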

ENet ★

[Paper] ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

[Year] arXiv 1606

[Authors] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello

[Pages] https://github.com/e-lab/ENet-training


1) A fast encoder-decoder segmentation network.
2) Large encoder, small decoder; PReLU instead of ReLU; 1xn and nx1 convolutions instead of nxn convolutions.
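A sketch of the two tricks in item 2. In this sketch, an n×1 convolution followed by a 1×n convolution with no nonlinearity in between equals a single rank-1 n×n kernel. It uses 2n weights instead of n², trading away the expressiveness of a full n×n kernel. PReLU is shown with a placeholder slope of 0.25. All helper names are hypothetical.

```python
def conv2d_valid(img, kernel):
    """Naive 2D cross-correlation without padding."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]


def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, learnable slope `a` (placeholder) for negative ones."""
    return x if x > 0 else a * x


col = [[1], [2], [1]]                             # n x 1 kernel (n = 3)
row = [[1, 0, -1]]                                # 1 x n kernel
full = [[c[0] * r for r in row[0]] for c in col]  # equivalent rank-1 n x n kernel

img = [[(3 * i + 5 * j) % 7 for j in range(5)] for i in range(5)]
two_step = conv2d_valid(conv2d_valid(img, col), row)
print(two_step == conv2d_valid(img, full))        # True
print("weights: n x n =", 3 * 3, "vs n x 1 + 1 x n =", 3 + 3)
print([prelu(v) for v in (-2.0, 3.0)])            # [-0.5, 3.0]
```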

ParseNet ★

[Paper] ParseNet: Looking Wider to See Better

[Year] ICLR 2016

[Authors] Wei Liu, Andrew Rabinovich, Alexander C. Berg

[Pages] https://github.com/weiliu89/caffe/tree/fcn


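1) A simple way to add global context. The feature map is globally pooled and L2-normalized, and the resulting vector is unpooled (replicated) to the same spatial size as the original feature map. It is then concatenated with the feature map, which is also L2-normalized.
2) Through simple experiments, shows that the effective receptive field is often much smaller than the theoretical one. Many papers cite this kind of observation, but it seems to lack theoretical justification -_-||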
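A sketch of the global-context step in item 1. It assumes global average pooling and L2 normalization of each position's channel vector. ParseNet's normalization layer also has a learned per-channel scale, which is omitted here. The function names are hypothetical.

```python
import math


def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def parsenet_context(fmap):
    """fmap: C x H x W nested list. Returns a 2C x H x W map:
    L2-normalised local features followed by the unpooled global feature."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Global average pooling over all positions, then L2 normalisation.
    g = l2_normalize([sum(sum(row) for row in ch) / (H * W) for ch in fmap])
    # L2-normalise the C-dim feature vector at every position.
    local = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            v = l2_normalize([fmap[c][i][j] for c in range(C)])
            for c in range(C):
                local[c][i][j] = v[c]
    # "Unpool": copy the global vector to every position, then concatenate channels.
    unpooled = [[[g[c]] * W for _ in range(H)] for c in range(C)]
    return local + unpooled


fmap = [[[3.0, 0.0]], [[4.0, 0.0]]]   # C = 2, H = 1, W = 2
print(parsenet_context(fmap))
```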

FoveaNet ★★

[Paper] FoveaNet: Perspective-aware Urban Scene Parsing

[Year] ICCV 2017 Oral

[Authors] Xin Li, Zequn Jie, Wei Wang, Changsong Liu, Jimei Yang, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, Jiashi Feng



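1) Proposes a perspective-aware parsing network to address heterogeneous object scales. It improves segmentation accuracy for small distant objects and reduces the "broken-down" artifacts on large nearby objects.
2) To better parse objects near the vanishing point (i.e., far from the image plane), proposes a perspective estimation network (PEN). PEN produces a distance heatmap, from which a fovea region containing most of the small objects is derived. The fovea region is enlarged and fed through the parsing network in parallel with the original image, and its parsing result is then placed back into the original image.
3) To address the "broken-down" problem on nearby objects, proposes a perspective-aware CRF. By combining the PEN heatmap with object detection, pixels belonging to nearby objects get larger pairwise potentials and pixels belonging to distant objects get smaller ones. This effectively alleviates both the "broken-down" artifacts and over-smoothing.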
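A toy sketch of the crop, enlarge, parse, and paste-back flow in item 2. The rule for choosing the fovea region here is a hypothetical stand-in: a tight box around pixels whose distance score passes a threshold. It is not the paper's method. The parser `size_probe` is also a hypothetical stand-in that just reveals the input size, and resizing uses nearest neighbor.

```python
def fovea_box(heatmap, thresh):
    """Hypothetical rule: tight half-open box (r0, c0, r1, c1) around far-away pixels."""
    coords = [(r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row) if v >= thresh]
    if not coords:
        return None
    rows, cols = [r for r, _ in coords], [c for _, c in coords]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def crop(img, box):
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in img[r0:r1]]


def upsample_nearest(img, s):
    return [[v for v in row for _ in range(s)] for row in img for _ in range(s)]


def downsample_nearest(img, s):
    return [row[::s] for row in img[::s]]


def parse_with_fovea(img, heatmap, parse_fn, thresh=0.5, scale=2):
    """Parse the full image and an enlarged fovea crop, then paste the crop's result back."""
    full_pred = parse_fn(img)
    box = fovea_box(heatmap, thresh)
    if box is None:
        return full_pred
    region_pred = downsample_nearest(parse_fn(upsample_nearest(crop(img, box), scale)), scale)
    out = [row[:] for row in full_pred]
    for dr, row in enumerate(region_pred):
        for dc, v in enumerate(row):
            out[box[0] + dr][box[1] + dc] = v
    return out


def size_probe(im):
    """Hypothetical stand-in parser: labels every pixel with the input height."""
    return [[len(im)] * len(im[0]) for _ in im]


img = [[0] * 4 for _ in range(4)]
heat = [[0.0, 0.0, 0.9, 0.8], [0.0, 0.0, 0.7, 0.6], [0.0] * 4, [0.0] * 4]
print(parse_with_fovea(img, heat, size_probe, thresh=0.5, scale=3))
# [[4, 4, 6, 6], [4, 4, 6, 6], [4, 4, 4, 4], [4, 4, 4, 4]]
```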

PSPNet ★☆

[Paper] Pyramid Scene Parsing Network

[Year] CVPR 2017

[Authors] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia

[Pages] https://hszhao.github.io/projects/pspnet/


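1) Proposes a pyramid pooling module that combines context information at different scales. PSPNet pools the feature map at several scales (similar to spatial pyramid pooling), rescales the outputs of all scales to the same size, and concatenates them.
2) Uses a ResNet backbone, with an auxiliary loss attached after res4b22.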

RefineNet ★☆

[Paper] RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

[Year] CVPR 2017

[Authors] Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid

[Pages] https://github.com/guosheng/refinenet


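1) The encoder consists of four groups of residual blocks that progressively reduce resolution. The decoder is made of the proposed RefineNet modules. The authors argue that the model better resolves fine details in high-resolution images.
2) The first part of a RefineNet module is multi-resolution fusion. Similar to U-Net, each decoder stage uses information from the corresponding encoder stage.
3) The second part of a RefineNet module is chained residual pooling, whose goal is to "capture background context from a large image region".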
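A 1D sketch of chained residual pooling from item 3. Each block max-pools the previous block's output with stride 1, then applies a convolution, and every block's output is added to a running sum. In this sketch the 3×3 convolutions are replaced by hypothetical scalar weights, and the pooling window is a 1D window of 5.

```python
def max_pool_same(x, k=5):
    """Stride-1 1D max pooling; the window is clipped at the borders."""
    half, n = k // 2, len(x)
    return [max(x[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def chained_residual_pooling(x, conv_weights):
    """Each block pools the previous block's output and 'convolves' it (here: scales it
    by a hypothetical scalar weight); every block's output is added to the running sum."""
    x = [max(0.0, v) for v in x]          # ReLU on the input
    out, path = list(x), x
    for w in conv_weights:
        path = [w * v for v in max_pool_same(path)]
        out = [o + p for o, p in zip(out, path)]
    return out


print(chained_residual_pooling([0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.5]))
# [3.0, 3.0, 7.0, 3.0, 3.0, 1.0, 1.0, 0.0, 0.0]
```

Each extra block spreads the response of the single active position further, which is how the chain gathers context from a large region.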


[Paper] Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

[Year] CVPR 2017

[Authors] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun



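1) The paper argues that segmentation consists of two parts, localization and classification. Classification needs global information, while localization needs high-resolution feature maps for spatial accuracy, so the two are in conflict. The proposed solution is to use large kernels, which preserve the resolution while approximating dense connections between feature maps and per-pixel classifiers;

2) A large k*k kernel is replaced by the sum of two branches: k*1 followed by 1*k, and 1*k followed by k*1. A boundary refinement (BR) module with a residual structure is introduced to capture boundary information;

3) It is shown only experimentally that the proposed model outperforms both a plain k*k kernel and a stack of small kernels, without much theoretical support;

4) One thing I don't understand: why can the residual-based BR module model the boundary alignment?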
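A sketch of what the two-branch replacement in item 2 amounts to, for a single input and output channel. With no nonlinearity inside a branch, each branch acts like one rank-1 k×k kernel, so their sum acts like a k×k kernel of rank at most 2. The weight count grows as 4k instead of k². The helper names are hypothetical.

```python
def outer(col, row):
    return [[c * r for r in row] for c in col]


def gcn_effective_kernel(a, b, c, d):
    """Left branch: k x 1 kernel `a` then 1 x k kernel `b`.
    Right branch: 1 x k kernel `c` then k x 1 kernel `d`.
    Returns the single k x k kernel that the summed branches are equivalent to."""
    left, right = outer(a, b), outer(d, c)
    return [[lv + rv for lv, rv in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


def weights_per_channel_pair(k):
    return {"dense k x k": k * k, "GCN (two separable branches)": 4 * k}


print(gcn_effective_kernel([1, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 1]))
# [[1, 1, 2], [0, 0, 1], [0, 0, 1]]
print(weights_per_channel_pair(15))
```

The example shows that the two thin branches together still cover the full k×k extent (here the top row and the right column).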

PixelNet ★

[Paper] Representation of the pixels, by the pixels, and for the pixels

[Year] TPAMI 2017

[Authors] Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan

[Pages] http://www.cs.cmu.edu/~aayushb/pixelNet/


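1) Skimmed. Builds on the hypercolumn idea and is fast. Applicable to a range of low-level to high-level problems such as segmentation, edge detection, and surface normal estimation.
2) Hypercolumn: for a given pixel, the features at its corresponding location in every layer's feature map are concatenated into one vector, which an MLP then classifies.
3) The paper argues that during training, "just sampling a small number of pixels per image is sufficient for learning." A mini-batch can therefore sample pixels from many images, which increases diversity.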
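A sketch of items 2 and 3. It builds a hypercolumn by looking up each layer at the pixel's downsampled position, then forms a mini-batch from a few random pixels per image. The MLP classifier is omitted. The nearest-cell lookup is a simplification (interpolating between cells is also common), and the names are hypothetical.

```python
import random


def hypercolumn(feature_maps, strides, r, c):
    """Concatenate the features at pixel (r, c) from every layer.

    feature_maps[l]: C_l x H_l x W_l nested list; strides[l]: that layer's downsampling factor."""
    vec = []
    for fmap, s in zip(feature_maps, strides):
        rr = min(r // s, len(fmap[0]) - 1)
        cc = min(c // s, len(fmap[0][0]) - 1)
        vec.extend(ch[rr][cc] for ch in fmap)
    return vec


def sample_hypercolumns(images, pixels_per_image, rng):
    """Build a mini-batch from a few random pixels of each image.

    images: list of (feature_maps, strides, height, width)."""
    batch = []
    for fmaps, strides, h, w in images:
        for _ in range(pixels_per_image):
            batch.append(hypercolumn(fmaps, strides, rng.randrange(h), rng.randrange(w)))
    return batch


layer1 = [[[r * 4 + c for c in range(4)] for r in range(4)]]   # 1 channel, stride 1
layer2 = [[[0, 1], [2, 3]], [[10, 11], [12, 13]]]             # 2 channels, stride 2
print(hypercolumn([layer1, layer2], [1, 2], 3, 2))            # [14, 3, 13]
batch = sample_hypercolumns([([layer1, layer2], [1, 2], 4, 4)] * 3, 2, random.Random(0))
print(len(batch), len(batch[0]))                               # 6 3
```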

LinkNet ☆

[Paper] LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation

[Year] arXiv 1707

[Authors] Abhishek Chaurasia, Eugenio Culurciello

[Pages] https://codeac29.github.io/projects/linknet/


1) Haven't read it yet. Roughly a U-Net-like structure, and fast.


[Paper] Stacked Deconvolutional Network for Semantic Segmentation

[Year] arXiv 1708

[Authors] Jun Fu, Jing Liu, Yuhang Wang, Hanqing Lu



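1) Skimmed. Good results; code not released.
2) Built on DenseNet, it stacks multiple encoder-decoder units, which the paper argues better captures multi-scale context. The network is full of inter-unit and intra-unit connections, and hierarchical supervision is added so that the very deep SDN can be trained successfully.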

Weak Supervision

Image-level to Pixel-level Labeling

[Paper] From Image-level to Pixel-level Labeling with Convolutional Networks

[Year] CVPR 2015

[Authors] Pedro O. Pinheiro, Ronan Collobert



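1) A weakly supervised method that trains a segmentation model from image-level class labels. Each class's feature map in the segmentation output is aggregated with log-sum-exp into a per-class score for the classification task, which is turned into a class probability. The segmentation model is optimized by minimizing the classification loss.
2) At inference time, two segmentation priors are used to suppress false positives: an Image-Level Prior (the segmentation is weighted by the classification probabilities) and a Smooth Prior (superpixels, bounding-box candidates, and the unsupervised MCG segmentation).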
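A sketch of the log-sum-exp aggregation in item 1. It computes (1/r) · log(mean(exp(r · s))) over all positions, factoring out the maximum for numerical stability. The hyperparameter r moves the pooling between average (small r) and max (large r). A softmax over the pooled class scores then gives image-level probabilities. The function names are hypothetical.

```python
import math


def lse_pool(score_map, r):
    """Log-Sum-Exp pooling of one class's score map, computed stably around the maximum."""
    vals = [v for row in score_map for v in row]
    m = max(vals)
    return m + math.log(sum(math.exp(r * (v - m)) for v in vals) / len(vals)) / r


def image_level_probs(class_maps, r):
    """Pool every class's score map, then softmax over classes."""
    scores = [lse_pool(cm, r) for cm in class_maps]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


score_map = [[0.0, 10.0]]
print(lse_pool(score_map, 100.0))   # close to the max, 10
print(lse_pool(score_map, 1e-4))    # close to the mean, 5
```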

BoxSup ★

[Paper] BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

[Year] ICCV 2015

[Authors] Jifeng Dai, Kaiming He, Jian Sun



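1) Weakly supervised semantic segmentation. Bounding boxes are combined with region proposals (MCG) to generate initial ground-truth masks, and then the segmentation results and the masks are updated alternately.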
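A simplified sketch of generating an initial mask for one box. It picks the candidate proposal mask that overlaps the box best by IoU. The proposals are assumed to be given (e.g. from MCG), and the alternating updates with the network are not shown. Masks are represented as hypothetical sets of pixel coordinates.

```python
def mask_box_iou(mask, box):
    """mask: set of (r, c) pixels; box: (r0, c0, r1, c1), half-open."""
    r0, c0, r1, c1 = box
    box_pixels = {(r, c) for r in range(r0, r1) for c in range(c0, c1)}
    inter = len(mask & box_pixels)
    union = len(mask | box_pixels)
    return inter / union if union else 0.0


def pick_initial_mask(candidates, box):
    """Choose the candidate segment that best matches the box annotation."""
    return max(candidates, key=lambda m: mask_box_iou(m, box))


box = (0, 0, 2, 2)
full = {(0, 0), (0, 1), (1, 0), (1, 1)}
corner = {(0, 0)}
print(mask_box_iou(corner, box))                          # 0.25
print(pick_initial_mask([corner, full], box) == full)     # True
```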


Mix-and-Match ★

[Paper] Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

[Year] AAAI 2018

[Authors] Xiaohang Zhan, Ziwei Liu, Ping Luo, Xiaoou Tang, Chen Change Loy

[Pages] http://mmlab.ie.cuhk.edu.hk/projects/M&M/


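1) Self-supervision has two stages: a proxy stage and a fine-tuning stage. The network is first pre-trained on a proxy task that needs no labels (e.g., image colorization) to learn some semantic features, and then fine-tuned with a small amount of labeled data. However, because of the semantic gap between the proxy task and the target task, self-supervised methods perform noticeably worse than supervised ones.
2) The paper proposes a "mix-and-match" strategy that uses a small amount of labeled data to improve a self-supervised pre-trained network. Mix step: randomly extract patches from different images. Match step: build a graph on the fly during training and generate triplets, each consisting of an anchor, a positive, and a negative patch. A triplet loss is defined on these triplets, encouraging patches of the same class to be more similar and patches of different classes to be more different.
3) My understanding of self-supervision is not deep enough, and reading the code would help. The paper does not seem to describe the hypercolumn-based segmentation part in detail; worth looking into later.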
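A sketch of the triplet objective in item 2. It uses a standard margin-based triplet loss with Euclidean distance, so the paper's exact distance and margin may differ. The on-the-fly graph construction is not reproduced. The label-based sampler below is a hypothetical stand-in.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplets(labels, count, rng):
    """Hypothetical sampler: anchor and positive share a label, the negative does not.
    Needs at least one class with two patches and at least two classes."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    anchors = [i for i, y in enumerate(labels) if len(by_class[y]) > 1]
    triplets = []
    for _ in range(count):
        a = rng.choice(anchors)
        p = rng.choice([j for j in by_class[labels[a]] if j != a])
        n = rng.choice([j for j, y in enumerate(labels) if y != labels[a]])
        triplets.append((a, p, n))
    return triplets


print(triplet_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]))   # 0.0
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.5, 0.0]))   # 0.5
labels = ["road", "road", "car", "car"]
print(sample_triplets(labels, 3, random.Random(0)))
```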

Other Interesting Methods


[Paper] Convolutional Oriented Boundaries

[Year] ECCV 2016

[Authors] K.K. Maninis, J. Pont-Tuset, P. Arbeláez, L. Van Gool

[Pages] http://www.vision.ee.ethz.ch/~cvlsegmentation/cob/index.html


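1) Derives segmentations from boundary probabilities. The overall pipeline comes from Berkeley's gPb-owt-ucm, with the stage that computes the probability map replaced by a CNN.
2) The CNN uses a multi-scale model to predict coarse and fine boundary probabilities in 8 orientations.
3) In the UCM stage, a sparse boundary representation is proposed, which speeds up computation.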

Traditional Classical Methods

gPb-owt-ucm ★★★





