
FCN Notes (Fully Convolutional Networks for Semantic Segmentation)


(1) The main operations in FCN

(a) Replace the fully connected layers of earlier classification networks with convolutional layers

  FCN replaces the fully connected layers with convolutional layers, so the network finally produces a heatmap of class scores. The converted layers have output shapes (1, 1, 4096), (1, 1, 4096), and (1, 1, 1000). FCN is also faster in both the forward and backward passes: it produces a 10×10 grid of outputs in 22 ms, while the earlier approach needs 1.2 ms for a single output, i.e., about 120 ms for 100 outputs. And once the network is fully convolutional, there is no longer any constraint on the size of the input image.
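As a concrete illustration, here is a minimal sketch of the conversion, assuming PyTorch and a VGG16-style backbone (the fc6/conv6 names and the 7×7×512 input shape follow the paper's setup, but are illustrative):

```python
import torch
import torch.nn as nn

# A VGG16-style fc6 maps a 7x7x512 feature map to 4096 units.
# The same weights can be viewed as a 7x7 convolution with 4096 filters,
# so the layer slides over larger inputs instead of requiring a fixed size.
fc6 = nn.Linear(512 * 7 * 7, 4096)
conv6 = nn.Conv2d(512, 4096, kernel_size=7)
conv6.weight.data.copy_(fc6.weight.data.view(4096, 512, 7, 7))
conv6.bias.data.copy_(fc6.bias.data)

# fc7 and the classifier become 1x1 convolutions: (1,1,4096) and (1,1,1000).
# (ReLU and dropout are omitted here for brevity.)
conv7 = nn.Conv2d(4096, 4096, kernel_size=1)
score = nn.Conv2d(4096, 1000, kernel_size=1)

# On an oversized input, the output is a coarse heatmap, not a single label.
x = torch.randn(1, 512, 16, 16)       # pooled features of a larger image
heatmap = score(conv7(conv6(x)))      # shape: (1, 1000, 10, 10)
print(heatmap.shape)
```

Because the converted layers are convolutions, a larger image simply yields a larger grid of scores, which is exactly the heatmap behaviour described above.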


(b) Use upsampling, and (c) fuse the feature maps from different layers after upsampling them

  After repeated convolution and pooling, the feature maps become smaller and smaller and the resolution lower and lower. To recover this information, FCN uses upsampling (implemented with deconvolution, i.e., transposed convolution) to restore the spatial size. It restores not only the feature map after pool5 but also the feature maps after pool4 and pool3. The results show that these feature maps capture the semantic content of the image well, and the higher the resolution of the fused feature maps, the better the result; a sketch of this skip fusion is given below.
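The following is a minimal sketch of this FCN-8s-style skip fusion, again assuming PyTorch; the channel counts (256 for pool3, 512 for pool4), the 21-class score maps, and all layer names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FCN8sHead(nn.Module):
    """Minimal FCN-8s-style fusion: upsample coarse scores, add scores
    computed from pool4 and pool3, then upsample 8x back to input size."""
    def __init__(self, n_classes):
        super().__init__()
        self.score_pool3 = nn.Conv2d(256, n_classes, 1)  # 1x1 scores on pool3
        self.score_pool4 = nn.Conv2d(512, n_classes, 1)  # 1x1 scores on pool4
        self.up2a = nn.ConvTranspose2d(n_classes, n_classes, 4, stride=2, padding=1)
        self.up2b = nn.ConvTranspose2d(n_classes, n_classes, 4, stride=2, padding=1)
        self.up8  = nn.ConvTranspose2d(n_classes, n_classes, 16, stride=8, padding=4)

    def forward(self, pool3, pool4, score_fr):
        # score_fr: coarse class scores from the fully convolutionalized head
        x = self.up2a(score_fr) + self.score_pool4(pool4)  # fuse at stride 16
        x = self.up2b(x) + self.score_pool3(pool3)         # fuse at stride 8
        return self.up8(x)                                 # back to input stride

# Feature maps for a 256x256 input: strides 8, 16, and 32.
pool3 = torch.randn(1, 256, 32, 32)
pool4 = torch.randn(1, 512, 16, 16)
score_fr = torch.randn(1, 21, 8, 8)
out = FCN8sHead(21)(pool3, pool4, score_fr)
print(out.shape)  # (1, 21, 256, 256)
```

In the paper the upsampling filters are initialized from bilinear interpolation; that detail is omitted here for brevity.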

(2) Evaluation metrics for semantic segmentation

For details, see the companion note on metrics (accuracy measures) for semantic segmentation in deep learning: pixel accuracy, mean accuracy, mean IU, frequency weighted IU.
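In the paper's notation, n_ij is the number of pixels of class i predicted as class j, and t_i = Σ_j n_ij is the total number of pixels of class i. All four metrics can then be computed from a confusion matrix; a small NumPy sketch (the function name is mine, not from the paper):

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = n_ij: pixels of true class i predicted as class j.
    Assumes every class appears in the ground truth; guard t_i == 0 otherwise."""
    n_ii = np.diag(conf)           # correctly classified pixels per class
    t_i = conf.sum(axis=1)         # pixels whose true class is i
    pred_i = conf.sum(axis=0)      # pixels predicted as class i

    pixel_acc = n_ii.sum() / t_i.sum()
    mean_acc = np.mean(n_ii / t_i)
    iu = n_ii / (t_i + pred_i - n_ii)   # per-class intersection over union
    mean_iu = np.mean(iu)
    freq_weighted_iu = (t_i * iu).sum() / t_i.sum()
    return pixel_acc, mean_acc, mean_iu, freq_weighted_iu

# Toy 2-class example: rows are ground truth, columns are predictions.
conf = np.array([[50, 2],
                 [3, 45]])
print(segmentation_metrics(conf))
```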


On patchwise training and fully convolutional training

An answer from Stack Overflow:

The term "Fully Convolutional Training" just means replacing fully-connected layer with convolutional layers so that the whole network contains just convolutional layers (and pooling layers).

The term "Patchwise training" is intended to avoid the redundancies of full image training. In semantic segmentation, given that you are classifying each pixel in the image, by using the whole image, you are adding a lot of redundancy in the input. A standard approach to avoid this during training segmentation networks is to feed the network with batches of random patches (small image regions surrounding the objects of interest) from the training set instead of full images. This "patchwise sampling" ensures that the input has enough variance and is a valid representation of the training dataset (the mini-batch should have the same distribution as the training set). This technique also helps to converge faster and to balance the classes. In this paper, they claim that is it not necessary to use patch-wise training and if you want to balance the classes you can weight or sample the loss. In a different perspective, the problem with full image training in per-pixel segmentation is that the input image has a lot of spatial correlation. To fix this, you can either sample patches from the training set (patchwise training) or sample the loss from the whole image. That is why the subsection is called "Patchwise training is loss sampling". So by "restricting the loss to a randomly sampled subset of its spatial terms excludes patches from the gradient computation." They tried this "loss sampling" by randomly ignoring cells from the last layer so the loss is not calculated over the whole image.

Final results

FCN achieved state-of-the-art segmentation on PASCAL VOC at the time; the paper reports 62.2% mean IU on the VOC 2012 test set with FCN-8s, with inference taking less than one fifth of a second for a typical image.


Shortcomings (link to the original article)

Note here the shortcomings of FCN:

  1. The results are still not fine-grained enough. Although 8× upsampling is much better than 32×, the upsampled output is still rather blurry and smooth, and is insensitive to the details in the image.
  2. It classifies each pixel separately and does not fully take the relationships between pixels into account. It ignores the spatial regularization step used in conventional pixel-based segmentation methods, so it lacks spatial consistency.




