
Kaggle 2018 Data Science Bowl: Nucleus Segmentation Learning Notes

I. Winning Solutions

1. First-Place Solution (U-Net, 0.631)

Key contributions

  • Targets: predict touching borders and treat the problem as instance segmentation.
  • Loss function: a combination of cross-entropy and soft dice loss, to avoid the pixel-imbalance problem.
  • binary_crossentropy has a class-imbalance problem, since every pixel is considered independently. This makes predictions a bit fuzzy.
  • Soft dice (and Jaccard) are computed over all pixels of the image, so predictions have better shapes and are not fuzzy. Their weakness is that they are too confident: the probability is close to 0 or 1 even for wrong pixels.
  • Combining the two overcomes both weaknesses. The simplest combination is loss = binary_crossentropy + (1 - dice); the terms can also be weighted to suit your data: loss = w1 * binary_crossentropy + w2 * (1 - dice). A sketch of this loss follows the list.
  • very deep encoder-decoder architectures that also achieve state-of-the-art results in other binary segmentation problems (SpaceNet, Inria and others)
  • Post-processing tricks: watershed, morphological features, and a second-level model with Gradient Boosted Trees.
  • Task-driven data augmentation (CLAHE, sharpen, emboss, Gaussian noise, color-to-gray, blur, median blur, Gaussian blur, contrast and brightness adjustment, random scale, rotations, and flips). Two augmentations proved especially effective:
  • Channel shuffle - I guess this one was very important due to the nature of the data
  • Nucleus copying on images. This created a lot of overlapping nuclei and seemed to help the networks learn better borders for overlapping nuclei.
    Whenever we reduced augmentations we got better validation results but a worse score on the public leaderboard.
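
A minimal Keras sketch of this combined loss; the equal default weights and the smoothing constant are illustrative assumptions, not the winners' exact settings:

from keras import backend as K
from keras.losses import binary_crossentropy

def soft_dice(y_true, y_pred, smooth=1.0):
    # soft dice computed over all pixels; smooth avoids division by zero
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred, w1=1.0, w2=1.0):
    # loss = w1 * binary_crossentropy + w2 * (1 - dice)
    return w1 * K.mean(binary_crossentropy(y_true, y_pred)) + w2 * (1.0 - soft_dice(y_true, y_pred))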

U-Net or Mask R-CNN? We had also competed in earlier segmentation competitions, so the choice was clear: U-Net!

Experiment progression

  • We first tried the simplest approach and added a watershed line to the binary masks (i.e., modified the GT masks to always have gaps between nuclei). This scored 0.5 on the public LB.

We did not use plain U-Nets but very deep pretrained encoders (ResNet-50 was not enough). That makes a huge difference when you don't have enough data, which was clearly the case here.

Using pretrained models (for initialization only, followed by end-to-end training) is very important; training a plain U-Net from scratch gives much worse results.

Encoders were initialized with pretrained weights from ImageNet, and the models were then trained end to end. From my experience, with a frozen encoder it is usually not possible to achieve good segmentation results, even on datasets that are more or less similar to ImageNet.

Watershed post-processing: take two thresholds, a high one for seeds and a low one for masks (something like 0.6 and 0.3 for a binary mask). A sketch of this scheme follows.
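
A minimal sketch of the two-threshold watershed, assuming scipy/skimage and a per-pixel probability map; the 0.6/0.3 values come from the description above:

from scipy import ndimage
from skimage.segmentation import watershed  # skimage.morphology.watershed in older releases

def two_threshold_watershed(prob, seed_thr=0.6, mask_thr=0.3):
    seeds, _ = ndimage.label(prob > seed_thr)      # high threshold -> confident seed regions
    mask = prob > mask_thr                         # low threshold -> full nucleus extent
    height = ndimage.distance_transform_edt(mask)  # landscape for flooding
    return watershed(-height, seeds, mask=mask)    # labeled instances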

  • We then added a second channel with contours, with contour width depending on nucleus size. These masks plus simple watershed post-processing scored 0.525. Not a big breakthrough, but a hint in the right direction.

In my initial experiments I actually tried U-Net, but I gave it up because I had problems post-processing the U-Net outputs. Watershed is very sensitive to the input markers and the energy function (distance transform). That is why I switched to Mask R-CNN.
@topcoders' solution uses U-Net to learn the "border of nuclei", which is made from the watershed line of the ground truth. In this way they can control the accuracy of the watershed post-processing.

Create ground-truth contours from the individual binary masks, then combine them into a single image channel. You can then use these as ground truth, along with the binary masks, to teach U-Net. To output a second channel, just increase the number of kernels in the final U-Net layer from 1 to 2, and make sure your inputs are consistent with this format.

Are the contours the outlines of all nuclei? How does their width relate to nucleus size, and how exactly are they produced? A better approach is to use labels, dilation, and watershed with watershed_line=True, etc. The watershed line will be a border between the nuclei. A hedged sketch of that recipe follows.
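
A sketch of deriving the border channel via watershed_line=True, assuming an integer-labeled ground-truth image and scipy/skimage:

from scipy import ndimage
from skimage.segmentation import watershed

def make_border_channel(instances):
    # instances: integer-labeled GT image, 0 = background
    binary = instances > 0
    height = ndimage.distance_transform_edt(binary)
    # Re-flood the GT labels over the distance landscape; with watershed_line=True
    # the separating line between touching nuclei stays at label 0.
    ws = watershed(-height, markers=instances, mask=binary, watershed_line=True)
    return (binary & (ws == 0)).astype('uint8')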

  • Inspecting the results, we saw that the network easily predicts contours in non-ambiguous places, so we decided to predict only the borders between touching cells. That easily got us to 0.55 with a single network plus watershed post-processing.
  • If we have a full mask in one channel and a border in another, the seeds for watershed are sometimes not good enough. A better way is to change the nuclei masks and make the pixels on the borders empty. This also allows using softmax as the target activation instead of sigmoid. It separates nuclei well, but actually decreases mAP because of the high IoU thresholds. To solve this, we trained additional networks on full masks and combined the results in the post-processing step.
    Final approach (a sketch of assembling these targets follows the list):
  • 2-channel full masks, i.e. (mask, border)
  • 2-channel masks for networks with sigmoid activation, i.e. (mask - border, border), or 3-channel masks for networks with softmax activation, i.e. (mask - border, border, 1 - mask - border)
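
A short sketch of the softmax variant of these targets; mask and border are binary arrays (the border e.g. from a watershed line as above), and the clipping is an assumption to keep every channel in [0, 1]:

import numpy as np

def softmax_targets(mask, border):
    mask = mask.astype(np.float32)
    border = border.astype(np.float32)
    nucleus = np.clip(mask - border, 0, 1)         # (mask - border)
    background = np.clip(1 - mask - border, 0, 1)  # (1 - mask - border)
    return np.stack([nucleus, border, background], axis=-1)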

Network architecture

We used U-Net-style encoder-decoder architectures with encoders pretrained on ImageNet. Surprisingly, simple encoders like VGG16 did not work on this dataset, which made us decide to go deeper. As a result, the top-performing encoders in this competition were DPN-92, ResNet-152, InceptionResNetV2, and ResNet-101.

2nd-Level Model / Postprocessing

Train a LightGBM model on the predicted nucleus candidates. What does that mean exactly? LightGBM is just trained as a regressor to predict IoU: given a nucleus candidate from the neural networks' prediction, we extract some features from it and predict its IoU with the ground truth.
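
A hedged sketch of such a second-level model; the regionprops features are illustrative, not the winners' actual feature set:

import numpy as np
import lightgbm as lgb
from skimage.measure import regionprops

def candidate_features(labels, prob):
    # one feature row per nucleus candidate in the labeled prediction
    rows = []
    for r in regionprops(labels, intensity_image=prob):
        rows.append([r.area, r.solidity, r.eccentricity,
                     r.mean_intensity, r.max_intensity])
    return np.array(rows)

# X: candidate features from the training set; y: each candidate's IoU with its GT instance
# model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
# model.fit(X, y)
# At inference, reject candidates whose predicted IoU falls below a threshold (FP rejection).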

On the public LB, this post-processing with adaptive thresholds and FP rejection gave an improvement of ~15.

Ensembling

Average all predicted masks before post-processing.

Training parameters

  • RGB channels
  • Random Crops: 256x256
  • Random scale: [0.55, 1.45]
  • Batch Size: 16
  • optimizer: Adam
  • learning rate: initial 1e-4 with decay (we had different LR policies, but mostly small LR no more than 1e-4)
  • preprocessing: same as on ImageNet depending on network

Test Time Augmentations (TTA)

Standard flips/rotations (0, 90, 180, 270).
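
A minimal TTA sketch; the exact combination used is not specified, so this averages the eight symmetries of the square, assuming a fully convolutional Keras-style model with a single-channel sigmoid output:

import numpy as np

def predict_tta(model, img):
    preds = []
    for k in range(4):                               # rotations 0/90/180/270
        rot = np.rot90(img, k)
        for flip in (False, True):                   # plus horizontal flips
            aug = np.fliplr(rot) if flip else rot
            p = model.predict(aug[None])[0, ..., 0]  # (H, W) probability map
            p = np.fliplr(p) if flip else p          # undo the flip
            preds.append(np.rot90(p, -k))            # undo the rotation
    return np.mean(preds, axis=0)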

Discussion highlights

  • Should dropout be used?

Plain dropout is usually harmful for convolutional layers, but empirically SpatialDropout2D can be used in segmentation tasks and gives somewhat better results. I used SpatialDropout2D just before the classification layer.
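
A small Keras sketch of that placement; x is assumed to be the decoder's final feature map, and the 0.25 rate is illustrative:

from keras.layers import Conv2D, SpatialDropout2D

def classification_head(x, rate=0.25):
    # SpatialDropout2D drops whole feature maps rather than single units
    x = SpatialDropout2D(rate)(x)
    return Conv2D(1, (1, 1), activation='sigmoid')(x)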

  • How to match dimensions between the encoder and decoder

I simply changed the encoders and replaced valid padding with same padding. Two decoder designs are common: 1) U-Net-like, with the standard upsampling-conv approach; 2) a custom FPN-like decoder. Their performance is comparable.

i. How to choose the seed points

ii. How to decide the watershed boundaries

iii. How to decide the terrain height

The Deep Watershed Transform (DWT) approach helped solve some of the problems above. The main idea is to make a CNN learn two things: unit vectors pointing to (or against) the boundary, and energy (mountain height) levels. In practice, plain watershed tends to oversegment, because noise creates many local minima in the constructed landscape; the idea of DWT is therefore to let the CNN learn the mountain landscape itself. The original authors used two VGG-like CNNs to learn:

  • The watershed energy itself (energy, elevation, whatever you may call it);
  • The unit vectors pointing to (or away from) the borders, to help the CNN learn the boundaries.

In practice you do not have to use a separate CNN; just use a U-Net and feed several masks to it (a sketch of the eroded-mask targets appears after this list):

  • The merged nuclei masks (gt_mask);
  • Several thinned / eroded merged nuclei masks (3 levels of erosion: 1 pixel, 3 pixels, 5 or 7 pixels);
  • Centers of the nuclei (did not help in my case);
  • Unit vectors (helped a little bit, locally);
  • Borders (helped a bit, locally).
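
A sketch of building those eroded-mask channels with scipy; the 1/3/5-pixel levels come from the list above, the rest is assumption:

import numpy as np
from scipy import ndimage

def eroded_targets(gt_mask, levels=(1, 3, 5)):
    # channel 0: merged GT mask; channels 1..n: progressively eroded copies
    channels = [gt_mask.astype(np.float32)]
    for it in levels:
        eroded = ndimage.binary_erosion(gt_mask, iterations=it)
        channels.append(eroded.astype(np.float32))
    return np.stack(channels, axis=-1)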

For me, the best post-processing method (the energy_baseline function) was very simple (a sketch follows the list):

  • Sum the predicted mask and the 3 levels of eroded masks;
  • Apply a threshold of 0.4 to produce watershed seeds;
  • Flood the original thresholded masks from these markers;
  • Use the distance transform as the "landscape height" measure.
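
A hedged sketch of this energy_baseline recipe, assuming pred stacks the full-mask channel first, followed by the eroded-mask channels; averaging instead of a raw sum is an assumption so that the 0.4 threshold stays in [0, 1]:

from scipy import ndimage
from skimage.segmentation import watershed

def energy_baseline(pred, seed_thr=0.4, mask_thr=0.5):
    energy = pred.mean(axis=-1)                    # mask + eroded masks, combined
    seeds, _ = ndimage.label(energy > seed_thr)    # watershed seeds
    mask = pred[..., 0] > mask_thr                 # original thresholded mask
    height = ndimage.distance_transform_edt(mask)  # "landscape height"
    return watershed(-height, seeds, mask=mask)    # flood from the markers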

In my view, approaches to this nucleus segmentation problem fall into four classes:
i. U-Net-like architectures + watershed-transform-based post-processing.
ii. Recurrent architectures. I found only this more or less relevant paper (even followed by a code release);
iii. Proposal-based models, such as Mask R-CNN.

2. Third-Place Solution (Mask R-CNN, 0.614)

3. Fourth-Place Solution (U-Net, 0.610)

Our team evaluated both U-Net and Mask R-CNN, and U-Net performed significantly better than Mask R-CNN.
Inspired by the Deep Watershed Transform paper, for each pixel we predicted the x, y components of the vector pointing from the instance border, together with the mask; the watershed levels and nuclei centers were predicted with a second, connected U-Net.

For the U-Net encoders we used Conv2d - BN - Relu - Conv2d - Relu; for the decoders: Upsample/concatenate - Conv2d - Relu.

4. Fifth-Place Solution (Mask R-CNN, 0.609)

II. Notable Tutorials

  1. Keras U-Net starter tutorial by Kjetil Amdal-Saevik with 739 upvotes

Get the data

First, load all images and their corresponding masks, and downsample both the training and test images to keep things light and manageable. We should, however, record the original sizes of the test images, so that the predicted masks can be upsampled to the original size and a correct run-length encoding can be created (a reference sketch follows).
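
A common implementation sketch of the competition's RLE format (pixels are numbered top to bottom, then left to right, 1-indexed); this is the widely shared kernel snippet, not something specific to this tutorial:

import numpy as np

def rle_encode(mask):
    pixels = mask.flatten(order='F')             # column-major, as the rules require
    pixels = np.concatenate([[0], pixels, [0]])  # pad so runs touching the edges are caught
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]                      # convert (start, end) pairs to (start, length)
    return ' '.join(str(x) for x in runs)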

There are definitely better ways to handle this, but it works fine for now! What would those better ways be?

Build and train the neural network

Build the U-Net; when experimenting on small data, checkpointing and early stopping are recommended.

Make predictions

Predict on the training set, the validation set (as a sanity check), and the test set. Note: if you used early stopping and checkpointing, remember to load the saved best-performing model.

Encode the results and submit

The final result is 0.277 on the LB.

  2. Teaching notebook for total imaging newbies tutorial by Stephen Bailey with 354 upvotes
    A beginner's tutorial on how to analyze the images; it is not aimed at a high score.

A. Handling color

Images in the dataset may be RGB, RGBA, or grayscale. To simplify processing, convert them all to grayscale with rgb2gray.

B. Removing the background

The simplest assumption is that an image contains only two classes, foreground and background, whose gray levels follow a bimodal distribution. If we can find the best separating threshold, we can mask out the background and work only on the remaining foreground objects. Otsu's method is worth trying, since it models the image as a bimodal distribution and finds the optimal separating value.

import numpy as np
from skimage.filters import threshold_otsu

thresh_val = threshold_otsu(im_gray)
mask = np.where(im_gray > thresh_val, 1, 0)

C. Deriving individual masks for each object

We need a separate mask for each nucleus: iterate over all objects in the mask and assign each one a label with ndimage.label.

from scipy import ndimage

labels, nlabels = ndimage.label(mask)
label_arrays = []
for label_num in range(1, nlabels + 1):
    label_mask = np.where(labels == label_num, 1, 0)
    label_arrays.append(label_mask)

print('{} separate components detected in total.'.format(nlabels))

This simple processing has two immediate problems:

  • some pixels are isolated (top right);
  • cells overlap (middle right).
    [figure missing: labeled mask illustrating these two problems]

Using ndimage.find_objects, we can iterate over the mask, zooming in on each individual nucleus and applying extra post-processing.

for label_ind, label_coords in enumerate(ndimage.find_objects(labels)):
    cell = im_gray[label_coords]

    # Check if the label size is too small
    if np.prod(cell.shape) < 10:
        print('Label {} is too small! Setting to 0.'.format(label_ind))
        mask = np.where(labels == label_ind + 1, 0, mask)

For adjacent cells (as opposed to overlapping ones), one thing we can do is see whether we can shrink the masks to "open up" the gaps between the cells. This is called mask erosion. We can then re-dilate them to recover the original proportions.

# Get the object indices, and perform a binary opening procedure
two_cell_indices = ndimage.find_objects(labels)[1]
cell_mask = mask[two_cell_indices]
cell_mask_opened = ndimage.binary_opening(cell_mask, iterations=8)

The goal of this competition is nucleus detection, so results must be submitted for each individual nucleus.

  3. Nuclei overview to submission tutorial by Kevin Mader with 240 upvotes

A. The preprocessing steps to load the data

B. A quick visualization of the color space

C. Training a simple CNN

Build a simple CNN that uses dilated convolutions:

from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, UpSampling2D, Lambda

IMG_CHANNELS = 3  # assumed number of input channels (RGB)
simple_cnn = Sequential()
simple_cnn.add(BatchNormalization(input_shape = (None, None, IMG_CHANNELS), 
                                  name = 'NormalizeInput'))
simple_cnn.add(Conv2D(8, kernel_size = (3,3), padding = 'same'))
simple_cnn.add(Conv2D(8, kernel_size = (3,3), padding = 'same'))
# use dilations to get a slightly larger field of view
simple_cnn.add(Conv2D(16, kernel_size = (3,3), dilation_rate = 2, padding = 'same'))
simple_cnn.add(Conv2D(16, kernel_size = (3,3), dilation_rate = 2, padding = 'same'))
simple_cnn.add(Conv2D(32, kernel_size = (3,3), dilation_rate = 3, padding = 'same'))

# the final processing
simple_cnn.add(Conv2D(16, kernel_size = (1,1), padding = 'same'))
simple_cnn.add(Conv2D(1, kernel_size = (1,1), padding = 'same', activation = 'sigmoid'))
simple_cnn.summary()

Loss function: Dice

from keras import backend as K
smooth = 1.
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)
simple_cnn.compile(optimizer = 'adam', 
                   loss = dice_coef_loss, 
                   metrics = [dice_coef, 'acc', 'mse'])

Train the network

def simple_gen():
    while True:
        for _, c_row in train_img_df.iterrows():
            yield np.expand_dims(c_row['images'],0), np.expand_dims(np.expand_dims(c_row['masks'],-1),0)

simple_cnn.fit_generator(simple_gen(),
                         steps_per_epoch=train_img_df.shape[0],
                         epochs=3)

D. Applying the model to the test data


III. Selected Discussions

  1. U-Net vs Mask R-CNN: which is better?
    U-Net!

  2. Mask R-CNN in PyTorch from scratch

  3. Is padding needed?

  • The original U-Net authors used VALID padding, not SAME.
  • Competition winner Selim Seferbekov: VALID is much weaker than SAME padding. In fact, U-Net's contribution is not the no-padding-and-cropping scheme but the skip connections, so I believe VALID was simply the authors' preference, not the result of a thorough study. Zero padding has the following advantages: it preserves spatial resolution, it makes encoder-decoder architectures easy to assemble, it can be used in very deep networks, and it improves segmentation near the borders. Karpathy's cs231n lecture notes likewise point out that if every convolution uses VALID padding, the volume shrinks with each layer and the information at the borders is "washed away" too quickly.
  • Karpathy also remarked on Twitter: "Zero padding in ConvNets is highly suspicious/wrong. Input distribution stats are off on each border differently yet params are all shared". Reflection, periodic, or symmetric padding has the benefit of not introducing new frequencies into the image signal (the U-Net authors used reflection padding in preprocessing). A small shape illustration follows.
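
A quick Keras illustration of the shape trade-off discussed above (assumes a TensorFlow backend where symbolic tensors expose .shape): VALID shrinks the feature map at every convolution, while SAME preserves it.

from keras.layers import Conv2D, Input

inp = Input((256, 256, 3))
print(Conv2D(8, (3, 3), padding='valid')(inp).shape)  # (None, 254, 254, 8): border rows lost
print(Conv2D(8, (3, 3), padding='same')(inp).shape)   # (None, 256, 256, 8): resolution kept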

Miscellaneous

The original U-Net authors used VALID padding, not SAME, together with a weighted cross-entropy that puts extra weight on the small gaps between cells. We eroded the cells during preprocessing to make sure every cell is separated, then dilated the final results. Scale augmentation lets the network predict cells of various sizes. Without any ensembling, the final score was 0.504.
kaggle discussion



Author: EdwardMa
Link: https://www.jianshu.com/p/4b6ccb52af2e
Source: Jianshu
Copyright of the Jianshu article belongs to the author; for reproduction in any form, please contact the author for authorization and cite the source.