
Deep-Learning-Based Object Detection (DET)

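SSD (Single Shot MultiBox Detector) is an end-to-end object detection and recognition model. A bit of gossip first: it comes from the Google camp, and its author list overlaps with that of GoogLeNet. The model aims for fast, high-accuracy detection. It needs no separate bounding-box proposal step yet reaches comparable accuracy, and it is much faster: the reported figures are 58 FPS and 72.1% mAP.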

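Let's look at the model as a whole first. Its lowest layers are a classic VGG16 network (which can be swapped for ResNet). The convolutional layer conv4_3, the fully connected layer fc7, and the three convolutional layers above them (conv6, conv7, conv8) each branch into three kinds of nodes: mbox_conf, mbox_loc, and priorbox (call them X nodes). The X nodes from the different layers are then merged by the corresponding concat layers, and the concatenated results are passed on together for the final classification decision. In the log below, the branch points appear as conv4_3_norm, fc7, conv6_2, conv7_2, and conv8_2, plus a pool6 branch.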


For more detail, see the log output of one forward pass below.

[INFO 2016-08-30 21:58:39.619143 21429 net.cpp:540] Forwarding data
[INFO 2016-08-30 21:58:39.622481 21429 net.cpp:540] Forwarding data_data_0_split
[INFO 2016-08-30 21:58:39.622514 21429 net.cpp:540] Forwarding conv1_1
[INFO 2016-08-30 21:58:39.627096 21429 net.cpp:540] Forwarding relu1_1
[INFO 2016-08-30 21:58:39.627473 21429 net.cpp:540] Forwarding conv1_2
[INFO 2016-08-30 21:58:39.631721 21429 net.cpp:540] Forwarding relu1_2
[INFO 2016-08-30 21:58:39.631757 21429 net.cpp:540] Forwarding pool1
[INFO 2016-08-30 21:58:39.632096 21429 net.cpp:540] Forwarding conv2_1
[INFO 2016-08-30 21:58:39.634774 21429 net.cpp:540] Forwarding relu2_1
[INFO 2016-08-30 21:58:39.634809 21429 net.cpp:540] Forwarding conv2_2
[INFO 2016-08-30 21:58:39.639045 21429 net.cpp:540] Forwarding relu2_2
[INFO 2016-08-30 21:58:39.639080 21429 net.cpp:540] Forwarding pool2
[INFO 2016-08-30 21:58:39.639394 21429 net.cpp:540] Forwarding conv3_1
[INFO 2016-08-30 21:58:39.642501 21429 net.cpp:540] Forwarding relu3_1
[INFO 2016-08-30 21:58:39.642535 21429 net.cpp:540] Forwarding conv3_2
[INFO 2016-08-30 21:58:39.647202 21429 net.cpp:540] Forwarding relu3_2
[INFO 2016-08-30 21:58:39.647235 21429 net.cpp:540] Forwarding conv3_3
[INFO 2016-08-30 21:58:39.650738 21429 net.cpp:540] Forwarding relu3_3
[INFO 2016-08-30 21:58:39.650770 21429 net.cpp:540] Forwarding pool3
[INFO 2016-08-30 21:58:39.651074 21429 net.cpp:540] Forwarding conv4_1
[INFO 2016-08-30 21:58:39.655285 21429 net.cpp:540] Forwarding relu4_1
[INFO 2016-08-30 21:58:39.655323 21429 net.cpp:540] Forwarding conv4_2
[INFO 2016-08-30 21:58:39.660395 21429 net.cpp:540] Forwarding relu4_2
[INFO 2016-08-30 21:58:39.660429 21429 net.cpp:540] Forwarding conv4_3
[INFO 2016-08-30 21:58:39.665523 21429 net.cpp:540] Forwarding relu4_3
[INFO 2016-08-30 21:58:39.665555 21429 net.cpp:540] Forwarding conv4_3_relu4_3_0_split
[INFO 2016-08-30 21:58:39.665570 21429 net.cpp:540] Forwarding pool4
[INFO 2016-08-30 21:58:39.665881 21429 net.cpp:540] Forwarding conv5_1
[INFO 2016-08-30 21:58:39.668714 21429 net.cpp:540] Forwarding relu5_1
[INFO 2016-08-30 21:58:39.668748 21429 net.cpp:540] Forwarding conv5_2
[INFO 2016-08-30 21:58:39.671761 21429 net.cpp:540] Forwarding relu5_2
[INFO 2016-08-30 21:58:39.671807 21429 net.cpp:540] Forwarding conv5_3
[INFO 2016-08-30 21:58:39.675269 21429 net.cpp:540] Forwarding relu5_3
[INFO 2016-08-30 21:58:39.675302 21429 net.cpp:540] Forwarding pool5
[INFO 2016-08-30 21:58:39.675624 21429 net.cpp:540] Forwarding fc6
[INFO 2016-08-30 21:58:39.685935 21429 net.cpp:540] Forwarding relu6
[INFO 2016-08-30 21:58:39.685971 21429 net.cpp:540] Forwarding fc7
[INFO 2016-08-30 21:58:39.688531 21429 net.cpp:540] Forwarding relu7
[INFO 2016-08-30 21:58:39.688565 21429 net.cpp:540] Forwarding fc7_relu7_0_split
[INFO 2016-08-30 21:58:39.688580 21429 net.cpp:540] Forwarding conv6_1
[INFO 2016-08-30 21:58:39.691439 21429 net.cpp:540] Forwarding conv6_1_relu
[INFO 2016-08-30 21:58:39.691473 21429 net.cpp:540] Forwarding conv6_2
[INFO 2016-08-30 21:58:39.695135 21429 net.cpp:540] Forwarding conv6_2_relu
[INFO 2016-08-30 21:58:39.695169 21429 net.cpp:540] Forwarding conv6_2_conv6_2_relu_0_split
[INFO 2016-08-30 21:58:39.695183 21429 net.cpp:540] Forwarding conv7_1
[INFO 2016-08-30 21:58:39.698765 21429 net.cpp:540] Forwarding conv7_1_relu
[INFO 2016-08-30 21:58:39.698796 21429 net.cpp:540] Forwarding conv7_2
[INFO 2016-08-30 21:58:39.701938 21429 net.cpp:540] Forwarding conv7_2_relu
[INFO 2016-08-30 21:58:39.702193 21429 net.cpp:540] Forwarding conv7_2_conv7_2_relu_0_split
[INFO 2016-08-30 21:58:39.702220 21429 net.cpp:540] Forwarding conv8_1
[INFO 2016-08-30 21:58:39.704677 21429 net.cpp:540] Forwarding conv8_1_relu
[INFO 2016-08-30 21:58:39.704716 21429 net.cpp:540] Forwarding conv8_2
[INFO 2016-08-30 21:58:39.707798 21429 net.cpp:540] Forwarding conv8_2_relu
[INFO 2016-08-30 21:58:39.707839 21429 net.cpp:540] Forwarding conv8_2_conv8_2_relu_0_split
[INFO 2016-08-30 21:58:39.707859 21429 net.cpp:540] Forwarding pool6
[INFO 2016-08-30 21:58:39.707926 21429 net.cpp:540] Forwarding pool6_pool6_0_split
[INFO 2016-08-30 21:58:39.707947 21429 net.cpp:540] Forwarding conv4_3_norm
[INFO 2016-08-30 21:58:39.711788 21429 net.cpp:540] Forwarding conv4_3_norm_conv4_3_norm_0_split
[INFO 2016-08-30 21:58:39.711818 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc
[INFO 2016-08-30 21:58:39.714972 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc_perm
[INFO 2016-08-30 21:58:39.717313 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc_flat
[INFO 2016-08-30 21:58:39.717339 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf
[INFO 2016-08-30 21:58:39.724395 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf_perm
[INFO 2016-08-30 21:58:39.731096 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf_flat
[INFO 2016-08-30 21:58:39.731127 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_priorbox
[INFO 2016-08-30 21:58:39.731290 21429 net.cpp:540] Forwarding fc7_mbox_loc
[INFO 2016-08-30 21:58:39.733963 21429 net.cpp:540] Forwarding fc7_mbox_loc_perm
[INFO 2016-08-30 21:58:39.737503 21429 net.cpp:540] Forwarding fc7_mbox_loc_flat
[INFO 2016-08-30 21:58:39.737527 21429 net.cpp:540] Forwarding fc7_mbox_conf
[INFO 2016-08-30 21:58:39.746902 21429 net.cpp:540] Forwarding fc7_mbox_conf_perm
[INFO 2016-08-30 21:58:39.750918 21429 net.cpp:540] Forwarding fc7_mbox_conf_flat
[INFO 2016-08-30 21:58:39.750946 21429 net.cpp:540] Forwarding fc7_mbox_priorbox
[INFO 2016-08-30 21:58:39.751056 21429 net.cpp:540] Forwarding conv6_2_mbox_loc
[INFO 2016-08-30 21:58:39.753976 21429 net.cpp:540] Forwarding conv6_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.756206 21429 net.cpp:540] Forwarding conv6_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.756239 21429 net.cpp:540] Forwarding conv6_2_mbox_conf
[INFO 2016-08-30 21:58:39.763130 21429 net.cpp:540] Forwarding conv6_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.764664 21429 net.cpp:540] Forwarding conv6_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.764689 21429 net.cpp:540] Forwarding conv6_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.764760 21429 net.cpp:540] Forwarding conv7_2_mbox_loc
[INFO 2016-08-30 21:58:39.768630 21429 net.cpp:540] Forwarding conv7_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.772903 21429 net.cpp:540] Forwarding conv7_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.772927 21429 net.cpp:540] Forwarding conv7_2_mbox_conf
[INFO 2016-08-30 21:58:39.777669 21429 net.cpp:540] Forwarding conv7_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.781180 21429 net.cpp:540] Forwarding conv7_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.781205 21429 net.cpp:540] Forwarding conv7_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.781263 21429 net.cpp:540] Forwarding conv8_2_mbox_loc
[INFO 2016-08-30 21:58:39.783634 21429 net.cpp:540] Forwarding conv8_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.788920 21429 net.cpp:540] Forwarding conv8_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.788944 21429 net.cpp:540] Forwarding conv8_2_mbox_conf
[INFO 2016-08-30 21:58:39.793294 21429 net.cpp:540] Forwarding conv8_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.797371 21429 net.cpp:540] Forwarding conv8_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.797397 21429 net.cpp:540] Forwarding conv8_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.797449 21429 net.cpp:540] Forwarding pool6_mbox_loc
[INFO 2016-08-30 21:58:39.800542 21429 net.cpp:540] Forwarding pool6_mbox_loc_perm
[INFO 2016-08-30 21:58:39.804468 21429 net.cpp:540] Forwarding pool6_mbox_loc_flat
[INFO 2016-08-30 21:58:39.804493 21429 net.cpp:540] Forwarding pool6_mbox_conf
[INFO 2016-08-30 21:58:39.808717 21429 net.cpp:540] Forwarding pool6_mbox_conf_perm
[INFO 2016-08-30 21:58:39.812292 21429 net.cpp:540] Forwarding pool6_mbox_conf_flat
[INFO 2016-08-30 21:58:39.812317 21429 net.cpp:540] Forwarding pool6_mbox_priorbox
[INFO 2016-08-30 21:58:39.812382 21429 net.cpp:540] Forwarding mbox_loc
[INFO 2016-08-30 21:58:39.812604 21429 net.cpp:540] Forwarding mbox_conf
[INFO 2016-08-30 21:58:39.812834 21429 net.cpp:540] Forwarding mbox_priorbox
[INFO 2016-08-30 21:58:39.819844 21429 net.cpp:540] Forwarding mbox_conf_reshape
[INFO 2016-08-30 21:58:39.819871 21429 net.cpp:540] Forwarding mbox_conf_softmax
[INFO 2016-08-30 21:58:39.820596 21429 net.cpp:540] Forwarding mbox_conf_flatten
[INFO 2016-08-30 21:58:39.820647 21429 net.cpp:540] Forwarding detection_out
[INFO 2016-08-30 21:58:39.832866 21429 net.cpp:540] Forwarding detection_eval
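
The layer names in the log follow a regular pattern, `<source layer>_mbox_<loc|conf|priorbox>`, so the prediction heads attached to each source layer can be recovered from the log itself. The snippet below is a minimal sketch of that idea. The function name `group_heads` and the regular expression are illustrative assumptions, and only a few lines of the log are embedded as sample input.

```python
import re
from collections import defaultdict

# A few lines copied from the log above; a real run would read the whole file.
LOG = """\
[INFO 2016-08-30 21:58:39.711818 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc
[INFO 2016-08-30 21:58:39.717339 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf
[INFO 2016-08-30 21:58:39.731127 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_priorbox
[INFO 2016-08-30 21:58:39.731290 21429 net.cpp:540] Forwarding fc7_mbox_loc
[INFO 2016-08-30 21:58:39.737527 21429 net.cpp:540] Forwarding fc7_mbox_conf
[INFO 2016-08-30 21:58:39.750946 21429 net.cpp:540] Forwarding fc7_mbox_priorbox
[INFO 2016-08-30 21:58:39.812382 21429 net.cpp:540] Forwarding mbox_loc
"""

# Matches only the per-layer heads, not the *_perm / *_flat helpers
# or the final concat layers (mbox_loc, mbox_conf, mbox_priorbox).
HEAD = re.compile(r"Forwarding (\w+)_mbox_(loc|conf|priorbox)$")


def group_heads(log_text):
    """Map each source layer to the set of mbox heads attached to it."""
    heads = defaultdict(set)
    for line in log_text.splitlines():
        match = HEAD.search(line)
        if match:
            heads[match.group(1)].add(match.group(2))
    return dict(heads)


heads = group_heads(LOG)
for layer, kinds in heads.items():
    print(layer, sorted(kinds))
```

Run over the full log, the same grouping should list one loc, one conf, and one priorbox head for every branch point.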

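The SSD network uses many small convolution kernels (1x1, 3x3), not only for classification but also for regressing bounding-box positions. Separate filters handle objects of different aspect ratios, and these filters are applied to several later feature maps to detect at multiple scales.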

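SSD defines a set of bounding boxes containing four shapes: a tall one, a wide one, a large square, and a small square. They are placed at every location of feature maps of different sizes (4x4, 8x8); in other words, convolution covers all m*n locations of an m*n*p feature map. During training these boxes are matched against the ground-truth boxes: for each box the network predicts its offsets relative to the ground truth and its class probabilities, giving 4 offset values and c class probabilities, and the ground-truth class determines which boxes count as TP and which as FP. The overall model loss is a weighted sum of the localization loss and the classification confidence loss. At inference time, non-maximum suppression produces the final detections.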
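
To make the weighted loss concrete, here is a minimal sketch. It assumes smooth L1 for the localization term and softmax cross-entropy for the confidence term, as in the SSD paper; the post itself does not name them. The function names, the dictionary layout, the weight `alpha`, and the convention that label 0 is background are all illustrative assumptions.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty on a single regression residual."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


def detection_loss(boxes, alpha=1.0):
    """Weighted sum of confidence and localization loss.

    boxes: list of dicts with 'logits' and 'label'; positives (label > 0)
    also carry 'pred_offsets' and 'target_offsets' (4 values each).
    Label 0 is assumed to mean background.
    """
    positives = [b for b in boxes if b["label"] > 0]
    n = max(len(positives), 1)  # assumption: avoid dividing by zero
    conf = sum(softmax_cross_entropy(b["logits"], b["label"]) for b in boxes)
    loc = sum(
        smooth_l1(p - t)
        for b in positives
        for p, t in zip(b["pred_offsets"], b["target_offsets"])
    )
    return (conf + alpha * loc) / n


example_boxes = [
    {"logits": [0.0, 0.0], "label": 1,
     "pred_offsets": [0.5, 0.0, 0.0, 0.0],
     "target_offsets": [0.0, 0.0, 0.0, 0.0]},
    {"logits": [0.0, 0.0], "label": 0},
]
print(detection_loss(example_boxes))
```

Only positive boxes contribute to the localization term, while every box passed in contributes to the confidence term.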

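Boxes of different shapes, applied on feature maps of multiple resolutions, discretize the parameter space of boxes and so make computation more efficient. Ground-truth information, both class and position, has to be assigned explicitly to specific network outputs so that the loss function and backpropagation work end to end. During training each ground truth must therefore be paired with boxes: any box whose Jaccard overlap with a ground truth exceeds 0.5 is matched to that ground truth, and every ground truth must have at least one matching box. In addition, when there are many candidate boxes most of them are FP, so TP and FP are badly imbalanced. The candidates are therefore sorted by classification confidence and only the top ones are kept, so that the FP:TP ratio is about 3:1.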
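
The matching rule and the 3:1 selection can be sketched as follows. This is a simplified illustration, not the reference implementation: the function names are hypothetical, boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples, negatives are assumed to be ranked by their confidence loss, and the case where two ground truths claim the same box is not handled.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def match(default_boxes, gt_boxes, threshold=0.5):
    """Return {default_index: gt_index} for the matched (positive) boxes."""
    if not gt_boxes:
        return {}
    matches = {}
    # Step 1: every ground truth claims its best-overlapping box,
    # so each ground truth ends up with at least one match.
    for g, gt in enumerate(gt_boxes):
        best = max(range(len(default_boxes)),
                   key=lambda d: jaccard(default_boxes[d], gt))
        matches[best] = g
    # Step 2: any remaining box with overlap above the threshold also matches.
    for d, box in enumerate(default_boxes):
        if d in matches:
            continue
        overlaps = [jaccard(box, gt) for gt in gt_boxes]
        g = max(range(len(gt_boxes)), key=overlaps.__getitem__)
        if overlaps[g] > threshold:
            matches[d] = g
    return matches


def mine_hard_negatives(conf_losses, positives, ratio=3):
    """Keep the highest-loss negatives, at most `ratio` per positive."""
    negatives = [i for i in range(len(conf_losses)) if i not in positives]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[:ratio * len(positives)]


defaults = [(0, 0, 10, 10), (0, 0, 10, 6), (20, 20, 30, 30), (50, 50, 60, 60)]
truths = [(0, 0, 10, 10), (20, 20, 40, 40)]
print(match(defaults, truths))
```

In the example, the third box overlaps the second ground truth by only 0.25, yet it is still matched in step 1 because no other box fits that ground truth better.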

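On detecting objects at multiple scales. We know that low-level feature maps capture image detail well and so improve semantic segmentation quality, while high-level feature maps smooth the segmentation result. SSD therefore combines low-level and high-level feature maps for detection. Feature maps at different layers have different receptive field sizes, which is key; see Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. In: ICLR (2015). There is no need, however, to build boxes of several sizes on a single feature map. Each feature map learns to detect objects at one scale only, so each feature map has boxes of a single scale. For example, the boxes on the 8x8 feature map cannot detect the larger dog (figure omitted). From the lowest layer to the highest, the box scales are evenly spaced between 0.2 and 0.95. To handle aspect ratio as well, the boxes of each layer are expanded into six variants with aspect ratios {1, 1+, 2, 3, 1/2, 1/3}, where 1+ denotes an extra square box at a slightly larger scale.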
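
The scale and aspect-ratio rules can be worked through numerically. The sketch below assumes the linear spacing s_k = s_min + (s_max - s_min) * k / (m - 1) for k = 0..m-1, width s*sqrt(a) and height s/sqrt(a) for aspect ratio a, and a "1+" box of scale sqrt(s_k * s_{k+1}); these formulas follow the SSD paper rather than the post, and the function names are illustrative.

```python
import math


def layer_scales(num_layers, s_min=0.2, s_max=0.95):
    """Evenly spaced box scales, one per feature map (needs num_layers >= 2)."""
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


def box_shapes(scale, next_scale, ratios=(1, 2, 3, 1 / 2, 1 / 3)):
    """(width, height) pairs for one layer, relative to the image size."""
    shapes = [(scale * math.sqrt(r), scale / math.sqrt(r)) for r in ratios]
    # The "1+" box: a square whose scale lies between this layer and the next.
    extra = math.sqrt(scale * next_scale)
    shapes.insert(1, (extra, extra))
    return shapes


scales = layer_scales(6)  # six branch points, as in the log above
shapes = box_shapes(scales[0], scales[1])
for w, h in shapes:
    print(f"{w:.3f} x {h:.3f}")
```

Every non-square box keeps the same area as the square box of its layer; only the width-to-height ratio changes.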

  

In a sense, SSD combines the ideas of RPN and YOLO, namely:

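1) The anchor idea from RPN. Running 256 3x3 filters over the feature map effectively describes the 9 anchor boxes at every feature-map location with a 256-dimensional feature. The position of the sliding window gives coarse localization relative to the original image, and the regressed box gives finer localization relative to that window. RPN reduces computation by a factor of 256 (because it operates on the feature map instead of the original image).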

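2) The regression idea from YOLO: the object's position and class are regressed directly from the features, with no ROI pooling step for classification and feature extraction.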
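
The coarse-then-fine localization in item 1) can be illustrated with a short sketch. It assumes the usual Faster R-CNN setup: 3 sizes times 3 aspect ratios give the 9 anchors, and the regression outputs (tx, ty, tw, th) shift the anchor centre and rescale its width and height exponentially. The function names and the default sizes are illustrative assumptions.

```python
import math


def anchors_at(cx, cy, sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Nine (cx, cy, w, h) anchors centred on one sliding-window position."""
    boxes = []
    for size in sizes:
        for ratio in ratios:  # ratio = height / width; area stays size ** 2
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            boxes.append((cx, cy, w, h))
    return boxes


def decode(anchor, offsets):
    """Refine an anchor with regressed offsets (tx, ty, tw, th)."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = offsets
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchors = anchors_at(100, 100)
refined = decode((100, 100, 128, 128), (0.25, -0.5, 0.0, 0.0))
print(len(anchors), refined)
```

The sliding-window position fixes the anchor centre, and the offsets then move and resize the box relative to that anchor, which is the finer localization described above.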
