SSD (Single Shot MultiBox Detector): Key Source Code Walkthrough

阿新 • Published: 2019-01-08

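SSD (Single Shot MultiBox Detector) performs object detection and recognition with a single deep neural network. As Figure 0-1 shows, it combines the anchor boxes of Faster R-CNN with YOLO's single-network detection approach (YOLOv2 later adopted a similar idea; see "YOLO upgraded: YOLOv2 and YOLO9000 explained"), so it pairs the accuracy of Faster R-CNN with the speed of YOLO and makes accurate real-time detection possible. At 300×300 resolution, SSD reaches 74.3% mAP on VOC2007 at 59 FPS; at 512×512 resolution it surpasses Faster R-CNN, reaching 80% mAP at 19 FPS, as shown in Figure 0-2. The key points of SSD fall into two groups: model structure and training method. Model structure covers the multi-scale feature map detection network and anchor box generation; training method covers ground truth preprocessing and the loss function. This article walks through the TensorFlow implementation of SSD from balancap/SSD-Tensorflow. It is organized as follows: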

1. Multi-scale feature map detection network structure

2. Anchor box generation

3. Ground truth preprocessing

4. Objective function

5. Summary

Figure 0-1: How SSD relates to MultiBox, Faster R-CNN, and YOLO (from the authors' ECCV 2016 slides)

Figure 0-2: SSD detection speed and accuracy (from the authors' ECCV 2016 slides)

1 Multi-scale feature map detection network structure

The SSD network model is shown in Figure 1-1.<img src="https://pic1.zhimg.com/v2-7f7f3c99d20df97455e8bcfce7876d30_b.png" data-rawwidth="1152" data-rawheight="553" class="origin_image zh-lightbox-thumb" width="1152" data-original="https://pic1.zhimg.com/v2-7f7f3c99d20df97455e8bcfce7876d30_r.png">

Figure 1-1: SSD model structure (from the original paper)

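The model is built in ssd_vgg_300.py. Figure 1-2 illustrates detection over multi-scale feature maps. The feature maps used are 38×38 (block4), 19×19 (block7), 10×10 (block8), 5×5 (block9), 3×3 (block10), and 1×1 (block11). On each feature map, a 3×3 convolution predicts, for every default box, four location offsets and the confidences of the 21 classes. Take block7: it has 6 default boxes per location, and each default box carries 4 offsets plus 21 class confidences (4+21). The final output of block7 therefore holds (19*19)*6*(4+21) values.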
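The output-size arithmetic can be checked with a short sketch. It assumes that the number of default boxes per location equals len(sizes) + len(ratios) for each layer, using the default parameters listed in this section; `layer_outputs` is a hypothetical helper name, not a function from the repository.

```python
# Feature map shapes and anchor settings copied from the default parameters.
feat_layers = ['block4', 'block7', 'block8', 'block9', 'block10', 'block11']
feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
                (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios = [[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                 [2, .5, 3, 1. / 3], [2, .5], [2, .5]]
num_classes = 21


def layer_outputs(feat_shape, sizes, ratios, num_classes):
    """Return (boxes, values) for one feature map (hypothetical helper)."""
    boxes_per_cell = len(sizes) + len(ratios)
    boxes = feat_shape[0] * feat_shape[1] * boxes_per_cell
    # Each box has 4 location offsets plus one confidence per class.
    return boxes, boxes * (4 + num_classes)


total_boxes = 0
for name, shape, sizes, ratios in zip(feat_layers, feat_shapes,
                                      anchor_sizes, anchor_ratios):
    boxes, values = layer_outputs(shape, sizes, ratios, num_classes)
    total_boxes += boxes
    print(name, boxes, values)
print('total default boxes:', total_boxes)
```

For block7 this gives 19*19*6 = 2166 boxes and 2166*25 = 54150 values; summed over the six layers, the default configuration yields 8732 default boxes.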

Figure 1-2: Multi-scale feature sampling (source: a Zhihu column)

The initialization parameters are as follows:

    """
    Implementation of the SSD VGG-based 300 network.

    The default features layers with 300x300 image input are:
      conv4 ==> 38 x 38
      conv7 ==> 19 x 19
      conv8 ==> 10 x 10
      conv9 ==> 5 x 5
      conv10 ==> 3 x 3
      conv11 ==> 1 x 1
    The default image size used to train this network is 300x300.
    """
    default_params = SSDParams(
        img_shape=(300, 300),  # input size
        num_classes=21,  # number of predicted classes: 20 + 1 = 21 (20 classes plus background)
        # layers whose feature maps are used for detection
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],

        anchor_size_bounds=[0.15, 0.90],
        # anchor box sizes, in pixels of the 300x300 input
        anchor_sizes=[(21., 45.),
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        # anchor box aspect ratios
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        anchor_steps=[8, 16, 32, 64, 100, 300],  # stride of one feature map cell, in input pixels
        anchor_offset=0.5,  # offset of the anchor center within a cell (0.5 = cell center)
        normalizations=[20, -1, -1, -1, -1, -1],  # L2-normalize the feature layer if > 0; skip if < 0
        prior_scaling=[0.1, 0.1, 0.2, 0.2]
        )

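The code that builds the model is shown below. The author uses TensorFlow-Slim (a high-level library similar to Keras) to construct the network; see the TensorFlow-Slim page for details.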

# Function that builds the SSD network.
def ssd_net(inputs,
            num_classes=21,
            feat_layers=SSDNet.default_params.feat_layers,
            anchor_sizes=SSDNet.default_params.anchor_sizes,
            anchor_ratios=SSDNet.default_params.anchor_ratios,
            normalizations=SSDNet.default_params.normalizations,
            is_training=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
    """SSD net definition.
    """
    # End_points collect relevant activations for external use.
    # Collects the output of each layer.
    end_points = {}
    # Build the VGG network with slim; see the structure diagram in the article.
    with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
        # Original VGG-16 blocks.
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        # Block 2.
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        # Block 3.
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        # Block 4.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        # Block 5.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], 1, scope='pool5')  # 3x3 max pool, stride 1

        # Additional SSD blocks.
        # Block 6: dilated (atrous) 3x3 convolution, rate=6.
        # Output shape is 19x19x1024.
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
        end_points['block6'] = net
        # Block 7: 1x1 convolution.
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
        end_points['block7'] = net

        # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3')
        end_points[end_point] = net
        end_point = 'block9'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3')
        end_points[end_point] = net
        end_point = 'block10'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block11'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net

        # Prediction and localisations layers.
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                # Take the feature layer's output and produce class and location predictions.
                p, l = ssd_multibox_layer(end_points[layer],
                                          num_classes,
                                          anchor_sizes[i],
                                          anchor_ratios[i],
                                          normalizations[i])
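            # Collect the predictions of every layer.
            predictions.append(prediction_fn(p))  # prediction_fn is softmax: class probabilities
            logits.append(p)  # raw class scores before softmax
            localisations.append(l)  # predicted box offsets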

        return predictions, localisations, logits, end_points
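The feature map sizes listed in feat_shapes can be checked by tracing the spatial size through the layers. The sketch below is an assumption-laden illustration: it assumes 'SAME' padding for every layer except the two marked padding='VALID' (the excerpt does not show where the default padding is set, but this is the setting under which the sizes agree with feat_shapes), and `out_size` is a hypothetical helper.

```python
import math


def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer (hypothetical helper)."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    # 'VALID': only positions where the kernel fits entirely.
    return math.ceil((size - kernel + 1) / stride)


size = 300
sizes = {}
size = out_size(size, 2, 2, 'SAME')    # pool1
size = out_size(size, 2, 2, 'SAME')    # pool2
size = out_size(size, 2, 2, 'SAME')    # pool3
sizes['block4'] = size                 # conv4 keeps the size
size = out_size(size, 2, 2, 'SAME')    # pool4
size = out_size(size, 3, 1, 'SAME')    # pool5, stride 1
sizes['block7'] = size                 # conv6 and conv7 keep the size
size = out_size(size, 3, 2, 'SAME')    # block8 conv3x3, stride 2
sizes['block8'] = size
size = out_size(size, 3, 2, 'SAME')    # block9 conv3x3, stride 2
sizes['block9'] = size
size = out_size(size, 3, 1, 'VALID')   # block10 conv3x3, VALID
sizes['block10'] = size
size = out_size(size, 3, 1, 'VALID')   # block11 conv3x3, VALID
sizes['block11'] = size
print(sizes)
```

The trace goes 300, 150, 75, 38, 19, 10, 5, 3, 1, which matches the six shapes in feat_shapes.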

2 Anchor box generation

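For each feature map, k default boxes are generated at every location with different sizes (scales) and aspect ratios, as illustrated in Figure 2-1 (in that figure k = 6 and the 5×5 grid of red points represents the feature map, so there are 5*5*6 = 150 boxes).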

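The scale of the default boxes on each feature map is computed as $s_{k}=s_{min} +\frac{s_{max}-s_{min} }{m-1}(k-1),k\in [1,m]$, where m is the number of feature maps, $s_{min}$ is the default box scale on the lowest feature map (0.2 in the original paper, 0.15 in the code), and $s_{max}$ is the default box scale on the highest feature map (0.9 in both the paper and the code).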
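As a worked example of the scale formula, the sketch below evaluates it for m = 6 feature maps with the paper's bounds. `default_box_scales` is a hypothetical helper written for this illustration; it is not how the repository derives its anchor_sizes table.

```python
def default_box_scales(s_min, s_max, m):
    """s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), for k = 1..m."""
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1)
            for k in range(1, m + 1)]


paper_scales = default_box_scales(0.2, 0.9, 6)
print([round(s, 2) for s in paper_scales])
```

With s_min = 0.2 and s_max = 0.9 the scales are evenly spaced in steps of 0.14: 0.2, 0.34, 0.48, 0.62, 0.76, 0.9.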

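The shape of each default box follows from its aspect ratio. The paper uses the ratios $a_{r}\in\left\{ 1,2,3,1/2,1/3 \right\}$, so each default box has width $w_{k}^{a} =s_{k}\sqrt{a_{r} }$ and height $h_{k}^{a} =s_{k}/\sqrt{a_{r} }$. For the aspect ratio 1, one extra default box with scale $s_{k}^{'} =\sqrt{s_{k}s_{k+1}}$ is added. In total, every location of a feature map yields 6 default boxes (in the code, layers configured with only the ratios 2 and 1/2 yield 4). The center of each default box is set to $(\frac{i+0.5}{|f_{k} |},\frac{j+0.5}{|f_{k} |} )$, where $\left| f_{k} \right|$ is the size of the k-th feature map and $i,j\in [0,|f_{k}|)$.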
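The width, height, and center formulas can be sketched directly. The helper names `default_box_shapes` and `cell_center` are hypothetical, and the scales 0.2 and 0.34 are simply the first two values produced by the scale formula with the paper's bounds.

```python
import math


def default_box_shapes(s_k, s_k_next, ratios=(1, 2, 3, 1 / 2, 1 / 3)):
    """Return (w, h) pairs, relative to the image, for one feature map."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    # Extra box for aspect ratio 1 with scale sqrt(s_k * s_{k+1}).
    s_extra = math.sqrt(s_k * s_k_next)
    shapes.append((s_extra, s_extra))
    return shapes


def cell_center(i, j, f_k):
    """Center of the default boxes at cell (i, j) of an f_k x f_k map."""
    return ((i + 0.5) / f_k, (j + 0.5) / f_k)


shapes = default_box_shapes(0.2, 0.34)
print(len(shapes))
print(cell_center(0, 0, 5))
```

Every box built from an aspect ratio has the same area, s_k squared, because the square roots cancel in w*h; only the extra ratio-1 box is larger.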

Figure 2-1: Anchor box generation (source: a Zhihu column)

In the source code, default boxes are generated by ssd_anchor_one_layer(), shown below:

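import math

import numpy as np


# Generate the anchor boxes of one layer.
def ssd_anchor_one_layer(img_shape,  # shape of the original image
                         feat_shape,  # shape of the feature map
                         sizes,  # preset box sizes
                         ratios,  # aspect ratios
                         step,  # stride of one feature map cell, in input pixels
                         offset=0.5,
                         dtype=np.float32):
    """Compute SSD default anchor boxes for one feature layer.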

    Determine the relative position grid of the centers, and the relative
    width and height.

    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the
        former;
      offset: Grid offset.

    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    """
    # Compute the position grid: simple way.
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...

    # Parameters used in the test run traced in the comments below:
    #   feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
    #   anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
    #                   (153., 207.), (207., 261.), (261., 315.)]
    #   anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
    #                    [2, .5, 3, 1./3], [2, .5], [2, .5]]
    #   anchor_steps = [8, 16, 32, 64, 100, 300]
    #   offset = 0.5
    #   dtype = np.float32
    #   feat_shape = feat_shapes[0]
    #   step = anchor_steps[0]
    #
    # In the test run, y and x both have shape (38, 38). The value of y is:
    # array([[ 0,  0,  0, ...,  0,  0,  0],
    #        [ 1,  1,  1, ...,  1,  1,  1],
    #        [ 2,  2,  2, ...,  2,  2,  2],
    #        ...,
    #        [35, 35, 35, ..., 35, 35, 35],
    #        [36, 36, 36, ..., 36, 36, 36],
    #        [37, 37, 37, ..., 37, 37, 37]])
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # In the test run: y = (y + 0.5) * 8 / 300, x = (x + 0.5) * 8 / 300
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]

    # Expand dims to support easy broadcasting; the shape becomes (38, 38, 1).
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)

    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    # In the test run this is 2 + 2 = 4.
    num_anchors = len(sizes) + len(ratios)
    # h and w have shape (4,).
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1.
    # In the test run, h[0] = 21/300 and w[0] = 21/300.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        # In the test run, h[1] = sqrt(21 * 45) / 300.
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    # In the test run, y and x have shape (38, 38, 1)
    # and h and w have shape (4,).
    return y, x, h, w
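The centers and the sizes are returned as separate arrays so that broadcasting can combine them later. A minimal NumPy sketch of that step, using the first layer's values from the trace in the comments (sizes (21., 45.), ratios [2, .5], step 8); the variable names here are illustrative only:

```python
import numpy as np

img_shape, feat_shape, step, offset = (300, 300), (38, 38), 8, 0.5

# Center grid, computed the same way as in ssd_anchor_one_layer().
y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
y = np.expand_dims((y.astype(np.float32) + offset) * step / img_shape[0], -1)
x = np.expand_dims((x.astype(np.float32) + offset) * step / img_shape[1], -1)

# Relative heights and widths for sizes=(21., 45.) and ratios=[2, .5].
h = np.array([21 / 300, np.sqrt(21 * 45) / 300,
              21 / 300 / np.sqrt(2), 21 / 300 / np.sqrt(.5)],
             dtype=np.float32)
w = np.array([21 / 300, np.sqrt(21 * 45) / 300,
              21 / 300 * np.sqrt(2), 21 / 300 * np.sqrt(.5)],
             dtype=np.float32)

# (38, 38, 1) combined with (4,) broadcasts to (38, 38, 4).
ymin, ymax = y - h / 2., y + h / 2.
xmin, xmax = x - w / 2., x + w / 2.
print(y.shape, h.shape, ymin.shape)
print(float(y[0, 0, 0]))  # center of the first cell: 0.5 * 8 / 300
```

Storing one (38, 38, 1) grid and one (4,) vector instead of a full (38, 38, 4) array per coordinate keeps the anchor description compact; the full corner arrays only appear when they are needed for matching.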

3 Ground truth preprocessing

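During training, the label information (ground truth boxes and ground truth categories) must first be preprocessed so that it is mapped onto the corresponding default boxes. The matching default boxes are found from the jaccard overlap between each default box and the ground truth boxes. The paper treats default boxes whose jaccard overlap exceeds 0.5 as positive samples and all others as negative samples.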
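The jaccard overlap is the intersection area divided by the union area. The following NumPy sketch computes it for one ground truth box against a few anchors and applies the 0.5 threshold; the function name `jaccard` and the sample boxes are made up for illustration, with boxes given as (ymin, xmin, ymax, xmax) in relative coordinates.

```python
import numpy as np


def jaccard(box, anchors):
    """Jaccard overlap (IoU) of one box with N anchors, shape (N, 4)."""
    int_ymin = np.maximum(anchors[:, 0], box[0])
    int_xmin = np.maximum(anchors[:, 1], box[1])
    int_ymax = np.minimum(anchors[:, 2], box[2])
    int_xmax = np.minimum(anchors[:, 3], box[3])
    # Clamp at zero so that disjoint boxes get an empty intersection.
    h = np.maximum(int_ymax - int_ymin, 0.)
    w = np.maximum(int_xmax - int_xmin, 0.)
    inter = h * w
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = ((anchors[:, 2] - anchors[:, 0])
                    * (anchors[:, 3] - anchors[:, 1]))
    return inter / (area_anchors + area_box - inter)


gt_box = np.array([0.0, 0.0, 0.5, 0.5])
anchors = np.array([[0.0, 0.0, 0.5, 0.5],      # identical box
                    [0.25, 0.25, 0.75, 0.75],  # partial overlap
                    [0.6, 0.6, 0.9, 0.9]])     # no overlap
overlap = jaccard(gt_box, anchors)
positive = overlap > 0.5
print(overlap, positive)
```

The partially overlapping anchor shares an area of 0.0625 with the ground truth box out of a union of 0.4375, so its overlap is 1/7 and it falls below the 0.5 threshold; only the identical anchor is a positive sample.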

The ground truth preprocessing code lives in ssd_common.py; the key part is:

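import tensorflow as tf


# Label and bbox encoding function.
def tf_ssd_bboxes_encode_layer(labels,  # ground truth labels, 1D tensor
                               bboxes,  # N x 4 Tensor (float)
                               anchors_layer,  # anchors of one layer, as a list
                               matching_threshold=0.5,  # jaccard threshold
                               prior_scaling=[0.1, 0.1, 0.2, 0.2],  # scaling of encoded coordinates
                               dtype=tf.float32):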
    """Encode groundtruth labels and bounding boxes using SSD anchors from
    one layer.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.

    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    """
    # Anchors coordinates and volume.
    # Unpack the anchor centers and sizes of this layer.
    yref, xref, href, wref = anchors_layer
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    # anchors_layer has shapes ((38, 38, 1), (38, 38, 1), (4,), (4,)),
    # so by broadcasting xmax has shape (38, 38, 4).
    # Anchor areas (called "volume" in the code).
    vol_anchors = (xmax - xmin) * (ymax - ymin)

    # Initialize tensors...
    shape = (yref.shape[0], yref.shape[1], href.size)
    feat_labels = tf.zeros(shape, dtype=tf.int64)
    feat_scores = tf.zeros(shape, dtype=dtype)
    # shape is (38, 38, 4)
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

    # Compute the jaccard overlap.
    def jaccard_with_anchors(bbox):
        """Compute jaccard score a box and the anchors.
        """
        # Intersection bbox and volume.
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)

        # Volumes.
        inter_vol = h * w
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
        jaccard = tf.div(inter_vol, union_vol)
        return jaccard
    # Loop condition.
    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Condition: check label index.
        """
        # tf.less returns the truth value of (x < y) element-wise.
        r = tf.less(i, tf.shape(labels))
        return r[0]
    # Loop body.
    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        """
        # Jaccard score.
        label = labels[i]
        bbox = bboxes[i]
        scores = jaccard_with_anchors(bbox)  # jaccard overlap with every anchor

        # 'Boolean' mask.
        # tf.greater returns the truth value of (x > y) element-wise.
        mask = tf.logical_and(tf.greater(scores, matching_threshold),
                              tf.greater(scores, feat_scores))
        imask = tf.cast(mask, tf.int64)
        fmask = tf.cast(mask, dtype)
        # Update values using mask.
        feat_labels = imask * label + (1 - imask) * feat_labels
        # tf.where replaces tf.select, which was removed in TensorFlow 1.0.
        feat_scores = tf.where(mask, scores, feat_scores)

        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]
    # Main loop definition.
    i = 0
    [i, feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                           [i, feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # ...
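The update rule in the loop body is easier to follow without graph-mode control flow. The sketch below reproduces the same rule in NumPy for a flat list of anchors: a ground truth box claims an anchor only if its jaccard overlap exceeds the threshold and also beats the best overlap recorded so far. `iou`, `match_anchors`, and the sample boxes are illustrative assumptions, not code from the repository.

```python
import numpy as np


def iou(box, anchors):
    """Jaccard overlap of one (ymin, xmin, ymax, xmax) box with N anchors."""
    h = np.maximum(np.minimum(anchors[:, 2], box[2])
                   - np.maximum(anchors[:, 0], box[0]), 0.)
    w = np.maximum(np.minimum(anchors[:, 3], box[3])
                   - np.maximum(anchors[:, 1], box[1]), 0.)
    inter = h * w
    areas = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (areas + (box[2] - box[0]) * (box[3] - box[1]) - inter)


def match_anchors(labels, bboxes, anchors, threshold=0.5):
    """Assign each anchor the label and box of its best-overlapping object."""
    n = anchors.shape[0]
    feat_labels = np.zeros(n, dtype=np.int64)   # 0 stands for background
    feat_scores = np.zeros(n)
    # Unmatched anchors keep the placeholder box (0, 0, 1, 1).
    feat_boxes = np.tile(np.array([0., 0., 1., 1.]), (n, 1))
    for label, bbox in zip(labels, bboxes):
        scores = iou(bbox, anchors)
        # Same mask as the TensorFlow body: above threshold AND better.
        mask = (scores > threshold) & (scores > feat_scores)
        feat_labels = np.where(mask, label, feat_labels)
        feat_scores = np.where(mask, scores, feat_scores)
        feat_boxes = np.where(mask[:, None], bbox, feat_boxes)
    return feat_labels, feat_scores, feat_boxes


anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.1, 0.1, 0.6, 0.6],
                    [0.5, 0.5, 1.0, 1.0]])
labels = np.array([3, 7])
bboxes = np.array([[0.0, 0.0, 0.5, 0.5],
                   [0.05, 0.05, 0.55, 0.55]])
out_labels, out_scores, out_boxes = match_anchors(labels, bboxes, anchors)
print(out_labels, out_scores)
```

In this example the second object overlaps the first anchor by about 0.68, which is above the threshold, but the anchor stays with the first object because that match already scored 1.0. The second anchor goes to the second object, and the third anchor stays background.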

 
 
              
           
              
              
            