
Understanding the TensorFlow Faster R-CNN code (1): building the VGG front end and the RPN network

0. Preface

When the code runs, the first step is to create a network object self.net from the vgg16 class:

if cfg.FLAGS.network == 'vgg16':
    self.net = vgg16(batch_size=cfg.FLAGS.ims_per_batch)

This class is defined in vgg.py:

class vgg16(Network):
    def __init__(self, batch_size=1):
        Network.__init__(self, batch_size=batch_size)

As the definition shows, vgg16 inherits from the Network class, so a vgg16 object carries all of Network's member variables plus whatever it adds itself. Turning to the Network class in network.py, its members are exactly the bookkeeping variables needed to train a network:

class Network(object):
    def __init__(self, batch_size=1):
        self._feat_stride = [16, ]
        self._feat_compress = [1. / 16., ]
        self._batch_size = batch_size
        self._predictions = {}
        self._losses = {}
        self._anchor_targets = {}
        self._proposal_targets = {}
        self._layers = {}
        self._act_summaries = []
        self._score_summaries = {}
        self._train_summaries = []
        self._event_summaries = {}
        self._variables_to_fix = {}

The code below uses a 600×800 input image as the running example.

1. Building the VGG16 front end (the build_head function)

The VGG16 architecture is diagrammed below; the code implements the portion marked by the red box.

    def build_head(self, is_training):

        # Main network
        # Layer 1 (conv1 and conv2 are frozen: trainable=False; the per-layer
        # shape comments below assume the canonical 224×224 VGG input)
        net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3], trainable=False, scope='conv1')  # 224×224×64
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')  # 112×112×64

        # Layer 2
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], trainable=False, scope='conv2')  # 112×112×128
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')  # 56×56×128

        # Layer 3
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], trainable=is_training, scope='conv3')  # 56×56×256
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')  # 28×28×256

        # Layer 4
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv4')  # 28×28×512
        net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')  # 14×14×512

        # Layer 5
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')  # 14×14×512

        # Append network to summaries
        self._act_summaries.append(net)

        # Append network as head layer
        self._layers['head'] = net

        return net
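As a quick sanity check on those shapes: each of the four 'SAME' max-pools halves the spatial dimensions (rounding up), for an overall stride of 2⁴ = 16. A minimal sketch in plain Python, no TensorFlow needed:

```python
import math

# Shape bookkeeping for the VGG16 head: four stride-2 'SAME' max-pools,
# each taking the ceiling of half the spatial size.
def head_output_size(h, w, num_pools=4):
    for _ in range(num_pools):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return h, w

print(head_output_size(600, 800))  # (38, 50) -> conv5 output is 38×50×512
print(head_output_size(224, 224))  # (14, 14), matching the comments above
```

This confirms both the 224×224 shapes in the code comments and the 38×50 feature map used throughout this article's 600×800 example.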

2. Building the RPN network (the build_rpn function)

I will cover the build_rpn function in two parts, starting with anchor generation.

2.1 Generating anchors

    def build_rpn(self, net, is_training, initializer):

        # Build anchor component: calls _anchor_component (in network.py); the anchors
        # are controlled mainly by the anchor_scales and anchor_ratios parameters
        self._anchor_component()

    # How the anchors are assembled
    def _anchor_component(self):
        with tf.variable_scope('ANCHOR_' + 'default'):
            # just to get the shape right: feat_stride = 16, because by the time the RPN
            # is attached to the VGG model, the feature map has passed through 4 pools,
            # i.e. been downsampled 16x
            height = tf.to_int32(tf.ceil(self._im_info[0, 0] / np.float32(self._feat_stride[0])))  # feature-map height after downsampling, here 38 (600/16)
            width = tf.to_int32(tf.ceil(self._im_info[0, 1] / np.float32(self._feat_stride[0])))   # feature-map width after downsampling, here 50 (800/16)
            anchors, anchor_length = tf.py_func(generate_anchors_pre,  # anchor_length is the number of anchors
                                                [height, width,
                                                 self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                                [tf.float32, tf.int32], name="generate_anchors")  # calls generate_anchors_pre in snippets.py to produce the anchors
            anchors.set_shape([None, 4])
            anchor_length.set_shape([])
            self._anchors = anchors
            self._anchor_length = anchor_length

This in turn calls generate_anchors_pre, defined in snippets.py, to generate the anchors:

def generate_anchors_pre(height, width, feat_stride, anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
    """ A wrapper function to generate anchors given different scales
      Also return the number of anchors in variable 'length'
    """
    anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))
    #anchors = generate_anchors()  # use generate_anchors' default arguments
    #pdb.set_trace()
    A = anchors.shape[0]
    shift_x = np.arange(0, width) * feat_stride  # anchor-center positions mapped back onto the original image
    shift_y = np.arange(0, height) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # shifts takes the form (Xmin, Ymin, Xmax, Ymax), but since it only enumerates anchor
    # centers, Xmin = Xmax and Ymin = Ymax; the centers are laid out row by row
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
    K = shifts.shape[0]  # number of positions at which anchors should be generated
    # width changes faster, so here it is H, W, C

    anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
    anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
    length = np.int32(anchors.shape[0])

    return anchors, length

The anchor-generation process is as follows (Facebook's Detectron framework generates anchors the same way; see the post "Understanding the Detectron code (6): how anchors are generated for input samples"):

(1) First, generate the anchors for a single cell. At this point an anchor has no position, only a width and height satisfying the chosen anchor_scales and anchor_ratios (for an explanation of generate_anchors, see "Understanding the Detectron code (4): generate_anchors"). A is the number of anchors per cell, here 9: the product of 3 anchor_scales and 3 anchor_ratios.

(2) Next, from the input image's size and feat_stride, compute the positions at which anchors should be placed when stepping across the image with stride feat_stride; these placement points are shifts.

(3) With the placement points in hand, shift the anchors from step (1) onto each point, which amounts to generating all 9 anchor shapes at every position.

In total this yields 38×50×9 = 17100 anchors.
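The three steps above can be reproduced with plain NumPy. The sketch below substitutes dummy all-zero per-cell anchors for generate_anchors' real output (which depends on the chosen scales and ratios), but the shift/broadcast arithmetic is the same as in generate_anchors_pre:

```python
import numpy as np

feat_stride = 16
height, width = 38, 50   # 600×800 image downsampled 16x
A = 9                    # anchors per cell (3 scales × 3 ratios)
anchors = np.zeros((A, 4))  # placeholder (x1, y1, x2, y2) boxes at the origin

shift_x = np.arange(0, width) * feat_stride   # x offsets in the input image
shift_y = np.arange(0, height) * feat_stride  # y offsets in the input image
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()
K = shifts.shape[0]  # number of grid cells = 38 * 50 = 1900

# (K, 1, 4) + (1, A, 4) broadcasts to (K, A, 4): every cell gets all A boxes
all_anchors = (anchors.reshape((1, A, 4)) +
               shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
all_anchors = all_anchors.reshape((K * A, 4))
print(all_anchors.shape)  # (17100, 4), i.e. 38 × 50 × 9 anchors
```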

2.2 Building the RPN layer

    def build_rpn(self, net, is_training, initializer):

        # Generate the anchors (code shown above)
        # Build the RPN layer with a 3×3×512 convolution
        rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training, weights_initializer=initializer, scope="rpn_conv/3x3")

        self._act_summaries.append(rpn)
        # rpn_cls_score.shape = (1, 38, 50, 18): each anchor gets a binary classification;
        # H and W are the size of the final feature map (assuming a 600×800 image,
        # 16x downsampling gives 38×50)
        rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_cls_score')

        # Change it so that the score has 2 as its channel size
        # Steps: 1. transpose rpn_cls_score to the "caffe" layout, from (1, 38, 50, 18) to (1, 18, 38, 50)
        # 2. reshape to ([self._batch_size], [num_dim, -1], [input_shape[2]]), where num_dim is
        #    the 2 passed in below and input_shape[2] is rpn_cls_score's 50:
        #    to_caffe = (1, 2, 9×38, 50) = (1, 2, 342, 50)
        # 3. move the second dimension to the end, giving (1, 342, 50, 2)
        rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')  # rpn_cls_score_reshape.shape = (1, 342, 50, 2)

        # rpn_cls_score_reshape is now (1, 342, 50, 2); flatten it to (1×342×50, 2) = (17100, 2),
        # apply softmax, then reshape back, so rpn_cls_prob_reshape is again (1, 342, 50, 2)
        rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")

        rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")  # rpn_cls_prob.shape = (1, 38, 50, 18)

        rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training, weights_initializer=initializer, padding='VALID', activation_fn=None, scope='rpn_bbox_pred')

        return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
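The reshape gymnastics around _reshape_layer can be checked with a NumPy sketch. This assumes _reshape_layer behaves as the comments describe (transpose to NCHW, reshape, transpose back); the shapes match the (1, 38, 50, 18) → (1, 342, 50, 2) round trip:

```python
import numpy as np

batch, H, W, A2 = 1, 38, 50, 18  # 18 = 9 anchors × 2 classes
scores = np.zeros((batch, H, W, A2))  # stand-in for rpn_cls_score

# _reshape_layer with num_dim = 2 (assumed behavior, per the comments above)
to_caffe = scores.transpose((0, 3, 1, 2))         # (1, 18, 38, 50)
reshaped = to_caffe.reshape((batch, 2, -1, W))    # (1, 2, 342, 50)
score_reshape = reshaped.transpose((0, 2, 3, 1))  # (1, 342, 50, 2)
print(score_reshape.shape)

# _reshape_layer with num_dim = 18 restores the original layout, as used
# to turn rpn_cls_prob_reshape back into rpn_cls_prob
back = (score_reshape.transpose((0, 3, 1, 2))
        .reshape((batch, A2, -1, W))
        .transpose((0, 2, 3, 1)))
print(back.shape)  # (1, 38, 50, 18)
```

Grouping the channel axis down to size 2 is what lets the softmax treat every (position, anchor) pair as one independent two-class prediction.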

Finally, the RPN layer outputs:

  • rpn_cls_prob
  • rpn_bbox_pred
  • rpn_cls_score
  • rpn_cls_score_reshape