pascal_voc.py in Faster R-CNN

This code implements the pascal_voc class, which inherits from imdb and is responsible for the data-interaction side of the pipeline.

Initialization function

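The constructor first calls the parent class's initializer, passing in the imdb name (for example, 'voc_2007_trainval'), and then initializes its own member variables, which look like this: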

{
    year: '2007'
    image_set: 'trainval'
    devkit_path: 'data/VOCdevkit2007'
    data_path: 'data/VOCdevkit2007/VOC2007'
    classes: (…)  (modify this to train on your own data)
    class_to_ind: {…}  (a dictionary mapping each class name to its index)
    image_ext: '.jpg'
    image_index: ['000001', '000003', ……]  (image indexes read from trainval.txt)
    roidb_handler: <Method gt_roidb>
    salt: <Object uuid>
    comp_id: 'comp4'
    config: {…}
}
class pascal_voc(imdb):
    def __init__(self, image_set, year, devkit_path=None):
        imdb.__init__(self, 'voc_' + year + '_' + image_set)
        self._year = year
        self._image_set = image_set
        self._devkit_path = self._get_default_path() if devkit_path is None \
                            else devkit_path
        self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
        self._classes = ('__background__',  # always index 0
                         'aeroplane', 'bicycle', 'bird', 'boat',
                         'bottle', 'bus', 'car', 'cat', 'chair',
                         'cow', 'diningtable', 'dog', 'horse',
                         'motorbike', 'person', 'pottedplant',
                         'sheep', 'sofa', 'train', 'tvmonitor')
        self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes)))
        self._image_ext = '.jpg'
        self._image_index = self._load_image_set_index()
        # Default to roidb handler
        self._roidb_handler = self.selective_search_roidb
        self._salt = str(uuid.uuid4())
        self._comp_id = 'comp4'

        # PASCAL specific config options
        self.config = {'cleanup'     : True,
                       'use_salt'    : True,
                       'use_diff'    : False,
                       'matlab_eval' : False,
                       'rpn_file'    : None,
                       'min_size'    : 2}

        assert os.path.exists(self._devkit_path), \
                'VOCdevkit path does not exist: {}'.format(self._devkit_path)
        assert os.path.exists(self._data_path), \
                'Path does not exist: {}'.format(self._data_path)
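
As a minimal sketch of the class-to-index mapping built in the constructor, the snippet below uses a made-up class tuple (the names 'cat', 'dog' and 'car' are placeholders for a custom dataset, not part of the original file) and Python 3's range instead of xrange. The inverse dictionary ind_to_class is an extra added here for illustration only.

```python
# Hypothetical custom class list; '__background__' must stay at index 0.
classes = ('__background__', 'cat', 'dog', 'car')

# Same construction as in __init__, with range in place of Python 2's xrange.
class_to_ind = dict(zip(classes, range(len(classes))))

# Inverse lookup (an addition for illustration), useful when turning
# predicted class indexes back into names.
ind_to_class = {ind: name for name, ind in class_to_ind.items()}

print(class_to_ind['dog'])
print(ind_to_class[0])
```

When training on your own data, the class tuple is the only part of this mapping that needs to change; the dictionary follows from it.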

image_path_from_index

The following two functions are easy to follow: given an image index such as '000001', they return the path of the corresponding image under JPEGImages.

    def image_path_at(self, i):
        """
        Return the absolute path to image i in the image sequence.
        """
        return self.image_path_from_index(self._image_index[i])

    def image_path_from_index(self, index):
        """
        Construct an image path from the image's "index" identifier.
        """
        image_path = os.path.join(self._data_path, 'JPEGImages',
                                  index + self._image_ext)
        assert os.path.exists(image_path), \
                'Path does not exist: {}'.format(image_path)
        return image_path

_load_image_set_index

This function loads the image indexes from /VOCdevkit2007/VOC2007/ImageSets/Main/<image_set>.txt.

    def _load_image_set_index(self):
        """
        Load the indexes listed in this dataset's image set file.
        """
        # Example path to image set file:
        # self._devkit_path + /VOCdevkit2007/VOC2007/ImageSets/Main/val.txt
        image_set_file = os.path.join(self._data_path, 'ImageSets', 'Main',
                                      self._image_set + '.txt')
        assert os.path.exists(image_set_file), \
                'Path does not exist: {}'.format(image_set_file)
        with open(image_set_file) as f:
            image_index = [x.strip() for x in f.readlines()]
        return image_index
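
To try the index-loading and path-building logic without a real VOC download, here is a rough standalone sketch in Python 3. It rewrites the two methods as free functions (load_image_set_index and image_path_from_index taking data_path as an argument are adaptations made for this example) and builds a throwaway directory with a two-line trainval.txt. The existence check on the image file is left out because no images are created.

```python
import os
import tempfile


def load_image_set_index(data_path, image_set):
    """Read one image index per line from ImageSets/Main/<image_set>.txt."""
    image_set_file = os.path.join(data_path, 'ImageSets', 'Main',
                                  image_set + '.txt')
    assert os.path.exists(image_set_file), \
        'Path does not exist: {}'.format(image_set_file)
    with open(image_set_file) as f:
        return [x.strip() for x in f.readlines()]


def image_path_from_index(data_path, index, image_ext='.jpg'):
    """Build the JPEGImages path for an index (no existence check here)."""
    return os.path.join(data_path, 'JPEGImages', index + image_ext)


# Throwaway directory standing in for data/VOCdevkit2007/VOC2007.
demo_root = tempfile.mkdtemp()
main_dir = os.path.join(demo_root, 'ImageSets', 'Main')
os.makedirs(main_dir)
with open(os.path.join(main_dir, 'trainval.txt'), 'w') as f:
    f.write('000001\n000003\n')

image_index = load_image_set_index(demo_root, 'trainval')
first_path = image_path_from_index(demo_root, image_index[0])
print(image_index)
print(first_path)
```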

_get_default_path

Returns the default path of the data source, which here is VOCdevkit2007 under the data directory. If you have your own dataset, modify this function.

    def _get_default_path(self):
        """
        Return the default path where PASCAL VOC is expected to be installed.
        """
        return os.path.join(cfg.DATA_DIR, 'VOCdevkit' + self._year)

gt_roidb

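This is one of the core functions of the class; it returns the roidb data object. It first looks in the cache directory for a cache file ending in '.pkl', which holds the roidb serialized with cPickle. If that file exists, the function loads it and returns the result directly, which saves time (so when you switch datasets, delete the cache file first, otherwise the stale cache causes errors). Otherwise, it calls the private function _load_pascal_annotation for every image index to build the roidb, saves it to the cache file, and returns it. The roidb format is described in the notes on _load_pascal_annotation below.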

    def gt_roidb(self):
        """
        Return the database of ground-truth regions of interest.

        This function loads/saves from/to a cache file to speed up future calls.
        """
        cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            print '{} gt roidb loaded from {}'.format(self.name, cache_file)
            return roidb

        gt_roidb = [self._load_pascal_annotation(index)
                    for index in self.image_index]
        with open(cache_file, 'wb') as fid:
            cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
        print 'wrote gt roidb to {}'.format(cache_file)
        return gt_roidb
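
The load-or-build caching pattern, and the reason a stale cache has to be deleted, can be seen in a small standalone sketch. It is written for Python 3, so it uses pickle rather than cPickle; the helper name cached_roidb and the toy roidb contents are invented for this example.

```python
import os
import pickle
import tempfile


def cached_roidb(cache_file, build_fn):
    """Return the pickled roidb if cache_file exists; otherwise build it,
    write it to cache_file, and return it."""
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as fid:
            return pickle.load(fid)
    roidb = build_fn()
    with open(cache_file, 'wb') as fid:
        pickle.dump(roidb, fid, pickle.HIGHEST_PROTOCOL)
    return roidb


cache_file = os.path.join(tempfile.mkdtemp(), 'demo_gt_roidb.pkl')

# First call builds the roidb and writes the cache.
first = cached_roidb(cache_file, lambda: [{'boxes': [[0, 0, 9, 9]]}])

# The "dataset" changed, but the cache still exists: stale data comes back.
second = cached_roidb(cache_file, lambda: [{'boxes': [[5, 5, 20, 20]]}])

# After deleting the cache file, the new data is actually loaded.
os.remove(cache_file)
third = cached_roidb(cache_file, lambda: [{'boxes': [[5, 5, 20, 20]]}])

print(second == first)
print(third)
```

The second call never runs its builder, which is exactly what happens when a dataset is swapped while the old .pkl file is still in the cache directory.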

selective_search_roidb

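This function seems to be rarely used in Faster R-CNN. It also returns a roidb data object. As before, it first looks in the cache directory for a '.pkl' cache file; if the file exists, it is loaded and returned directly (again, delete the cache file when you switch datasets, otherwise errors result). Otherwise, it calls gt_roidb() and _load_selective_search_roidb() to obtain two roidbs, merges them with merge_roidbs, writes the result to the cache, and returns it. For a test set from a year other than 2007, only the selective-search roidb is loaded.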

    def selective_search_roidb(self):
        """
        Return the database of selective search regions of interest.
        Ground-truth ROIs are also included.

        This function loads/saves from/to a cache file to speed up future calls.
        """
        cache_file = os.path.join(self.cache_path,
                                  self.name + '_selective_search_roidb.pkl')

        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as fid:
                roidb = cPickle.load(fid)
            print '{} ss roidb loaded from {}'.format(self.name, cache_file)
            return roidb

        if int(self._year) == 2007 or self._image_set != 'test':
            gt_roidb = self.gt_roidb()
            ss_roidb = self._load_selective_search_roidb(gt_roidb)
            roidb = imdb.merge_roidbs(gt_roidb, ss_roidb)
        else:
            roidb = self._load_selective_search_roidb(None)
        with open(cache_file, 'wb') as fid:
            cPickle.dump(roidb, fid, cPickle.HIGHEST_PROTOCOL)
        print 'wrote ss roidb to {}'.format(cache_file)
        return roidb
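
The source of merge_roidbs is not shown in this walkthrough. As a simplified sketch of what merging two roidbs amounts to, the function below concatenates the per-image fields of two lists of entries. It uses plain Python lists instead of NumPy arrays and sparse matrices, and the name merge_roidbs_sketch and all values are made up for illustration; it is not the library's implementation.

```python
def merge_roidbs_sketch(a, b):
    """Concatenate, image by image, the fields of two roidb lists.

    Simplified stand-in: plain lists replace the arrays used in practice.
    """
    assert len(a) == len(b), 'both roidbs must cover the same images'
    merged = []
    for entry_a, entry_b in zip(a, b):
        merged.append({
            key: entry_a[key] + entry_b[key]
            for key in ('boxes', 'gt_classes', 'gt_overlaps', 'seg_areas')
        })
    return merged


# One image, three classes; values are illustrative only.
gt = [{'boxes': [[10, 10, 50, 50]],
       'gt_classes': [2],
       'gt_overlaps': [[0.0, 0.0, 1.0]],
       'seg_areas': [1681.0]}]
proposals = [{'boxes': [[12, 8, 48, 52], [100, 100, 120, 120]],
              'gt_classes': [0, 0],
              'gt_overlaps': [[0.0, 0.0, 0.8], [0.0, 0.0, 0.0]],
              'seg_areas': [0.0, 0.0]}]

merged = merge_roidbs_sketch(gt, proposals)
print(len(merged[0]['boxes']))
```

After merging, each image entry holds the ground-truth boxes followed by the proposal boxes, with the other fields kept in the same order.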

_load_selective_search_roidb

This is the selective-search path, which Faster R-CNN generally does not use, so it can be skipped for now.

    def _load_selective_search_roidb(self, gt_roidb):
        filename = os.path.abspath(os.path.join(cfg.DATA_DIR,
                                                'selective_search_data',
                                                self.name + '.mat'))
        assert os.path.exists(filename), \
               'Selective search data not found at: {}'.format(filename)
        raw_data = sio.loadmat(filename)['boxes'].ravel()

        box_list = []
        for i in xrange(raw_data.shape[0]):
            boxes = raw_data[i][:, (1, 0, 3, 2)] - 1
            keep = ds_utils.unique_boxes(boxes)
            boxes = boxes[keep, :]
            keep = ds_utils.filter_small_boxes(boxes, self.config['min_size'])
            boxes = boxes[keep, :]
            box_list.append(boxes)
        return self.create_roidb_from_box_list(box_list, gt_roidb)
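
For readers who do want to see what the loop body does to each image's boxes, here is a rough pure-Python sketch. It assumes the column swap (1, 0, 3, 2) followed by subtracting 1 turns 1-based (y1, x1, y2, x2) rows into 0-based (x1, y1, x2, y2) rows, which is inferred from the indexing rather than stated in the source. The helper convert_boxes and its size test (width and height, taken as coordinate differences, at least min_size) are stand-ins for ds_utils.unique_boxes and ds_utils.filter_small_boxes, whose exact rules are not shown here.

```python
def convert_boxes(raw_boxes, min_size):
    """Reorder and shift raw boxes, drop duplicates, drop small boxes.

    raw_boxes: iterable of (y1, x1, y2, x2), assumed 1-based.
    Returns a list of (x1, y1, x2, y2) tuples, 0-based, in first-seen order.
    """
    seen = set()
    kept = []
    for y1, x1, y2, x2 in raw_boxes:
        box = (x1 - 1, y1 - 1, x2 - 1, y2 - 1)
        if box in seen:
            continue  # duplicate proposal
        seen.add(box)
        width = box[2] - box[0]
        height = box[3] - box[1]
        if width >= min_size and height >= min_size:
            kept.append(box)
    return kept


raw = [(1, 1, 11, 21),   # kept
       (1, 1, 11, 21),   # exact duplicate, dropped
       (5, 5, 6, 6),     # too small, dropped
       (3, 2, 30, 40)]   # kept
boxes = convert_boxes(raw, min_size=2)
print(boxes)
```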

_load_pascal_annotation

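Given an image index, this function finds the corresponding XML annotation file in the Annotations folder, loads all bounding-box objects, and removes every object marked "difficult" (unless use_diff is enabled in config).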

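Once the XML has been parsed, the fields of the roidb entry are assigned:

-  boxes: a 2-D array; each row stores xmin ymin xmax ymax

-  gt_classes: the class index of each box (the class tuple is declared in the initialization function)

-  overlaps (stored under the key gt_overlaps): a 2-D array with one row per box and num_classes (the number of classes) columns; in each row, the entry at the box's class index is 1 and all others are 0. It is later converted to a sparse matrix

-  seg_areas: the area of each box

-  flipped: False means the image has not been flipped yet (train.py later adds the flipped images, and this field tells them apart)

Finally, these fields are assembled into a roidb entry and returned.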

    def _load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        if not self.config['use_diff']:
            # Exclude the samples labeled as difficult
            non_diff_objs = [
                obj for obj in objs if int(obj.find('difficult').text) == 0]
            # if len(non_diff_objs) != len(objs):
            #     print 'Removed {} difficult objects'.format(
            #         len(objs) - len(non_diff_objs))
            objs = non_diff_objs
        num_objs = len(objs)

        boxes = np.zeros((num_objs, 4), dtype=np.uint16)
        gt_classes = np.zeros((num_objs), dtype=np.int32)
        overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
        # "Seg" area for pascal is just the box area
        seg_areas = np.zeros((num_objs), dtype=np.float32)

        # Load object bounding boxes into a data frame.
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes[ix, :] = [x1, y1, x2, y2]
            gt_classes[ix] = cls
            # The boxes come straight from the annotation, i.e. they are
            # ground truth, so the overlap is set to 1 directly
            overlaps[ix, cls] = 1.0
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
        # overlaps is a num_objs * K array, where K is the total number of
        # classes and num_objs is the number of boxes in this image
        overlaps = scipy.sparse.csr_matrix(overlaps)

        return {'boxes' : boxes,
                'gt_classes': gt_classes,
                'gt_overlaps' : overlaps,
                'flipped' : False,
                'seg_areas' : seg_areas}
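
To see the shape of a roidb entry without NumPy or a real dataset, the sketch below parses a small hand-written annotation with the standard library. It is written for Python 3. The XML string, the three-class tuple, and the function name parse_annotation are all invented for this example, and plain lists stand in for the arrays and the sparse matrix used by the real code.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the PASCAL VOC layout; values are made up.
SAMPLE_XML = """<annotation>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

CLASSES = ('__background__', 'dog', 'person')  # hypothetical class list
CLASS_TO_IND = dict(zip(CLASSES, range(len(CLASSES))))


def parse_annotation(xml_text, use_diff=False):
    """Build a roidb-style dict (lists instead of arrays) from VOC XML."""
    objs = ET.fromstring(xml_text).findall('object')
    if not use_diff:
        # Exclude the samples labeled as difficult.
        objs = [o for o in objs if int(o.find('difficult').text) == 0]
    entry = {'boxes': [], 'gt_classes': [], 'gt_overlaps': [],
             'flipped': False, 'seg_areas': []}
    for obj in objs:
        bbox = obj.find('bndbox')
        # Make pixel indexes 0-based.
        x1, y1, x2, y2 = (float(bbox.find(tag).text) - 1
                          for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        cls = CLASS_TO_IND[obj.find('name').text.lower().strip()]
        row = [0.0] * len(CLASSES)
        row[cls] = 1.0  # ground truth overlaps itself completely
        entry['boxes'].append([x1, y1, x2, y2])
        entry['gt_classes'].append(cls)
        entry['gt_overlaps'].append(row)
        entry['seg_areas'].append((x2 - x1 + 1) * (y2 - y1 + 1))
    return entry


entry = parse_annotation(SAMPLE_XML)
print(entry)
```

With use_diff left at False, the person object is skipped because it is marked difficult, so the entry contains a single box.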

test

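The following functions are used for evaluating test results. They are not worth reading closely; knowing what they do is enough.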

    def _write_voc_results_file(self, all_boxes):
    def _do_python_eval(self, output_dir='output'):
    def evaluate_detections(self, all_boxes, output_dir):

rpn_roidb

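After the RPN has produced proposals, this function combines the proposal RoIs with the ground truth so that they can be fed to the network for training.

How are they combined? The proposal roidb has exactly the same format as the gt_roidb described above, except that the overlap value changes from 1 to the overlap with the closest ground-truth box, stored under that box's class.

How is the closest class determined? Each proposal box is compared against every ground-truth box by computing their overlap, similar to what anchor_target_layer.py does:

overlap = (intersection area) / (proposal_box area + gt_boxes area - intersection area)

For each proposal, the largest overlap over all gt_boxes is selected and written under the corresponding class index.

For example:

classes:   background  cat   fish  dog  car  bed
proposal1  0           0.65  0     0    0    0
proposal2  0           0     0     0.8  0    0

The 1 under the corresponding class has simply become the overlap value. Finally, merge_roidbs merges gt_roidb with rpn_roidb, and the result is returned.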
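
The overlap formula and the "largest overlap goes under that box's class" rule can be worked through in a short pure-Python sketch. The function names iou and proposal_overlaps, the boxes, and the class indexes are invented for this example. Areas are computed with inclusive pixel coordinates (the same +1 convention used for seg_areas above), which is an assumption about how the real overlap routine counts pixels.

```python
def box_area(box):
    """Area of (x1, y1, x2, y2) with inclusive pixel coordinates."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def iou(a, b):
    """intersection / (area_a + area_b - intersection)."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / float(box_area(a) + box_area(b) - inter)


def proposal_overlaps(proposals, gt_boxes, gt_classes, num_classes):
    """One row per proposal; the best IoU is written at the class index
    of the ground-truth box that produced it."""
    rows = []
    for p in proposals:
        row = [0.0] * num_classes
        if gt_boxes:
            ious = [iou(p, g) for g in gt_boxes]
            best = max(range(len(ious)), key=ious.__getitem__)
            row[gt_classes[best]] = ious[best]
        rows.append(row)
    return rows


# classes: background(0) cat(1) fish(2) dog(3) car(4) bed(5)
gt_boxes = [(0, 0, 9, 9), (20, 20, 39, 39)]
gt_classes = [1, 3]
proposals = [(0, 0, 9, 4),           # covers half of the cat box
             (20, 20, 39, 39),       # identical to the dog box
             (100, 100, 110, 110)]   # touches nothing
rows = proposal_overlaps(proposals, gt_boxes, gt_classes, num_classes=6)
for row in rows:
    print(row)
```

The first proposal has an intersection of 50 pixels with the cat box, against a union of 50 + 100 - 50 = 100, so 0.5 lands in the cat column; the third proposal matches nothing and its row stays all zeros.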

    def rpn_roidb(self):
        if int(self._year) == 2007 or self._image_set != 'test':
            gt_roidb = self.gt_roidb()
            # Building rpn_roidb requires gt_roidb as an argument
            rpn_roidb = self._load_rpn_roidb(gt_roidb)
            roidb = imdb.merge_roidbs(gt_roidb, rpn_roidb)
        else:
            roidb = self._load_rpn_roidb(None)
        return roidb

    def _load_rpn_roidb(self, gt_roidb):
        # Calls the parent-class method create_roidb_from_box_list to read
        # each image's boxes from box_list
        filename = self.config['rpn_file']
        print 'loading {}'.format(filename)
        assert os.path.exists(filename), \
               'rpn data not found at: {}'.format(filename)
        with open(filename, 'rb') as f:
            # Read the boxes in rpn_file into box_list. box_list is a list
            # with one element per image, so its length must match that of
            # gt_roidb
            box_list = cPickle.load(f)
        return self.create_roidb_from_box_list(box_list, gt_roidb)

For testing

if __name__ == '__main__':
    from datasets.pascal_voc import pascal_voc
    d = pascal_voc('trainval', '2007')
    res = d.roidb
    from IPython import embed; embed()