Yolov2 Source Code Walkthrough (Part 2)

阿新 • Published: 2018-12-30

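1. Building the dataset

First, download the VOC2012 dataset from the official site. What I obtained is the training archive VOCtrainval_11-May-2012. To reduce the training cost, I use the validation set as the test set by renaming val.txt in the Main folder to test.txt, and then pack the dataset into an HDF5 file.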

import os
import h5py
import numpy as np
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ElementTree


classes = [
    "aeroplane","bicycle","bird","boat","bottle","bus","car","cat",
    "chair","cow","diningtable", "dog","horse","motorbike","person",
    "pottedplant","sheep","sofa","train","tvmonitor"
]

sets_from_2012 = [('2012', 'train'), ('2012', 'test')]
train_set = [('2012', 'train')]
test_set = [('2012', 'test')]

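# Read the box information from an annotation XML file. Each box is stored as
# (class, xmin, ymin, xmax, ymax), with coordinates divided by the image width/height.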
def get_boxes_for_id(voc_path, year, image_id, height, width):
    fname = os.path.join(voc_path, 'VOCdevkit/VOC{}/Annotations/{}.xml'.format(year, image_id))
    
    with open(fname) as in_file:
        xml_tree = ElementTree.parse(in_file)
    
    root = xml_tree.getroot()
    boxes = []
    
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        label = obj.find('name').text
        if label not in classes or int(difficult) == 1:
            continue
        xml_box = obj.find('bndbox')
        bbox = (classes.index(label), float(xml_box.find('xmin').text) / width,
                float(xml_box.find('ymin').text) / height, float(xml_box.find('xmax').text) / width,
                float(xml_box.find('ymax').text) / height)
        boxes.extend(bbox)
    return np.array(boxes)
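# test
'''
voc_path = 'C:/Users/guesthost/Desktop/VOCtrainval_11-May-2012/'
year = '2012'
image_id = '2007_000123'
# height and width come from get_image_for_id (defined below)
_, height, width = get_image_for_id(voc_path, year, image_id)
boxes = get_boxes_for_id(voc_path, year, image_id, height, width)
'''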

def get_image_for_id(voc_path, year, image_id):
    fname = os.path.join(voc_path, 'VOCdevkit/VOC{}/JPEGImages/{}.jpg'.format(year, image_id))
    
    im = plt.imread(fname)
    height = im.shape[0]
    width = im.shape[1]
    return np.ndarray.flatten(im, 'C'), height, width

# test
'''
voc_path = 'C:/Users/guesthost/Desktop/VOCtrainval_11-May-2012/'
year = '2012'
image_id = '2007_000123'
image, height, width = get_image_for_id(voc_path, year, image_id)
'''

# Read the image ids listed in the ImageSets/Main txt files
def get_ids(voc_path, datasets):
    ids = []
    for year, image_set in datasets:
        id_file = os.path.join(voc_path, 'VOCdevkit/VOC{}/ImageSets/Main/{}.txt'.format(
            year, image_set))
        with open(id_file, 'r') as image_ids:
            ids.extend(map(str.strip, image_ids.readlines()))
    return ids
# test
# voc_path = 'C:/Users/guesthost/Desktop/VOCtrainval_11-May-2012/'
# ids = get_ids(voc_path, train_set)

def add_to_dataset(voc_path, year, ids, images, boxes, image_size, start = 0):
    for i, voc_id in enumerate(ids):
        image_data, height, width = get_image_for_id(voc_path, year, voc_id)
        image_boxes = get_boxes_for_id(voc_path, year, voc_id, height, width)
        
        
        images[start + i] = image_data
        boxes[start + i] = image_boxes
        image_size[start + i] = [height, width]

def main(voc_path):
    # Read the image ids of the training set and the test set
    train_ids = get_ids(voc_path, train_set)
    test_ids = get_ids(voc_path, test_set)
    
    # Create the HDF5 data structure
    print('Creating HDF5 dataset structure.')
    fname = os.path.join(voc_path, 'pascal_voc_2012.hdf5')
    voc_h5file = h5py.File(fname, 'w')
    uint8_dt = h5py.special_dtype(
        vlen = np.dtype('uint8'))
    vlen_int_dt = h5py.special_dtype(
        vlen=np.dtype(int))
    vlen_float = h5py.special_dtype(
            vlen=np.dtype(float))
    train_group = voc_h5file.create_group('train')
    test_group = voc_h5file.create_group('test')
    # np.string_ was removed in NumPy 2.0; np.bytes_ is the same type
    voc_h5file.attrs['classes'] = np.bytes_(str.join(',', classes))
    
    # Store the image pixel data as variable-length uint8 arrays
    train_images = train_group.create_dataset(
        'images', shape=(len(train_ids), ), dtype=uint8_dt)
    test_images = test_group.create_dataset(
        'images', shape=(len(test_ids), ), dtype=uint8_dt)
    
    # Store class, xmin, ymin, xmax, ymax for the boxes of each image
    train_boxes = train_group.create_dataset(
        'boxes', shape=(len(train_ids), ), dtype=vlen_float)
    test_boxes = test_group.create_dataset(
        'boxes', shape=(len(test_ids), ), dtype=vlen_float)
    
    # Store the height and width of each image
    train_size = train_group.create_dataset(
        'size', shape=(len(train_ids), ), dtype=vlen_int_dt)
    test_size = test_group.create_dataset(
        'size', shape = (len(test_ids), ), dtype=vlen_int_dt)
    
    print('Processing Pascal VOC 2012 datasets for training set.')
    add_to_dataset(voc_path, '2012', train_ids, train_images,
                               train_boxes, train_size)
    print('Processing Pascal VOC 2012 test set.')
    add_to_dataset(voc_path, '2012', test_ids, test_images, test_boxes, test_size)
    print('Closing HDF5 file.')
    voc_h5file.close()
    print('Done.')
    
if __name__ == '__main__':
    voc_path = 'C:/Users/guesthost/Desktop/VOCtrainval_11-May-2012/'
    main(voc_path)

This script records the image data and box information found under voc_path in an HDF5 file, producing pascal_voc_2012.hdf5.
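To make the box format concrete, here is a minimal sketch of the parsing and normalization that get_boxes_for_id performs, run on an in-memory annotation instead of a file. The XML snippet, the two-entry class list and the name parse_boxes are made up for this illustration; only the element names (object, name, difficult, bndbox, xmin, ...) follow the listing above.

```python
import xml.etree.ElementTree as ElementTree

# Hypothetical, hand-written annotation used only for this illustration.
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>100</xmin><ymin>80</ymin><xmax>300</xmax><ymax>320</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>1</difficult>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>50</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""

CLASSES = ["dog", "person"]  # shortened class list for the example


def parse_boxes(xml_text, width, height):
    """Return a flat list [class, xmin, ymin, xmax, ymax, ...] with normalized coordinates."""
    root = ElementTree.fromstring(xml_text)
    flat = []
    for obj in root.iter('object'):
        label = obj.find('name').text
        if label not in CLASSES or int(obj.find('difficult').text) == 1:
            continue  # skip unknown classes and objects flagged as difficult
        box = obj.find('bndbox')
        flat.extend([
            CLASSES.index(label),
            float(box.find('xmin').text) / width,
            float(box.find('ymin').text) / height,
            float(box.find('xmax').text) / width,
            float(box.find('ymax').text) / height,
        ])
    return flat


# Assume the image is 500 pixels wide and 400 pixels high.
boxes = parse_boxes(SAMPLE_XML, width=500, height=400)
print(boxes)
```

Only the dog survives the filter, because the person is flagged as difficult. Since every image contributes five numbers per kept box, the flat list can later be reshaped to (-1, 5).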

2. Training

First we load the dataset. Because it was built as an HDF5 file, it is read back with h5py.

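import os
import h5py
import numpy as np
import PIL.Image
import tensorflow as tf
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.layers import Input, Lambda
from keras.models import Model, load_model
# yolo_body, yolo_loss and preprocess_true_boxes are the functions covered in part (1).

def load_data(data_path):
    file = h5py.File(data_path, 'r')
    # file = h5py.File('C:\\Users\\guesthost\\Desktop\\VOCtrainval_11-May-2012\\pascal_voc_2012.hdf5', 'r')
    train_data = file['train']

    train_size = train_data['size']
    train_boxes = train_data['boxes']
    # Restore the flattened images to their original (height, width, 3) shape
    train_images = process_images(train_data['images'], train_size)
    class_names = get_classes('C:\\Users\\guesthost\\Desktop\\zc\\yoloV2\\voc_classes.txt')
    anchors = get_anchors('C:\\Users\\guesthost\\Desktop\\zc\\yoloV2\\voc_anchor_box.txt')
    return train_images, train_boxes, class_names, anchors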

Because the image data was flattened when the dataset was built, each array has to be reshaped using the image size recorded alongside it:

def process_images(train_images,train_size):
    images = []
    for i, size in enumerate(train_size):
        height = size[0]
        width = size[1]
        image = np.reshape(train_images[i], (height, width, 3))
        images.append(image)
    return images
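The reason the size must be stored next to the pixels is that a flat array of length height * width * 3 does not say which height and width produced it. The sketch below spells out the row-major (C-order) index arithmetic that flatten and np.reshape rely on, using plain Python lists; the helper names flat_index and unflatten are made up for this illustration.

```python
def flat_index(row, col, channel, width, channels=3):
    """Position of pixel (row, col, channel) in a row-major (C-order) flattened image."""
    return (row * width + col) * channels + channel


def unflatten(flat, height, width, channels=3):
    """Rebuild a nested height x width x channels list from a flat sequence."""
    if len(flat) != height * width * channels:
        raise ValueError('flat length does not match height * width * channels')
    return [[[flat[flat_index(r, c, ch, width, channels)] for ch in range(channels)]
             for c in range(width)]
            for r in range(height)]


# A tiny 2 x 2 RGB "image" flattened to 12 values.
flat = list(range(12))
image = unflatten(flat, height=2, width=2)
print(image[1][0])  # the three channel values of the pixel at row 1, column 0
```

The same 12 values could just as well be read as a 1 x 4 or a 4 x 1 image, which is why the 'size' dataset is needed to undo the flattening.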

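Next, the image arrays are processed: they are resized, converted to floating point, and scaled to [0, 1]. The boxes are converted from corner coordinates to center and size, expressed as fractions of the whole image. The largest number of objects in any image is also recorded, and the box arrays of images with fewer objects are zero-padded up to that count.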

def process_data(images, boxes = None):
    '''
    Process the images and the ground-truth box information.
    '''
    images = [PIL.Image.fromarray(i) for i in images]
    
    # Resize to the 416x416 network input, convert to float and scale to [0, 1]
    # (np.float was removed in NumPy 1.24, so an explicit dtype is used)
    processed_images = [i.resize((416, 416), PIL.Image.BICUBIC) for i in images]
    processed_images = [np.array(image, dtype = np.float32) for image in processed_images]
    processed_images = [image / 255. for image in processed_images]
    
    
    if boxes is not None:
        # boxes = [class, x_min, y_min, x_max, y_max]; the coordinates were already
        # divided by the image width/height in get_boxes_for_id, so they are not
        # divided by the original image size a second time here.
        boxes = [box.reshape((-1, 5)) for box in boxes]
        boxes_xy = [0.5 * (box[:, 3:5] + box[:, 1:3]) for box in boxes]
        boxes_wh = [box[:, 3:5] - box[:, 1:3] for box in boxes]
        
        # Reorder each row to [x_center, y_center, w, h, class]
        boxes = [np.concatenate((boxes_xy[i], boxes_wh[i], box[:, 0:1]), axis = 1) for i, box in enumerate(boxes)]
        
        # Images contain different numbers of boxes, so every box array is
        # zero-padded to the largest box count to give a uniformly shaped array.
        max_boxes = 0
        for boxz in boxes:
            if boxz.shape[0] > max_boxes:
                max_boxes = boxz.shape[0]
                
        for i, boxz in enumerate(boxes):
            if boxz.shape[0]  < max_boxes:
                zero_padding = np.zeros((max_boxes - boxz.shape[0], 5), dtype = np.float32)
                boxes[i] = np.vstack((boxz, zero_padding))
                
        return np.array(processed_images), np.array(boxes)
    else:
        return np.array(processed_images)
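The box handling in process_data boils down to two steps: converting each box from corner form to center/size form, and zero-padding so that every image has the same number of rows. A minimal sketch with plain Python lists follows; the function names and the sample boxes are made up for this illustration, and the coordinates are assumed to be already normalized to [0, 1].

```python
def corners_to_center(box):
    """Convert [class, xmin, ymin, xmax, ymax] to [x_center, y_center, w, h, class]."""
    cls, xmin, ymin, xmax, ymax = box
    return [0.5 * (xmin + xmax), 0.5 * (ymin + ymax), xmax - xmin, ymax - ymin, cls]


def pad_boxes(per_image_boxes):
    """Zero-pad every image's box list to the largest box count in the batch."""
    max_boxes = max(len(b) for b in per_image_boxes)
    return [b + [[0.0] * 5 for _ in range(max_boxes - len(b))]
            for b in per_image_boxes]


# Hypothetical normalized boxes: two objects in the first image, one in the second.
image_a = [[0, 0.25, 0.25, 0.75, 0.75], [1, 0.0, 0.5, 0.5, 1.0]]
image_b = [[2, 0.5, 0.0, 1.0, 0.5]]

converted = [[corners_to_center(b) for b in img] for img in (image_a, image_b)]
padded = pad_boxes(converted)
for rows in padded:
    print(rows)
```

After padding, both images carry two rows of five values, so the result can be stacked into one array of shape (num_images, max_boxes, 5). An all-zero row has zero width and height, which is how padding rows can be told apart from real boxes.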

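The following helpers gather the remaining inputs needed for training: the detector masks, the class names, and the anchor boxes.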

def get_detector_mask(boxes, anchors):
    detectors_mask = [0 for i in range(len(boxes))]
    matching_true_boxes = [0 for i in range(len(boxes))]
    for i, box in enumerate(boxes):
        detectors_mask[i], matching_true_boxes[i] = preprocess_true_boxes(box, anchors, [416, 416])
    return np.array(detectors_mask), np.array(matching_true_boxes)

def get_classes(classes_path):
    '''Load the class names.'''
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    '''Load the anchor boxes.'''
    with open(anchors_path) as f:
        anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        anchors = np.array(anchors)
        return np.reshape(anchors, (-1, 2))
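get_anchors expects the anchor file to hold a single line of comma-separated numbers, which it then groups into pairs. The sketch below reproduces that parsing with an in-memory file and plain Python. The numbers are placeholders rather than real YOLOv2 anchors, and parse_anchors is a name made up for this illustration.

```python
import io


def parse_anchors(line):
    """Parse 'a,b,c,d,...' into a list of (first, second) pairs."""
    values = [float(x) for x in line.split(',')]
    if len(values) % 2 != 0:
        raise ValueError('expected an even number of anchor values')
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


# Placeholder content standing in for the anchor file; not real anchor values.
sample_file = io.StringIO("1.0,2.0, 3.5,4.5, 9.0,8.0\n")
anchors = parse_anchors(sample_file.readline())
print(len(anchors), anchors)
```

The number of pairs is what create_model later passes to yolo_body as len(anchors); the detectors_mask_shape of (13, 13, 5, 1) used there implies a file with five pairs.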

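Next we define the model. Two models are returned: yolo_model outputs the feature tensor produced by the DarkNet-19 network, and model is the same network with the loss function attached.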

def create_model(anchors, class_names, load_pretrained = True):
    detectors_mask_shape = (13, 13, 5, 1)
    matching_boxes_shape = (13, 13, 5, 5)
    image_input = Input(shape = (416, 416, 3))
    boxes_input = Input(shape = (None, 5))
    
    detectors_mask_input = Input(shape = detectors_mask_shape)
    matching_boxes_input = Input(shape = matching_boxes_shape)

    yolo_model = yolo_body(image_input, len(anchors), len(class_names))

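    if load_pretrained:
        # NOTE: as written, the pretrained body is loaded and truncated here but is
        # never connected to yolo_model, so this branch does not change the returned models.
        yolo_path = os.path.join('model_data', 'yolo.h5')
        model_body = load_model(yolo_path)
        model_body = Model(model_body.inputs, model_body.layers[-2].output)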

    with tf.device('/cpu:0'):
        model_loss = Lambda(
            yolo_loss,
            output_shape=(1, ),
            name='yolo_loss',
            arguments={'anchors': anchors,
                       'num_classes': len(class_names)})([
                           yolo_model.output, boxes_input,
                           detectors_mask_input, matching_boxes_input
                       ])
    model = Model(
        [yolo_model.input, boxes_input, detectors_mask_input,
         matching_boxes_input], model_loss)
    return yolo_model, model

Finally we define the training function; the concrete parameters are given in the code.

def train(model, class_names, anchors, image_data, boxes, detectors_mask, matching_true_boxes, validation_split = 0.1):
    model.compile(optimizer = 'adam',
                  loss = {'yolo_loss': lambda y_true, y_pred: y_pred})
    
    logging = TensorBoard()
    checkpoint = ModelCheckpoint('trained_stage_3_best.h5', monitor = 'val_loss',
                                 save_weights_only = True, save_best_only = True)
    early_stopping = EarlyStopping(monitor = 'val_loss', min_delta = 0, patience = 15, verbose = 1, mode = 'auto')
    # print(image_data.shape)
    model.fit([image_data, boxes, detectors_mask, matching_true_boxes],
              np.zeros(len(image_data)),
              validation_split = validation_split,
              batch_size = 32,
              epochs = 5,
              callbacks = [logging])
    model.save_weights('trained_stage_1.h5')
    
    model_body, model = create_model(anchors, class_names, load_pretrained = False)
    model.load_weights('trained_stage_1.h5')
    model.compile(
        optimizer='adam', loss={
            'yolo_loss': lambda y_true, y_pred: y_pred
        })
    
    model.fit([image_data, boxes, detectors_mask, matching_true_boxes],
              np.zeros(len(image_data)),
              validation_split = 0.1,
              batch_size = 8
              )
    model.save_weights('trained_stage_2.h5')
    model.fit([image_data, boxes, detectors_mask, matching_true_boxes],
              np.zeros(len(image_data)),
              validation_split = 0.1,
              batch_size = 8,
              epochs = 30,
              callbacks=[logging, checkpoint, early_stopping])
    model.save_weights('trained_stage_3.h5')
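One detail in train that is easy to miss: the loss is computed inside the model by the yolo_loss Lambda layer, so the model's output already is the loss value. The loss passed to compile therefore only has to hand that output through, and the targets passed to fit (np.zeros(len(image_data))) are placeholders that are never used. A minimal plain-Python sketch of the pattern, with made-up numbers standing in for the model output:

```python
def passthrough_loss(y_true, y_pred):
    """Ignore the dummy target and return the model output, which is already the loss."""
    return y_pred


# Hypothetical per-sample values produced by a loss layer inside a model.
model_output = [2.5, 0.75, 1.25]
dummy_targets = [0.0] * len(model_output)  # plays the role of np.zeros(len(image_data))

reported = [passthrough_loss(t, p) for t, p in zip(dummy_targets, model_output)]
mean_loss = sum(reported) / len(reported)
print(reported, mean_loss)
```

Changing the dummy targets to any other values leaves the result untouched, which is the point of the lambda y_true, y_pred: y_pred expression in the listing above.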

Finally, together with the code explained in part (1), we define the function that runs the whole training process.

def main():
    train_images, train_boxes, class_names, anchors = load_data('C:\\Users\\guesthost\\Desktop\\VOCtrainval_11-May-2012\\pascal_voc_2012.hdf5')
    print("Loading dataset successful")
    processed_images, processed_boxes = process_data(train_images, train_boxes)
    detectors_mask, matching_true_boxes = get_detector_mask(processed_boxes, anchors)
    print("Process dataset successful")
    yolo_model, model = create_model(anchors, class_names, False)
    train(model, class_names, anchors, processed_images, processed_boxes, detectors_mask, matching_true_boxes)

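3. Postscript

I trained on a machine with a GTX 1080 and 16 GB of RAM and still ran out of memory. Given my limited ability, I have not found a good solution yet, so for now loading parameters that someone else has already trained remains the better option.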
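One plausible contributor to the memory problem, offered as a guess rather than a diagnosis: process_data converts every training image into a 416 x 416 x 3 floating-point array and keeps all of them in memory at once. The back-of-the-envelope sketch below estimates the size of that array. The image count of 5,717 is an assumption (the size of the VOC2012 train split), and array_bytes is a name made up for this illustration.

```python
def array_bytes(num_images, height=416, width=416, channels=3, bytes_per_value=8):
    """Bytes needed to hold num_images dense images of the given shape and value size."""
    return num_images * height * width * channels * bytes_per_value


NUM_IMAGES = 5717  # assumption: number of images in the VOC2012 train split

for name, size in (('float64', 8), ('float32', 4)):
    total = array_bytes(NUM_IMAGES, bytes_per_value=size)
    print('{}: {:.1f} GiB'.format(name, total / 2 ** 30))
```

Under these assumptions the processed images alone need roughly 22 GiB as 64-bit floats and roughly 11 GiB as 32-bit floats, before counting the original images, the detector masks, or anything the framework allocates. Either figure is uncomfortable on a 16 GB machine, so feeding the data in batches instead of as one array is a common way around it.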
