
Deep Learning (3): Real-Time Object Detection with tiny YOLO (TensorFlow implementation)

I. Background

YOLO (You Only Look Once) was proposed by Joseph Redmon et al. in a paper published in June 2015. The goal of this experiment is to implement the YOLO algorithm; drawing on some existing material, I ended up implementing the lightweight, simplified version, tiny YOLO, whose advantages are a simple implementation and fast detection.

[1] Paper: https://arxiv.org/abs/1506.02640

[2] YOLO homepage: https://pjreddie.com/darknet/yolo/

II. Algorithm Overview

Compared with the R-CNN family, YOLO's biggest innovation is to cast object detection as a regression problem, whereas the R-CNN family treats it as a classification problem solved with region proposals plus a CNN. As illustrated below, YOLO runs a single inference pass over the input image and directly predicts the locations of all objects in the image together with their class probabilities.
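Concretely, the network divides the input image into an S × S grid; each cell predicts B boxes (x, y, w, h plus a confidence score) and C class probabilities. With the PASCAL VOC settings used by the code below (S = 7, B = 2, C = 20), a quick calculation shows where the 1470-dimensional output vector comes from:

S, B, C = 7, 2, 20                 # grid size, boxes per cell, classes
output_dim = S * S * (B * 5 + C)   # each box carries x, y, w, h, confidence
print(output_dim)                  # 1470 -- the size of fc_19's output below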

YOLO's network consists of 24 convolutional layers followed by 2 fully connected layers; its structure is shown below:

The authors validated YOLO's accuracy on several datasets. On average, YOLO reaches a detection accuracy (mAP) of roughly 60%, which is quite respectable. At the same time, YOLO runs at 45 FPS, and the improved Fast YOLO reaches 155 FPS. Below are the model statistics on the COCO dataset taken from the official site:

As the table shows, Tiny YOLO averages only 23.7% mAP, but it runs at 244 FPS.

Below are the detection examples from the paper; the results are quite good:

Finally, here are a few good online write-ups on the YOLO model:

[3] YOLO_原理詳述 (a detailed explanation of how YOLO works, in Chinese)

[4] [目標檢測]YOLO原理 ([Object detection] The principles of YOLO, in Chinese)

The focus of this article is implementing the simplified tiny YOLO model, mainly based on the code at:

[5]https://github.com/gliese581gg/YOLO_tensorflow

III. Implementation

1. Files used

First, a word about the files involved and where they go. My directory layout is:

-- test                         (folder of test images)
    |------ 000001.jpg          (a test image)
-- weights                      (weights folder)
    |------ YOLO_tiny.ckpt      (pretrained checkpoint)
-- main.py                      (main script)

The test folder simply holds the jpg images you want to test.

Next is the weights folder, which holds the ckpt file pretrained by the author of [5]; it can be downloaded from Google Drive:

Downloading from Google Drive, however, requires a VPN and can be very slow, so I uploaded the file to my own Baidu Cloud; feel free to download it from there:

Link: https://pan.baidu.com/s/1U-L-wpPZhzOW2yKmtgzwUQ

Extraction code: 8i3j

Finally there is main.py, which I describe in detail below.
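Before moving on, a quick sanity check (my own addition, assuming the layout above) can confirm the files are in place. Note that checkpoints saved by newer TensorFlow versions consist of several files (.index, .data-*), in which case the paths need adjusting:

import os

# hypothetical check that the layout expected by main.py exists
for path in ('test/000001.jpg', 'weights/YOLO_tiny.ckpt'):
    if not os.path.exists(path):
        print('Missing: ' + path)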

2. Implementing the algorithm

My main.py is based on YOLO_tiny_tf.py from [5], with a few improvements of my own. First, look at the tiny YOLO model structure:

As you can see, tiny YOLO is essentially a simplified VGG-style network: a stack of 3×3 convolutions with max-pooling, applied to the image to detect objects. Following this structure, main.py can be written as follows:

import numpy as np
import tensorflow as tf
import cv2
import time


class YOLO_TF:
    fromfile = 'test/person.jpg'
    tofile_img = 'test/output.jpg'
    tofile_txt = 'test/output.txt'
    imshow = True
    filewrite_img = False
    filewrite_txt = False
    disp_console = True
    weights_file = 'weights/YOLO_tiny.ckpt'
    alpha = 0.1
    threshold = 0.2
    iou_threshold = 0.5
    num_class = 20
    num_box = 2
    grid_size = 7
    classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

    w_img = 640
    h_img = 480

    def __init__(self, fromfile=None, tofile_img=None, tofile_txt=None):
        self.fromfile = fromfile

        # note: file output is always enabled here, so tofile_img and
        # tofile_txt must be valid paths whenever detection runs
        self.tofile_img = tofile_img
        self.filewrite_img = True

        self.tofile_txt = tofile_txt
        self.filewrite_txt = True

        self.imshow = True
        self.disp_console = True

        self.build_networks()
        if self.fromfile is not None: self.detect_from_file(self.fromfile)

    def build_networks(self):
        if self.disp_console: print("Building YOLO_tiny graph...")
        # input placeholder: a batch of 448x448 RGB images scaled to [-1, 1]
        self.x = tf.placeholder('float32', [None, 448, 448, 3])
        self.conv_1 = self.conv_layer(1, self.x, 16, 3, 1)
        self.pool_2 = self.pooling_layer(2, self.conv_1, 2, 2)
        self.conv_3 = self.conv_layer(3, self.pool_2, 32, 3, 1)
        self.pool_4 = self.pooling_layer(4, self.conv_3, 2, 2)
        self.conv_5 = self.conv_layer(5, self.pool_4, 64, 3, 1)
        self.pool_6 = self.pooling_layer(6, self.conv_5, 2, 2)
        self.conv_7 = self.conv_layer(7, self.pool_6, 128, 3, 1)
        self.pool_8 = self.pooling_layer(8, self.conv_7, 2, 2)
        self.conv_9 = self.conv_layer(9, self.pool_8, 256, 3, 1)
        self.pool_10 = self.pooling_layer(10, self.conv_9, 2, 2)
        self.conv_11 = self.conv_layer(11, self.pool_10, 512, 3, 1)
        self.pool_12 = self.pooling_layer(12, self.conv_11, 2, 2)
        self.conv_13 = self.conv_layer(13, self.pool_12, 1024, 3, 1)
        self.conv_14 = self.conv_layer(14, self.conv_13, 1024, 3, 1)
        self.conv_15 = self.conv_layer(15, self.conv_14, 1024, 3, 1)
        self.fc_16 = self.fc_layer(16, self.conv_15, 256, flat=True, linear=False)
        self.fc_17 = self.fc_layer(17, self.fc_16, 4096, flat=False, linear=False)
        # skip dropout_18 (only active during training)
        # fc_19 emits the final 1470-dim detection vector: 7*7*(2*5+20)
        self.fc_19 = self.fc_layer(19, self.fc_17, 1470, flat=False, linear=True)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())
        # restore overwrites the freshly initialized variables with the
        # pretrained weights from weights/YOLO_tiny.ckpt
        self.saver = tf.train.Saver()
        self.saver.restore(self.sess, self.weights_file)
        if self.disp_console: print("Loading complete!" + '\n')

    def conv_layer(self, idx, inputs, filters, size, stride):
        channels = inputs.get_shape()[3]
        weight = tf.Variable(tf.truncated_normal([size, size, int(channels), filters], stddev=0.1))
        biases = tf.Variable(tf.constant(0.1, shape=[filters]))

        pad_size = size // 2
        pad_mat = np.array([[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
        inputs_pad = tf.pad(inputs, pad_mat)

        conv = tf.nn.conv2d(inputs_pad, weight, strides=[1, stride, stride, 1], padding='VALID',
                            name=str(idx) + '_conv')
        conv_biased = tf.add(conv, biases, name=str(idx) + '_conv_biased')
        if self.disp_console: print(
            '    Layer  %d : Type = Conv, Size = %d * %d, Stride = %d, Filters = %d, Input channels = %d' % (
            idx, size, size, stride, filters, int(channels)))
        return tf.maximum(self.alpha * conv_biased, conv_biased, name=str(idx) + '_leaky_relu')

    def pooling_layer(self, idx, inputs, size, stride):
        if self.disp_console: print(
            '    Layer  %d : Type = Pool, Size = %d * %d, Stride = %d' % (idx, size, size, stride))
        return tf.nn.max_pool(inputs, ksize=[1, size, size, 1], strides=[1, stride, stride, 1], padding='SAME',
                              name=str(idx) + '_pool')

    def fc_layer(self, idx, inputs, hiddens, flat=False, linear=False):
        input_shape = inputs.get_shape().as_list()
        if flat:
            dim = input_shape[1] * input_shape[2] * input_shape[3]
            inputs_transposed = tf.transpose(inputs, (0, 3, 1, 2))
            inputs_processed = tf.reshape(inputs_transposed, [-1, dim])
        else:
            dim = input_shape[1]
            inputs_processed = inputs
        weight = tf.Variable(tf.truncated_normal([dim, hiddens], stddev=0.1))
        biases = tf.Variable(tf.constant(0.1, shape=[hiddens]))
        if self.disp_console: print(
            '    Layer  %d : Type = Full, Hidden = %d, Input dimension = %d, Flat = %d, Activation = %d' % (
            idx, hiddens, int(dim), int(flat), 1 - int(linear)))
        if linear: return tf.add(tf.matmul(inputs_processed, weight), biases, name=str(idx) + '_fc')
        ip = tf.add(tf.matmul(inputs_processed, weight), biases)
        return tf.maximum(self.alpha * ip, ip, name=str(idx) + '_fc')

    def detect_from_cvmat(self, img):
        s = time.time()
        self.h_img, self.w_img, _ = img.shape
        # preprocessing: resize to 448x448, convert OpenCV's BGR to RGB,
        # and scale pixel values from [0, 255] to [-1, 1]
        img_resized = cv2.resize(img, (448, 448))
        img_RGB = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)
        img_resized_np = np.asarray(img_RGB)
        inputs = np.zeros((1, 448, 448, 3), dtype='float32')
        inputs[0] = (img_resized_np / 255.0) * 2.0 - 1.0
        in_dict = {self.x: inputs}
        net_output = self.sess.run(self.fc_19, feed_dict=in_dict)
        self.result = self.interpret_output(net_output[0])
        self.show_results(img, self.result)
        strtime = str(time.time() - s)
        if self.disp_console: print('Elapsed time : ' + strtime + ' secs' + '\n')

    def detect_from_file(self, filename):
        if self.disp_console: print('Detect from ' + filename)
        img = cv2.imread(filename)
        self.detect_from_cvmat(img)

    def interpret_output(self, output):
        # decode the 1470-dim output vector: the first 980 values are the
        # 7*7*20 class probabilities, the next 98 the 7*7*2 box confidences,
        # and the last 392 the 7*7*2*4 box coordinates (x, y, w, h)
        probs = np.zeros((7, 7, 2, 20))
        class_probs = np.reshape(output[0:980], (7, 7, 20))
        scales = np.reshape(output[980:1078], (7, 7, 2))
        boxes = np.reshape(output[1078:], (7, 7, 2, 4))
        offset = np.transpose(np.reshape(np.array([np.arange(7)] * 14), (2, 7, 7)), (1, 2, 0))

        # x and y are predicted relative to their grid cell: add the cell
        # offsets and normalize to [0, 1]; w and h are predicted as square
        # roots, so square them to recover the box size
        boxes[:, :, :, 0] += offset
        boxes[:, :, :, 1] += np.transpose(offset, (1, 0, 2))
        boxes[:, :, :, 0:2] = boxes[:, :, :, 0:2] / 7.0
        boxes[:, :, :, 2] = np.multiply(boxes[:, :, :, 2], boxes[:, :, :, 2])
        boxes[:, :, :, 3] = np.multiply(boxes[:, :, :, 3], boxes[:, :, :, 3])

        boxes[:, :, :, 0] *= self.w_img
        boxes[:, :, :, 1] *= self.h_img
        boxes[:, :, :, 2] *= self.w_img
        boxes[:, :, :, 3] *= self.h_img

        # final score per box and class = class probability * box confidence
        for i in range(2):
            for j in range(20):
                probs[:, :, i, j] = np.multiply(class_probs[:, :, j], scales[:, :, i])

        filter_mat_probs = np.array(probs >= self.threshold, dtype='bool')
        filter_mat_boxes = np.nonzero(filter_mat_probs)
        boxes_filtered = boxes[filter_mat_boxes[0], filter_mat_boxes[1], filter_mat_boxes[2]]
        probs_filtered = probs[filter_mat_probs]
        classes_num_filtered = np.argmax(filter_mat_probs, axis=3)[
            filter_mat_boxes[0], filter_mat_boxes[1], filter_mat_boxes[2]]

        argsort = np.array(np.argsort(probs_filtered))[::-1]
        boxes_filtered = boxes_filtered[argsort]
        probs_filtered = probs_filtered[argsort]
        classes_num_filtered = classes_num_filtered[argsort]

        # non-maximum suppression: zero out any box that overlaps a
        # higher-scoring box by more than iou_threshold
        for i in range(len(boxes_filtered)):
            if probs_filtered[i] == 0: continue
            for j in range(i + 1, len(boxes_filtered)):
                if self.iou(boxes_filtered[i], boxes_filtered[j]) > self.iou_threshold:
                    probs_filtered[j] = 0.0

        filter_iou = np.array(probs_filtered > 0.0, dtype='bool')
        boxes_filtered = boxes_filtered[filter_iou]
        probs_filtered = probs_filtered[filter_iou]
        classes_num_filtered = classes_num_filtered[filter_iou]

        result = []
        for i in range(len(boxes_filtered)):
            result.append([self.classes[classes_num_filtered[i]], boxes_filtered[i][0], boxes_filtered[i][1],
                           boxes_filtered[i][2], boxes_filtered[i][3], probs_filtered[i]])

        return result

    def show_results(self, img, results):
        img_cp = img.copy()
        if self.filewrite_txt:
            ftxt = open(self.tofile_txt, 'w')
        for i in range(len(results)):
            x = int(results[i][1])
            y = int(results[i][2])
            w = int(results[i][3]) // 2
            h = int(results[i][4]) // 2
            if self.disp_console: print(
                '    class : ' + results[i][0] + ' , [x,y,w,h]=[' + str(x) + ',' + str(y) + ',' + str(
                    int(results[i][3])) + ',' + str(int(results[i][4])) + '], Confidence = ' + str(results[i][5]))
            if self.filewrite_img or self.imshow:
                cv2.rectangle(img_cp, (x - w, y - h), (x + w, y + h), (0, 255, 0), 2)
                cv2.rectangle(img_cp, (x - w, y - h - 20), (x + w, y - h), (125, 125, 125), -1)
                cv2.putText(img_cp, results[i][0] + ' : %.2f' % results[i][5], (x - w + 5, y - h - 7),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)
            if self.filewrite_txt:
                ftxt.write(results[i][0] + ',' + str(x) + ',' + str(y) + ',' + str(w) + ',' + str(h) + ',' + str(
                    results[i][5]) + '\n')
        if self.filewrite_img:
            if self.disp_console: print('    image file written : ' + self.tofile_img)
            cv2.imwrite(self.tofile_img, img_cp)
        if self.imshow:
            cv2.imshow('YOLO_tiny detection', img_cp)
            cv2.waitKey(1)
        if self.filewrite_txt:
            if self.disp_console: print('    txt file written : ' + self.tofile_txt)
            ftxt.close()

    def iou(self, box1, box2):
        # boxes are (x_center, y_center, w, h); tb and lr are the widths of
        # the horizontal and vertical overlap between the two boxes
        tb = min(box1[0] + 0.5 * box1[2], box2[0] + 0.5 * box2[2]) - max(box1[0] - 0.5 * box1[2],
                                                                         box2[0] - 0.5 * box2[2])
        lr = min(box1[1] + 0.5 * box1[3], box2[1] + 0.5 * box2[3]) - max(box1[1] - 0.5 * box1[3],
                                                                         box2[1] - 0.5 * box2[3])
        if tb < 0 or lr < 0:
            intersection = 0
        else:
            intersection = tb * lr
        return intersection / (box1[2] * box1[3] + box2[2] * box2[3] - intersection)


if __name__ == '__main__':
    fromfile = 'test/000001.jpg'
    tofile_img = 'test/output.jpg'
    tofile_txt = 'test/output.txt'

    yolo = YOLO_TF(fromfile=fromfile, tofile_img=tofile_img, tofile_txt=tofile_txt)
    cv2.waitKey(1000)  # keep the result window open for a moment
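Incidentally, since detect_from_cvmat accepts any OpenCV image, the same class can be pointed at a webcam for genuinely real-time detection. A minimal sketch under that assumption (camera index 0 is a guess for your setup; file output is switched off so only the display window updates):

yolo = YOLO_TF()               # build the graph and load the weights
yolo.filewrite_img = False     # disable per-frame file output
yolo.filewrite_txt = False
cap = cv2.VideoCapture(0)      # camera index 0 is an assumption
while True:
    ret, frame = cap.read()
    if not ret:
        break
    yolo.detect_from_cvmat(frame)          # draws and shows detections
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()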

IV. Testing

Running the script above is all it takes to start the program. As the following excerpt shows:

classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

tiny YOLO can only recognize the 20 common classes listed above. As for usage: each time you want to test an image, just change the fromfile variable at the top of the __main__ block and run the script again to perform detection.
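If there are several images to test, editing fromfile each time gets tedious. The loop below is a small convenience sketch of my own (the glob pattern and output paths are assumptions) that runs detection on every jpg in the test folder:

import glob

yolo = YOLO_TF(tofile_img='output.jpg', tofile_txt='output.txt')
for path in sorted(glob.glob('test/*.jpg')):
    yolo.detect_from_file(path)  # each run overwrites output.jpg/output.txt
    cv2.waitKey(1000)            # show each result for about a second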

Here are some detection results: the person is detected but the bicycle is missed, and in one case a cat is misidentified as a dog:

For now, although the accuracy is not high, the main objects are still picked up.

V. Discussion

1. tiny YOLO currently relies on downloading weights pretrained by someone else; how to train the network yourself remains to be studied.

2. tiny YOLO's accuracy is currently not very high, but it is very fast. In addition, for objects that overlap one another, detection quality can be noticeably worse.