
Hands-On Tutorial: Real-Time Object Detection with the Object Detection API (Part 2)

In the previous post we covered how to set up the environment for real-time object detection with the Object Detection API (link to part one). Here, we will show you how to build an object detection system of your own to recognize images.

1. Preparing the Images

What I want to do here is use the API to detect mobile phones, so we first need to download a number of phone images from the web and save them in a folder. (Note that every downloaded image must be in .jpg format; the reason will become clear below.)
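If some downloads arrive as .png or another format, it is easiest to convert them up front. A minimal sketch, assuming Pillow is installed and the downloads sit in a hypothetical images/raw folder:

import os
from PIL import Image

SRC_DIR = 'images/raw'  # hypothetical folder holding the downloaded phone pictures

for name in os.listdir(SRC_DIR):
    base, ext = os.path.splitext(name)
    if ext.lower() in ('.png', '.bmp', '.jpeg', '.gif'):
        path = os.path.join(SRC_DIR, name)
        # Convert to RGB first so images with an alpha channel can be saved as JPEG.
        Image.open(path).convert('RGB').save(os.path.join(SRC_DIR, base + '.jpg'), 'JPEG')
        os.remove(path)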

2. Generating the Dataset

Because the downloaded images carry no labels, we have to annotate them by hand, that is, locate each phone in the image and mark it with a label. The annotation tool I used is labelImg_windows_v1.5.1, which is convenient and requires no installation (download link; the download costs points, and if you would rather not spend them, leave a comment at the end of this post and I will send you the tool privately). Alternatively, you can install an image annotation tool yourself. That route is more cumbersome but also works; if you are interested, see the article "How to Install the Image Annotation Tool labelImg" (《影象標註工具labelImg安裝方法》), which explains it in great detail, so I will not repeat it here.

Unzip the labelImg_windows_v1.5.1 archive and launch the executable inside it directly. In the left half of the window, set the directories for opening and saving images, then draw a rectangular box around each phone and give it a label.

Annotate all of the previously downloaded images one by one. You will find that the process generates a file with the .xml extension for each image; these files are the data we will work with.
Split the resulting data into a training set and a test set, stored in the images folder (images/train and images/test). With that, the dataset preparation is complete.
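If you prefer to script the split, here is a minimal sketch, assuming the annotated .jpg/.xml pairs still sit in a hypothetical images/raw folder and an 80/20 train/test split:

import os
import random
import shutil

SRC = 'images/raw'  # hypothetical folder with annotated .jpg/.xml pairs
random.seed(0)

# Keep only images that actually have a matching .xml annotation.
stems = [f[:-4] for f in os.listdir(SRC)
         if f.endswith('.jpg') and os.path.exists(os.path.join(SRC, f[:-4] + '.xml'))]
random.shuffle(stems)
cut = int(0.8 * len(stems))  # 80% train, 20% test

for subset, names in (('train', stems[:cut]), ('test', stems[cut:])):
    dst = os.path.join('images', subset)
    os.makedirs(dst, exist_ok=True)
    for stem in names:
        for ext in ('.jpg', '.xml'):
            shutil.move(os.path.join(SRC, stem + ext), dst)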

3. Modifying the Config Parameters

Here we use SSD as the detector, so the relevant parameters in ssd_mobilenet_v1_pets.config need to be changed. Because we detect only a single class (phone), we set num_classes near the top of the file to 1. The modified config is as follows:

# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
          anchorwise_output: true
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 8
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 1000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "training/object-detection.pbtxt"
}

eval_config: {
  num_examples: 40
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  # max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "training/object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
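Before training, it can save time to confirm that the edited config still parses. A quick sanity check, sketched here on the assumption that the Object Detection API's protos are on your PYTHONPATH and the config lives at training/ssd_mobilenet_v1_pets.config:

import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# Parse the pipeline config; text_format.Merge raises on syntax errors.
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.gfile.GFile('training/ssd_mobilenet_v1_pets.config', 'r') as f:
    text_format.Merge(f.read(), pipeline_config)

print(pipeline_config.model.ssd.num_classes)  # expect: 1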

Likewise, we need to adjust object-detection.pbtxt, keeping only one label and renaming it to phone. The modified file looks like this:

item {
  id: 1
  name: 'phone'
}

Next, we edit train_generate_tfrecord.py and test_generate_tfrecord.py in the same spirit, again keeping only one label. The modified train_generate_tfrecord.py is shown below:

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/train_labels.csv  --output_path=data/train.record

  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=data/test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', 'data/train_labels.csv', 'Path to the CSV input')
flags.DEFINE_string('output_path', 'data/train.record', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'phone':
        return 1
    else:
        # Only 'phone' is expected; any other label will fail loudly when the
        # record is built, instead of being silently mapped to a wrong class id.
        return None
        

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images/train')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()

test_generate_tfrecord.py is modified the same way as train_generate_tfrecord.py, so I will not paste the full code again; readers can make the changes themselves. The lines expected to differ are sketched below.
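A sketch of the expected differences (excerpts, assuming the same folder layout as above):

# In test_generate_tfrecord.py, the flag defaults point at the test data:
flags.DEFINE_string('csv_input', 'data/test_labels.csv', 'Path to the CSV input')
flags.DEFINE_string('output_path', 'data/test.record', 'Path to output TFRecord')

# ...and in main(), the image folder becomes the test set:
path = os.path.join(os.getcwd(), 'images/test')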

4. Creating the Data

The sections above set up the necessary environment and modified the model's parameters, so the training preparations are complete. Next, we create the data by running a batch file.

First run xml_to_csv.py to convert the .xml files into .csv files, then run the two generate_tfrecord scripts to finish creating the data. The batch file contains the commands below (a sketch of xml_to_csv.py itself follows the commands, in case you do not already have one):

python xml_to_csv.py
python train_generate_tfrecord.py
python test_generate_tfrecord.py
pause
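If you do not have an xml_to_csv.py yet, the version commonly paired with these generate_tfrecord scripts (adapted from datitran's raccoon_dataset; a sketch, assuming the images/train and images/test layout from section two and a data/ output folder) looks roughly like this:

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    # Collect one row per bounding box across all labelImg .xml files in `path`.
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        root = ET.parse(xml_file).getroot()
        for member in root.findall('object'):
            bbox = member.find('bndbox')
            xml_list.append((root.find('filename').text,
                             int(root.find('size').find('width').text),
                             int(root.find('size').find('height').text),
                             member.find('name').text,
                             int(bbox.find('xmin').text),
                             int(bbox.find('ymin').text),
                             int(bbox.find('xmax').text),
                             int(bbox.find('ymax').text)))
    columns = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    return pd.DataFrame(xml_list, columns=columns)


def main():
    for directory in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_to_csv(image_path).to_csv('data/{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')


main()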

5. Training

Once the data is ready, we can train on it. We simply invoke the files we modified earlier:

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config
pause
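One step the commands above do not cover: before the notebook in the next section can load a frozen_inference_graph.pb, the trained checkpoint has to be exported with the API's export_inference_graph.py tool. A sketch, assuming training stopped at step 1000 (match the number to your latest training/model.ckpt-* files) and using myFinalModelPB as the output folder so it lines up with the notebook's MODEL_NAME:

python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix training/model.ckpt-1000 --output_directory myFinalModelPB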

6. Testing Images

Once steps four and five have finished, we can use the trained model to recognize images. Put the images you want to test in the test_images folder, then open object_detection_tutorial.ipynb in Jupyter Notebook. Its contents are reproduced below, with the notebook's markdown cells rendered as comments:

# Object Detection Demo
# Welcome to the object detection inference walkthrough! This notebook will walk
# you step by step through the process of using a pre-trained model to detect
# objects in an image. Make sure to follow the installation instructions
# (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md)
# before you start.

# Imports
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

if tf.__version__ < '1.4.0':
  raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')

# Env setup
# This is needed to display the images.
%matplotlib inline

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

# Object detection imports
# Here are the imports from the object detection module.
from utils import label_map_util

from utils import visualization_utils as vis_util

# Model preparation
# Variables: any model exported using the `export_inference_graph.py` tool can be
# loaded here simply by changing PATH_TO_CKPT to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here. See the detection model zoo
# (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
# for a list of other models that can be run out-of-the-box with varying speeds
# and accuracies.

# What model to download.
MODEL_NAME = 'myFinalModelPB'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'object-detection.pbtxt')

NUM_CLASSES = 1

# Download Model
# (This cell is empty in our notebook: we load our own exported model instead of
# downloading one from the model zoo.)

# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

# Loading label map
# Label maps map indices to category names, so that when our convolution network
# predicts 5, we know that this corresponds to airplane. Here we use internal
# utility functions, but anything that returns a dictionary mapping integers to
# appropriate string labels would be fine.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Helper code
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

# Detection
TEST_IMAGE_PATHS = []
for filename in os.listdir(r"./test_images"):
    if filename.split('.')[-1] == 'jpg':
        TEST_IMAGE_PATHS.append(os.path.join('test_images', filename))
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)