
(Updated video tutorials) Building your own object detection model with the Tensorflow Object Detection API (2): Training and using your own model

2018.05.10

I'm in a different time zone, so my replies are sometimes slow. I've set up a QQ group to make it easier for everyone to learn from and help each other.

---------------------------------------------------------------------------------------------------------------------------------------

Group 2 number: 902067304

---------------------------------------------------------------------------------------------------------------

(Group 1 is full.) Click the link to join the group chat 【Tensorflow學習交流群】: https://jq.qq.com/?_wv=1027&k=55j9V1r

------------------------------------------------------------------------------------------------

Updated 2018.05.04!

How to port a trained model to an Android phone:

Video demo:

-----------------------------------------------------------------------------------------------------

Updated 2018.04.02!

The series of walkthrough videos has been uploaded; readers who need them can go watch them. The blog posts were written against Tensorflow 1.4, while the videos use the newer 1.7; I hit a great many problems in between, and since this was my first time making videos there are bound to be rough spots. Thanks for your understanding!

A round-up of frequently asked questions has been posted in another blog entry; take a look, and feel free to share it or send corrections!

------------------------------------------------------------------------------------------------------------------------------

In the previous post (), we installed the development environment required by the Tensorflow Object Detection API and ran the official demo successfully. Next, let's try training and testing on our own data.

Project code round-up:

1. Analyzing the code structure

Open object_detection_tutorial.ipynb in the object_detection folder again and walk through its structure.

The first part, Imports, loads the required packages; no changes needed.

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

if tf.__version__ < '1.4.0':
  raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')

The second part, Env setup, configures the environment; no changes needed either.

# This is needed to display the images.
%matplotlib inline

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

The third part, Object detection imports, imports the modules that object detection needs. If this step errors out, either the working directory is wrong or the environment variables for .../research and .../research/slim were not set properly.

from utils import label_map_util

from utils import visualization_utils as vis_util


The fourth part sets the model's parameters.

# Name of the model to download
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection. 
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

NUM_CLASSES = 90

GitHub hosts the corresponding official models (link here); these were pre-trained on different datasets in advance, so once downloaded they can be used directly. The downloaded file ends in '.tar.gz'. PATH_TO_CKPT is the path to the '.pb' file; a '.pb' file is the trained model (the frozen detection graph), i.e. the model actually used at prediction time. PATH_TO_LABELS is the label file, which records which labels are to be recognized, and NUM_CLASSES is the number of classes; change these to fit your task.

As the figure above shows, the first column is the model name, the second its speed, and the third its accuracy. A few things to note:

1. The names in the Model name column are not the same as the value assigned to MODEL_NAME in the code: the latter also carries a date suffix. When writing the code you need the full name, date included. To get it, click the corresponding model on the page; when the "Save As" dialog pops up, you can read off the complete MODEL_NAME, as shown in the figure below (a concrete example follows this list).

2. A model that is fast in the table will generally also train quickly on your own data, but a model with high reported accuracy is not guaranteed to be accurate on your dataset, since the training data and model parameters themselves differ. I suggest starting with the demo's 'ssd_mobilenet_v1_coco_2017_11_17', which is the fastest.
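For example, with the demo settings above, the notebook fetches DOWNLOAD_BASE + MODEL_FILE, which resolves to:

http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz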
 

The fifth part, Download Model, downloads the model by sending a request to the site and unpacking the archive. The sixth part, Load a (frozen) Tensorflow model into memory, loads the trained model into memory, and the seventh part, Loading label map, loads the label map. None of these need modification; copy them as-is.

opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())
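
# Load a (frozen) Tensorflow model into memory.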
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')
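
# Loading label map: map the model's integer class ids to category names.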
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
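
# Helper code: convert a PIL image into a numpy array.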
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


Next comes the Detection part. First, point the code at the folder of images to detect:

# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

In this code the targets are 'image1.jpg' and 'image2.jpg' in the test_images folder under object_detection. You can change the folder and file names to whatever you need; for example, to detect 'frame1.jpg' through 'frame10.jpg' in the test_images2 folder under object_detection, change it to:

PATH_TO_TEST_IMAGES_DIR = 'test_images2'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'frame{}.jpg'.format(i)) for i in range(1, 11) ]

The last part runs the detection; it needs no changes either.

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Definite input and output Tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represent how level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    for image_path in TEST_IMAGE_PATHS:
      image = Image.open(image_path)
      # the array based representation of the image will be used later in order to prepare the
      # result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      # Actual detection.
      (boxes, scores, classes, num) = sess.run(
          [detection_boxes, detection_scores, detection_classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      plt.figure(figsize=IMAGE_SIZE)
      plt.imshow(image_np)

With the code structure clear, we can prepare the input data.

2. Building the training/test datasets

The specifics depend on the task at hand.

In my case, the input is a set of experiment videos, and I want to recognize particular objects in them.

As shown above, I want to recognize the screen in the middle, as well as the vehicles inside the screen. For machine learning, the training data must be files annotated with the objects' locations.

Using the small tool LabelImg, I picked out 100 images and annotated them by hand (the more the better, if time allows), as shown below.

Each finished annotation is saved as an xml file with the same name as the image.
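For reference, here is a minimal, hypothetical sketch of the PASCAL VOC xml that LabelImg writes (file name and coordinates made up), and how it maps onto what the conversion script below reads: inside each <object> element the fifth child is <bndbox>, which is why the script indexes member[4].

import xml.etree.ElementTree as ET

sample = '''<annotation>
    <filename>frame1.jpg</filename>
    <size><width>1280</width><height>720</height><depth>3</depth></size>
    <object>
        <name>tv</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox><xmin>100</xmin><ymin>80</ymin><xmax>400</xmax><ymax>300</ymax></bndbox>
    </object>
</annotation>'''

root = ET.fromstring(sample)
obj = root.find('object')
print(obj[0].text)     # 'tv'  -> the class column in the CSV
print(obj[4][0].text)  # '100' -> xmin; obj[4] is <bndbox>, whose children are
                       # xmin, ymin, xmax, ymax in that order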

Tensorflow expects input in its own TFRecords format.

Write two small Python scripts: the first records the information from all the xml files in a folder into a .csv table (xml_to_csv.py); the second builds the TFRecords file from that .csv table (generate_tfrecord.py). Both are on my github.

Here is the code, starting with xml_to_csv.py:

# -*- coding: utf-8 -*-
"""
Created on Tue Jan 16 00:52:02 2018
@author: Xiang Guo
Record the information from all XML files in a folder into a CSV file
"""

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

os.chdir('D:\\test\\test_images\\frame2')
path = 'D:\\test\\test_images\\frame2'

def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('tv_vehicle_labels.csv', index=None)
    print('Successfully converted xml to csv.')


main()

The second script, generate_tfrecord.py:

# -*- coding: utf-8 -*-
"""
Created on Tue Jan 16 01:04:55 2018
@author: Xiang Guo
Generate TFRecord files from a CSV file
"""

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv  --output_path=train.record
  # Create test data:
  python generate_tfrecord.py --csv_input=data/test_labels.csv  --output_path=test.record
"""



import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

os.chdir('D:\\tensorflow-model\\models\\research\\object_detection\\')

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# TO-DO replace this with label map
# NOTE: change these labels to your own classes!
def class_text_to_int(row_label):
    if row_label == 'tv':
        return 1
    elif row_label == 'vehicle':
        return 2
    else:
        return None


def split(df, group):
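    # Group the CSV rows by filename so that each image yields one tf.Example.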
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    # Normalize box coordinates to [0, 1] by image width/height, as the API expects.
    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), 'images')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()

Run the script once for the training set and once for the test set to obtain the train.record and test.record files.
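For example (assuming, as in the folder layout below, that the label CSVs are named train_labels.csv and test_labels.csv and live in data/), the two runs look like this:

python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record

As an optional sanity-check sketch, you can count the records in an output file; the number should equal the number of labeled images in that split:

import tensorflow as tf

# Count the serialized examples in the generated file (TF 1.x API).
num_records = sum(1 for _ in tf.python_io.tf_record_iterator('data/train.record'))
print('records in train.record:', num_records)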
 

3. Configuration file and model

With the training and test datasets from the previous step in hand, let's organize the files. Inside the object_detection folder we now have the following structure:

Object-Detection
-data/
--test_labels.csv
--test.record
--train_labels.csv
--train.record
-images/
--test/
---testingimages.jpg
--train/
---testingimages.jpg
--...yourimages.jpg
-training/

Next we need a configuration file. Go to the corresponding page of the Object Detection github repo and find the sample config files.

Taking ssd_mobilenet_v1_coco.config as the example: in the object_detection folder, extract ssd_mobilenet_v1_coco_2017_11_17.tar.gz,

put ssd_mobilenet_v1_coco.config into the training folder, open it in a text editor (I use Sublime Text 3), and make the following changes:

1. Search for PATH_TO_BE_CONFIGURED and change every occurrence to your own path; be careful not to swap test and train;

2. Change num_classes to match your task; in my example it is 2;

3. batch_size defaults to 24. I ran out of GPU memory with it, so to be safe I changed it to 1; if even 1 still runs out of memory, you'll want a machine with more GPU memory…

4. fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
   from_detection_checkpoint: true

These two lines configure the starting checkpoint. I set them at first, but kept running out of GPU memory. As I understand it, they resume training from a pre-trained model's checkpoint; since that model was trained on a much larger public dataset, something went wrong when adapting it locally. I eventually deleted these two lines, which amounts to training from scratch, and everything then ran normally. So if you are training from scratch, I suggest deleting them.

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 2
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
          anchorwise_output: true
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 1
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "data/tv_vehicle_detection.pbtxt"
}

eval_config: {
  num_examples: 4
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "data/tv_vehicle_detection.pbtxt"
  shuffle: false
  num_readers: 1
  num_epochs: 1
}

The label_map_path: "data/tv_vehicle_detection.pbtxt" entries in the config file above must always stay consistent with each other and with the file we create next.

Now, in the corresponding directory (/data), create a text file named tv_vehicle_detection.pbtxt (you can copy a file with another name and edit it in a text editor) and write in our labels. In my example there are two; note that the id numbers must match the integers assigned in class_text_to_int earlier, starting from 1.

item {
  id: 1
  name: 'tv'
}

item {
  id: 2
  name: 'vehicle'
}


Save it, and everything is in place!

4. Training the model

Here we only cover local training on a GPU; to train on Google Cloud, see here.

In the Anaconda Prompt, change to the models\research\object_detection folder and run the following command:

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

------------------------------------------

Updated 2018.7.20:

Note: in the latest version of the Tensorflow Object Detection API, the training script has been changed to

model_main.py

so the command changes to the following form (substitute the contents of each ${} placeholder with your own file paths); here the number of training steps is set to 50000 and the number of evaluation steps to 2000:

# From the tensorflow/models/research/ directory
python object_detection/model_main.py \
    --pipeline_config_path=object_detection/training/ssd_mobilenet_v1_coco.config \
    --model_dir=object_detection/training \
    --num_train_steps=50000 \
    --num_eval_steps=2000 \
    --alsologtostderr

If all is well, wait a moment. If you see output like this, congratulations: training is proceeding in an orderly fashion.


Interrupting training midway is not a problem: run the Python command above again and it resumes from the last checkpoint.

Tensorflow also provides the powerful Tensorboard to visualize the training process.

In the Anaconda Prompt, change to the models\research\object_detection folder and run

tensorboard --logdir='training'


then open the URL it prints in a browser (preferably Chrome or Firefox).

No charts appeared. This problem bothered me for a very long time, and none of the fixes I found online worked. Eventually I discovered that, at least on my machine, the command should instead be:

tensorboard --logdir=training

That's right: drop the quotes. It looks baffling, and I haven't seen this reported elsewhere, but that is genuinely what happened on my machine. Also note that there is no space after --logdir=.

The charts finally appeared. Because my dataset is very small, the loss barely decreases later in training; how well the model actually performs remains to be evaluated.

Let's test how the current model does. Close the command line. In the models\research\object_detection folder, find export_inference_graph.py; running it requires passing the config and checkpoint parameters.

In the Anaconda Prompt, change to the models\research\object_detection folder and run

python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v1_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-31012 \
    --output_directory tv_vehicle_inference_graph

--trained_checkpoint_prefix training/model.ckpt-31012 : the checkpoint number (the digits after .ckpt-) depends on your own training run; look in the training folder and fill in the corresponding number (if there are several, pick the largest).

--output_directory tv_vehicle_inference_graph : change this to your own name.
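If you'd rather not read the directory by hand, a small convenience sketch like the following (assuming the default training/ directory and checkpoint naming) prints the prefix with the largest step number:

# Hypothetical helper: find the newest checkpoint step in training/.
import glob
import re

steps = [int(re.search(r'ckpt-(\d+)', p).group(1))
         for p in glob.glob('training/model.ckpt-*.index')]
print('use: --trained_checkpoint_prefix training/model.ckpt-{}'.format(max(steps)))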

After it finishes, you'll find several files in the tv_vehicle_inference_graph folder (my name for it): saved_model, checkpoint, frozen_inference_graph.pb, and so on. The file ending in .pb is the all-important frozen model. Remember the frozen model from part 1? Exactly: this is what we use next.

The training part is done; next comes the final test part. Exciting!

5. Testing the model and producing output

Back to the code-structure analysis of part 1: we now have a trained model of our own, so we only need to change a few paths and parameters to match our setup. The complete code:

# -*- coding: utf-8 -*-
"""
Created on Thu Jan 11 16:55:43 2018

@author: Xiang Guo
"""
#Imports
import time
start = time.time()
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import cv2

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

if tf.__version__ < '1.4.0':
  raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')
  
os.chdir('D:\\tensorflow-model\\models\\research\\object_detection')
  
  
#Env setup 
# This is needed to display the images.
#%matplotlib inline

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")




#Object detection imports
from utils import label_map_util

from utils import visualization_utils as vis_util




#Model preparation
# What model to download.

#This is the model we just trained
MODEL_NAME = 'tv_vehicle_inference_graph'



#Location of the corresponding frozen model
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'tv_vehicle_detection.pbtxt')

#Change to the number of classes in your own example: 2
NUM_CLASSES = 2



'''
#Download Model
Our own model, so no download is needed

opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())
'''   
    
    
#Load a (frozen) Tensorflow model into memory.    
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')    
    
    
#Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


#Helper code
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


#Detection

# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
#Location of the test images
PATH_TO_TEST_IMAGES_DIR = os.getcwd()+'\\test_images2'
os.chdir(PATH_TO_TEST_IMAGES_DIR)
TEST_IMAGE_PATHS = os.listdir(PATH_TO_TEST_IMAGES_DIR)

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

output_path = ('D:\\tensorflow-model\\models\\research\\object_detection\\test_output\\self_trained\\')


with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Definite input and output Tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represent how level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    for image_path in TEST_IMAGE_PATHS:
      image = Image.open(image_path)
      # the array based representation of the image will be used later in order to prepare the
      # result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      # Actual detection.
      (boxes, scores, classes, num) = sess.run(
          [detection_boxes, detection_scores, detection_classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
#Save the output image
      cv2.imwrite(output_path+image_path.split('\\')[-1],image_np)
      
end =  time.time()
print("Execution Time: ", end - start)

    

After some running time (depending on data size and hardware), the results are written out; open the corresponding images to inspect them.

Amazing! Although there are false positives like the one in the last image, given the very limited training set (fewer than 100 images) the results are already impressive, especially for deep learning, which needs large amounts of data to unleash its full power. The current results are entirely acceptable, and with more data and a more accurate model we can expect very good performance.

Summary

We used the TensorFlow Object Detection API to track specific moving objects in experiment videos.

This was my first attempt at deep learning, so omissions are inevitable; corrections on any point are welcome, and please give credit when reposting.

References:

1. https://github.com/tensorflow/models/tree/master/research/object_detection

2. https://github.com/datitran/raccoon_dataset

3. How to train your own Object Detector with TensorFlow’s Object Detector API,

https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9