[SSD] Training your own dataset with the VGG network bundled with the caffe-ssd framework
1. Selecting the dataset
I first downloaded all of the cup images from the ImageNet website.
Then, from the ILSVRC2011, ILSVRC2012, ILSVRC2013, and ILSVRC2015 datasets, I picked out the images containing cups by searching the xml annotations for the cup synset IDs.
2. Processing the xml files
I only need the cup annotations, so every other object must be deleted from the xml files. Otherwise, generating the lmdb files fails with "Unknown name: xxxxxxxx", where xxxx is the synset ID of some object other than a cup.
I tried many approaches; to keep it short, here are the concrete steps:
1. Rename the Annotations folder to Annos.
2. Create a new, empty folder named Annotations.
3. Edit the Python tool below, saved as "delete_by_name.py". You only need to change the condition after `if not`; the quoted strings are the synset IDs of the data you want to keep.
4. Run the Python tool.
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 31 10:03:03 2017
@author: hans
http://blog.csdn.net/renhanchi
"""
import os
import xml.etree.ElementTree as ET

origin_ann_dir = 'Annos/'
new_ann_dir = 'Annotations/'

for dirpaths, dirnames, filenames in os.walk(origin_ann_dir):
    for filename in filenames:
        if os.path.isfile(r'%s%s' % (origin_ann_dir, filename)):
            origin_ann_path = os.path.join(r'%s%s' % (origin_ann_dir, filename))
            new_ann_path = os.path.join(r'%s%s' % (new_ann_dir, filename))
            tree = ET.parse(origin_ann_path)
            root = tree.getroot()
            # Drop every <object> whose <name> is not one of the cup synset IDs.
            for object in root.findall('object'):
                name = str(object.find('name').text)
                if not (name == "n03147509" or \
                        name == "n03216710" or \
                        name == "n03438257" or \
                        name == "n03797390" or \
                        name == "n04559910" or \
                        name == "n07930864"):
                    root.remove(object)
            tree.write(new_ann_path)
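To see what the filter does, here is a minimal self-contained sketch (Python 3) of the same idea. The annotation and the non-cup synset ID in it are made up for illustration: it builds a VOC-style xml in memory, removes every object whose name is not in the keep set, and checks that only the cup survives.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation: one cup (n03797390) and one non-cup object.
sample = """<annotation>
  <filename>img_0001.jpg</filename>
  <object><name>n03797390</name></object>
  <object><name>n02084071</name></object>
</annotation>"""

# The cup synset IDs kept by delete_by_name.py.
keep = {"n03147509", "n03216710", "n03438257",
        "n03797390", "n04559910", "n07930864"}

root = ET.fromstring(sample)
# findall() returns a list, so removing elements while looping is safe.
for obj in root.findall('object'):
    if obj.find('name').text not in keep:
        root.remove(obj)

names = [o.find('name').text for o in root.findall('object')]
print(names)  # ['n03797390']
```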
3. Generating the train and val list files
First create a folder named doc.
The script below, "cup_list.sh", is not exactly the one I ended up using; adapt it to your own directory layout.
#!/bin/sh
classes=(JPEGImages Annotations)
root_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
for dataset in train val
do
  if [ $dataset == "train" ]
  then
    data_dir=(ILSVRC2015_train ILSVRC2015_val ILSVRC_train ImageNet)
  fi
  if [ $dataset == "val" ]
  then
    data_dir=(ILSVRC_val)
  fi
  # Collect image and annotation paths for every sub-dataset.
  # JPEGImages hold .jpg files, Annotations hold .xml files; sort so the
  # two lists pair up line by line.
  for cla in ${data_dir[@]}
  do
    for class in ${classes[@]}
    do
      if [ $class == "Annotations" ]
      then
        find ./$cla/$class/ -name "*.xml" | sort >> ${class}_${dataset}.txt
      else
        find ./$cla/$class/ -name "*.jpg" | sort >> ${class}_${dataset}.txt
      fi
    done
  done
  # Pair image and annotation paths, then shuffle the lines.
  paste -d' ' JPEGImages_${dataset}.txt Annotations_${dataset}.txt >> temp_${dataset}.txt
  cat temp_${dataset}.txt | awk 'BEGIN{srand()}{print rand()"\t"$0}' | sort -k1,1 -n | cut -f2- > $dataset.txt
  if [ $dataset == "val" ]
  then
    /home/hans/caffe-ssd/build/tools/get_image_size $root_dir $dataset.txt $dataset"_name_size.txt"
  fi
  rm temp_${dataset}.txt
  rm JPEGImages_${dataset}.txt
  rm Annotations_${dataset}.txt
done
mv train.txt doc/
mv val.txt doc/
mv val_name_size.txt doc/
4. Writing labelmap_cup.prototxt
Put this file in the doc directory.
A few things need attention:
1. Label 0 must be background.
2. Although I only detect cups, the cup name in the xml files is spread over several synset IDs.
At first I set every label to 1, but lmdb generation later failed with that mapping.
So I had to assign the labels sequentially instead, which is no real problem: I just have to remember that labels 1 through 6 are all cups.
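Based on the notes above, labelmap_cup.prototxt would look roughly like this. This is a sketch: the ordering follows the synset IDs from delete_by_name.py, and the display_name values are my own choice; only the name/label pairs matter to the tools.

```protobuf
item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "n03147509"
  label: 1
  display_name: "cup"
}
item {
  name: "n03216710"
  label: 2
  display_name: "cup"
}
item {
  name: "n03438257"
  label: 3
  display_name: "cup"
}
item {
  name: "n03797390"
  label: 4
  display_name: "cup"
}
item {
  name: "n04559910"
  label: 5
  display_name: "cup"
}
item {
  name: "n07930864"
  label: 6
  display_name: "cup"
}
```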
5. Generating the lmdb files
This step first failed with the Unknown name error mentioned above, which the xml cleanup fixed.
Then a Symbol error appeared when importing the caffe module; just follow the steps below and it will work.
First edit caffe-ssd/scripts/create_annoset.py.
Then run cup_data.sh:
cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=/home/hans/caffe-ssd
redo=1
data_root_dir="${cur_dir}"
dataset_name="doc"
mapfile="${cur_dir}/doc/labelmap_cup.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0
extra_cmd="--encode-type=JPEG --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in train val
do
  python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim \
    --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir \
    $cur_dir/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$subset"_"$db ln/
done
rm -rf ln/
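create_annoset.py consumes the train.txt / val.txt lists, where each line holds an image path and its annotation path separated by a single space (that is what the `paste -d' '` step produces). A quick sanity check on the lists before building the lmdb can catch misaligned pairs early; this is a sketch with made-up paths:

```python
import os

# Hypothetical lines in the "<image path> <annotation path>" format
# produced by cup_list.sh.
lines = [
    "ImageNet/JPEGImages/n03797390_1.jpg ImageNet/Annotations/n03797390_1.xml",
    "ILSVRC_val/JPEGImages/val_00002.jpg ILSVRC_val/Annotations/val_00002.xml",
]

for line in lines:
    fields = line.split()
    # Every line must have exactly two fields: image and annotation.
    assert len(fields) == 2, line
    img, ann = fields
    # Image and annotation should refer to the same sample.
    img_stem = os.path.splitext(os.path.basename(img))[0]
    ann_stem = os.path.splitext(os.path.basename(ann))[0]
    assert img_stem == ann_stem, line

print("list format OK")
```

In practice you would read the lines from doc/train.txt and doc/val.txt instead of the inline list.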
6. Training
First download the pretrained model and put it in the doc directory.
Adapting the training script really takes patience: there are many paths to change, and quite a few things can go wrong. Fortunately the project's GitHub issues were a big help.
Here is my ssd_pascal.py:
from __future__ import print_function
import sys
sys.path.append("/home/hans/caffe-ssd/python")  # changed
import caffe
from caffe.model_libs import *
from google.protobuf import text_format

import math
import os
import shutil
import stat
import subprocess

# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).
def AddExtraLayers(net, use_batchnorm=True, lr_mult=1):
    use_relu = True

    # Add additional convolutional layers.
    # 19 x 19
    from_layer = net.keys()[-1]

    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.
    # 10 x 10
    out_layer = "conv6_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2,
        lr_mult=lr_mult)

    # 5 x 5
    from_layer = out_layer
    out_layer = "conv7_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv7_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2,
        lr_mult=lr_mult)

    # 3 x 3
    from_layer = out_layer
    out_layer = "conv8_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv8_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
        lr_mult=lr_mult)

    # 1 x 1
    from_layer = out_layer
    out_layer = "conv9_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv9_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
        lr_mult=lr_mult)

    return net


### Modify the following parameters accordingly ###
# The directory which contains the caffe code.
# We assume you are running the script at the CAFFE_ROOT.
caffe_root = "/home/hans/caffe-ssd"  # changed
# Set true if you want to start training right after generating all files.
run_soon = True
# Set true if you want to load from most recently saved snapshot.
# Otherwise, we will load from the pretrain_model defined below.
resume_training = True
# If true, Remove old model files.
remove_old_models = False

# The database file for training data. Created by data/VOC0712/create_data.sh
train_data = "/home/hans/data/ImageNet/Detection/cup/doc/train_lmdb"  # changed
# The database file for testing data. Created by data/VOC0712/create_data.sh
test_data = "/home/hans/data/ImageNet/Detection/cup/doc/val_lmdb"  # changed
# Specify the batch sampler.
resize_width = 300
resize_height = 300
resize = "{}x{}".format(resize_width, resize_height)
batch_sampler = [
    {
        'sampler': {
        },
        'max_trials': 1,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.1,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.3,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.5,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.7,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.9,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'max_jaccard_overlap': 1.0,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
]
train_transform_param = {
    'mirror': True,
    'mean_value': [104, 117, 123],
    'force_color': True,  # changed
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [
            P.Resize.LINEAR,
            P.Resize.AREA,
            P.Resize.NEAREST,
            P.Resize.CUBIC,
            P.Resize.LANCZOS4,
        ],
    },
    'distort_param': {
        'brightness_prob': 0.5,
        'brightness_delta': 32,
        'contrast_prob': 0.5,
        'contrast_lower': 0.5,
        'contrast_upper': 1.5,
        'hue_prob': 0.5,
        'hue_delta': 18,
        'saturation_prob': 0.5,
        'saturation_lower': 0.5,
        'saturation_upper': 1.5,
        'random_order_prob': 0.0,
    },
    'expand_param': {
        'prob': 0.5,
        'max_expand_ratio': 4.0,
    },
    'emit_constraint': {
        'emit_type': caffe_pb2.EmitConstraint.CENTER,
    }
}
test_transform_param = {
    'mean_value': [104, 117, 123],
    'force_color': True,  # changed
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [P.Resize.LINEAR],
    },
}

# If true, use batch norm for all newly added layers.
# Currently only the non batch norm version has been tested.
use_batchnorm = False
lr_mult = 1
# Use different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # A learning rate for batch_size = 1, num_gpus = 1.
    base_lr = 0.00004

root = "/home/hans/data/ImageNet/Detection/cup"  # changed
# Modify the job name if you want.
job_name = "SSD_{}".format(resize)  # changed
# The name of the model. Modify it if you want.
model_name = "VGG_CUP_{}".format(job_name)  # changed
# Directory which stores the model .prototxt file.
save_dir = "{}/doc/{}".format(root, job_name)  # changed
# Directory which stores the snapshot of models.
snapshot_dir = "{}/models/{}".format(root, job_name)  # changed
# Directory which stores the job script and log file.
job_dir = "{}/jobs/{}".format(root, job_name)  # changed
# Directory which stores the detection results.
output_result_dir = "{}/results/{}".format(root, job_name)  # changed

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "{}/doc/val_name_size.txt".format(root)  # changed
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "{}/doc/VGG_ILSVRC_16_layers_fc_reduced.caffemodel".format(root)  # changed
# Stores LabelMapItem.
label_map_file = "{}/doc/labelmap_cup.prototxt".format(root)  # changed

# MultiBoxLoss parameters.
num_classes = 7  # changed
share_location = True
background_label_id = 0
train_on_diff_gt = True
normalization_mode = P.Loss.VALID
code_type = P.PriorBox.CENTER_SIZE
ignore_cross_boundary_bbox = False
mining_type = P.MultiBoxLoss.MAX_NEGATIVE
neg_pos_ratio = 3.
loc_weight = (neg_pos_ratio + 1.) / 4.
multibox_loss_param = {
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,
    'overlap_threshold': 0.5,
    'use_prior_for_matching': True,
    'background_label_id': background_label_id,
    'use_difficult_gt': train_on_diff_gt,
    'mining_type': mining_type,
    'neg_pos_ratio': neg_pos_ratio,
    'neg_overlap': 0.5,
    'code_type': code_type,
    'ignore_cross_boundary_bbox': ignore_cross_boundary_bbox,
}
loss_param = {
    'normalization': normalization_mode,
}

# parameters for generating priors.
# minimum dimension of input image
min_dim = 300
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# conv9_2 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']
# in percent %
min_ratio = 20
max_ratio = 90
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
    min_sizes.append(min_dim * ratio / 100.)
    max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [min_dim * 20 / 100.] + max_sizes
steps = [8, 16, 32, 64, 100, 300]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
    prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
    prior_variance = [0.1]
flip = True
clip = False

# Solver parameters.
# Defining which GPUs to use.
gpus = "7"  # changed
gpulist = gpus.split(",")
num_gpus = len(gpulist)

# Divide the mini-batch to different GPUs.
batch_size = 32
accum_batch_size = 32
iter_size = accum_batch_size / batch_size
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size
if num_gpus > 0:
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
    solver_mode = P.Solver.GPU
    device_id = int(gpulist[0])

if normalization_mode == P.Loss.NONE:
    base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
    base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:
    # Roughly there are 2000 prior bboxes per image.
    # TODO(weiliu89): Estimate the exact # of priors.
    base_lr *= 2000.

# Evaluate on whole test set.
num_test_image = 2000  # changed
test_batch_size = 8
# Ideally test_batch_size should be divisible by num_test_image,
# otherwise mAP will be slightly off the true value.
test_iter = int(math.ceil(float(num_test_image) / test_batch_size))

solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "multistep",
    'stepvalue': [80000, 100000, 120000],
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 120000,
    'snapshot': 80000,
    'display': 10,
    'average_loss': 10,
    'type': "SGD",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 100,
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': True,
}

# parameters for generating detection output.
det_out_param = {
    'num_classes': num_classes,
    'share_location': share_location,
    'background_label_id': background_label_id,
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},
    'save_output_param': {
        'output_directory': output_result_dir,