[caffe notes 005]: Understanding the RPN in faster-RCNN through code

阿新 • Published: 2018-12-28

https://blog.csdn.net/happyflyy/article/details/54917514


Note: everything about the RPN here is my own understanding, so it may contain some mistakes.

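1. Introduction to the RPN
RPN stands for Region Proposal Network and is one part of the faster-RCNN architecture. faster-RCNN consists of two sub-networks. The first, the RPN, extracts a certain number of proposals from a given image, each with an objectness score (the confidence that it contains an object). The second directly reuses the feature extraction network of fast-RCNN, with the proposals from the RPN replacing the selective search proposals used in fast-RCNN.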

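2. Structure of the RPN
The schematic of the RPN is shown in the figure below.
The RPN is built by adding the new layers shown in the figure on top of the last layer of an existing network (for example VGG). Taking VGG as an example, the parts in the figure are:
1. conv feature map: a convolutional layer newly added after conv5_3 of VGG (3×3 kernel, 512 outputs).
2. k anchor boxes: the initial reference regions at each sliding-window position. Every sliding-window position uses the same set of anchor boxes, so once the coordinates of the position are known, the exact coordinates of each anchor box can be computed. faster-RCNN uses k=9: start from a base anchor of size 16×16, keep its area fixed and change its aspect ratio to (0.5, 1, 2), then enlarge each of these three anchors by the three scales (8, 16, 32), which gives 9 anchors in total.
3. intermediate layer: the author's code does not contain this intermediate layer that outputs 256-d features; the 2k scores and 4k coordinates are obtained directly with 1×1 convolutions. The paper explains this as replacing fully connected layers with a fully convolutional implementation.
4. 2k scores: a softmax layer is used, so each anchor gets two confidences. The paper notes that a sigmoid could be used instead to get a single confidence of being a positive example.
5. 4k coordinates: the coordinates of each window. These are not the absolute coordinates of the anchor, but the offsets needed to regress from the anchor to the groundtruth position (described in detail in the next section).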

For an image of size 600×800, conv5_3 has size 38×50 after passing through VGG, so the total number of anchors is 38×50×9 = 17100.
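The arithmetic above can be checked with a few lines of matlab. This is only a sketch of my own, not code from the repository: it assumes that the four 2×2, stride-2 pooling layers before conv5_3 round their output size up, and the variable names are made up for illustration.

```matlab
% Sketch: conv5_3 size and anchor count for a 600x800 input.
% Assumption: each of the four poolings before conv5_3 halves the
% size and rounds up.
im_size = [600, 800];            % [height, width] of the rescaled image
feat_size = im_size;
for p = 1:4                      % pool1..pool4 come before conv5_3 in VGG-16
    feat_size = ceil(feat_size / 2);
end
k = 9;                           % anchors per sliding-window position
num_anchors = prod(feat_size) * k;
fprintf('conv5_3: %dx%d, anchors: %d\n', feat_size(1), feat_size(2), num_anchors);
```

The height goes 600, 300, 150, 75, 38 and the width 800, 400, 200, 100, 50, which reproduces the 38×50 map and the 17100 anchors.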

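3. Understanding the RPN through code
Environment used to run the code: Ubuntu 14.04, Matlab R2016a.

1 Preparation
Assume the libraries that caffe depends on are already installed. faster-RCNN ships with the matlab interface of caffe, so caffe itself does not need to be installed or compiled. Taking PASCAL VOC0712 as an example:

Step1: Download the faster-RCNN source code and extract it. The download address is https://github.com/ShaoqingRen/faster_rcnn. Assume the extracted path is $FASTERRCNN/.

Step2: Download VOC07 and VOC12 and extract them to any folder (preferably $FASTERRCNN/datasets/).

Step3: Download the network model files and the pre-trained VGG, extract them and copy them into $FASTERRCNN/. The download address is https://pan.baidu.com/s/1mgzSnI4.

Step4: In a shell, enter $FASTERRCNN/ and run matlab.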

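2 File structure of faster-RCNN
After the preparation above, the file structure of faster-RCNN in matlab is as follows:

./bin: the mex-compiled files of the non-maximum suppression (NMS) C code in ./functions/nms
./datasets: where the VOC datasets are stored
./experiments: entry functions for training and testing
./external: the matlab interface of caffe. Only the dependencies of caffe need to be installed; the caffe sources do not need to be compiled.
./fetch_data: functions that download the datasets, pre-trained models and other files
./functions: functions related to processing the training data
./imdb: reads the VOC data into the imdb format
./models: pre-trained models of the base networks (such as VGG); the network-structure prototxt files of fast-RCNN and the RPN, and the solver-parameter prototxt files
./utils: other commonly used functions
Note: ./test holds some test images that I stored temporarily while running the test demo and has nothing to do with faster-RCNN.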

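3 Training process
With VGG and VOC0712, the corresponding training script is $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m. Since only the RPN part is covered here, only the first part of this m-file needs to be understood in detail.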

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%
% model
model = Model.VGG16_for_Faster_RCNN_VOC0712;
% cache base
cache_base_proposal = 'faster_rcnn_VOC0712_vgg_16layers';
cache_base_fast_rcnn = '';
% train/test data
dataset = [];
use_flipped = true;
dataset = Dataset.voc0712_trainval(dataset, 'train', use_flipped);
dataset = Dataset.voc2007_test(dataset, 'test', false);
%% -------------------- TRAIN --------------------
% conf
conf_proposal = proposal_config('image_means', model.mean_image, 'feat_stride', model.feat_stride);
conf_fast_rcnn = fast_rcnn_config('image_means', model.mean_image);
% set cache folder for each stage
model = Faster_RCNN_Train.set_cache_folder(cache_base_proposal, cache_base_fast_rcnn, model);
% generate anchors and pre-calculate output size of rpn network
[conf_proposal.anchors, conf_proposal.output_width_map, conf_proposal.output_height_map] ...
= proposal_prepare_anchors(conf_proposal, model.stage1_rpn.cache_name, model.stage1_rpn.test_net_def_file);
%% stage one proposal
fprintf('\n***************\nstage one proposal \n***************\n');
% train
model.stage1_rpn = Faster_RCNN_Train.do_proposal_train(conf_proposal, dataset, model.stage1_rpn, opts.do_val);
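1 Parameter configuration stage
Three parameters are configured for the RPN: model, dataset and conf_proposal. conf_fast_rcnn holds the parameters of fast-RCNN.

1 The model parameter:
It specifies the paths of the network-structure prototxt files needed by the two stages, RPN and fast-RCNN. The first-stage RPN is used here to walk through the process.
It specifies the paths of the pre-trained VGG model and the image mean.

Configuration of the model parameter:

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%
% model
model = Model.VGG16_for_Faster_RCNN_VOC0712;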
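The actual configuration is done by the code fragment below; only the code related to the first RPN stage is considered. It first specifies the paths of the pre-trained base network (VGG) model and the image mean file, then the paths of the RPN prototxt files, and finally sets the RPN test parameters.

% code from $FASTERRCNN/experiments/+Model/VGG16_for_Faster_RCNN_VOC0712.m
%
% paths of the pre-trained base network (VGG) model and the image mean file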
model.mean_image = fullfile(pwd, 'models', 'pre_trained_models', 'vgg_16layers', 'mean_image');
model.pre_trained_net_file = fullfile(pwd, 'models', 'pre_trained_models', 'vgg_16layers', 'vgg16.caffemodel');
% Stride in input image pixels at the last conv layer
model.feat_stride = 16;
% paths of the RPN prototxt files
%% stage 1 rpn, inited from pre-trained network
model.stage1_rpn.solver_def_file = fullfile(pwd, 'models', 'rpn_prototxts', 'vgg_16layers_conv3_1', 'solver_60k80k.prototxt');
model.stage1_rpn.test_net_def_file = fullfile(pwd, 'models', 'rpn_prototxts', 'vgg_16layers_conv3_1', 'test.prototxt');
model.stage1_rpn.init_net_file = model.pre_trained_net_file;
% RPN test parameters
% rpn test setting
model.stage1_rpn.nms.per_nms_topN = -1;
model.stage1_rpn.nms.nms_overlap_thres = 0.7;
model.stage1_rpn.nms.after_nms_topN = 2000;
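To give a feeling for what nms_overlap_thres and after_nms_topN control, here is a minimal greedy NMS sketch on toy boxes. It is my own illustration, not the mex code in ./functions/nms; the boxes and scores are made up, and only the two threshold values are taken from the configuration above.

```matlab
% Sketch of greedy NMS on rows of [x1 y1 x2 y2 score] (toy data).
boxes = [ 10  10 110 110 0.9;
          12  12 112 112 0.8;    % almost the same region as the first box
         200 200 300 300 0.7];
nms_overlap_thres = 0.7;         % value of model.stage1_rpn.nms.nms_overlap_thres
after_nms_topN = 2000;           % value of model.stage1_rpn.nms.after_nms_topN

[~, order] = sort(boxes(:, 5), 'descend');
areas = (boxes(:, 3) - boxes(:, 1) + 1) .* (boxes(:, 4) - boxes(:, 2) + 1);
keep = [];
while ~isempty(order)
    i = order(1);                % highest-scoring box that is still left
    keep(end + 1) = i;
    rest = order(2:end);
    if isempty(rest), break; end
    % intersection of box i with every remaining box
    iw = max(0, min(boxes(i, 3), boxes(rest, 3)) - max(boxes(i, 1), boxes(rest, 1)) + 1);
    ih = max(0, min(boxes(i, 4), boxes(rest, 4)) - max(boxes(i, 2), boxes(rest, 2)) + 1);
    inter = iw .* ih;
    iou = inter ./ (areas(i) + areas(rest) - inter);
    order = rest(iou <= nms_overlap_thres);   % suppress boxes that overlap too much
end
keep = keep(1:min(numel(keep), after_nms_topN));
disp(keep);
```

Here the second box has an IoU of about 0.92 with the first one and is suppressed, so boxes 1 and 3 are kept.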
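2 The dataset parameter:
Changing the dataset path

If the VOC data was not extracted into the $FASTERRCNN/datasets/ folder, change the paths in $FASTERRCNN/experiments/+Dataset/private/voc2007_devkit.m and $FASTERRCNN/experiments/+Dataset/private/voc2012_devkit.m to the path where the VOC datasets were extracted.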

% code from `$FASTERRCNN/experiments/+Dataset/private/voc2007_devkit.m`
%
function path = voc2007_devkit()
path = './datasets/VOCdevkit2007';
end
% code from `$FASTERRCNN/experiments/+Dataset/private/voc2012_devkit.m`
%
function path = voc2012_devkit()
path = './datasets/VOCdevkit2012';
end
The dataset parameter

Configuration of the dataset parameter:

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%
% train/test data
dataset = [];
use_flipped = true;
dataset = Dataset.voc0712_trainval(dataset, 'train', use_flipped);
dataset = Dataset.voc2007_test(dataset, 'test', false);
The files that actually read the datasets are $FASTERRCNN/experiments/+Dataset/voc0712_trainval.m and $FASTERRCNN/experiments/+Dataset/voc2007_test.m. They first get the path where the dataset is stored, and then read the data into imdb and roidb.

% code from $FASTERRCNN/experiments/+Dataset/voc0712_trainval.m
%
% get the paths where the datasets are stored
devkit2007 = voc2007_devkit();
devkit2012 = voc2012_devkit();
% read the data into imdb and roidb
switch usage
    case {'train'}
        dataset.imdb_train = { imdb_from_voc(devkit2007, 'trainval', '2007', use_flip), ...
                               imdb_from_voc(devkit2012, 'trainval', '2012', use_flip)};
        dataset.roidb_train = cellfun(@(x) x.roidb_func(x), dataset.imdb_train, 'UniformOutput', false);
    case {'test'}
        error('only supports one source test currently');
    otherwise
        error('usage = ''train'' or ''test''');
end
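The imdb file is a table-like matlab structure in which each row is one image, holding information such as the image path, id, size and groundtruth (positions and class labels).

3 The conf_proposal parameter:
Only the conf_proposal of the RPN is considered.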

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

% conf
conf_proposal = proposal_config('image_means', model.mean_image, 'feat_stride', model.feat_stride);
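These are the parameters needed by the RPN. The ones worth noting are:
batch_size: [256] total number of bg and fg samples selected and used per image
fg_fraction: [0.5] fraction of fg samples in batch_size; if there are not enough fg samples, bg samples are added instead
drop_boxes_runoff_image: [1] whether to drop anchors that extend beyond the image boundary during training
bg_thresh_hi: [0.3] upper bound on the IoU with the groundtruth for an anchor to be treated as a negative sample
bg_thresh_lo: [0] lower bound on the IoU with the groundtruth for an anchor to be treated as a negative sample
fg_thresh: [0.7] minimum IoU with the groundtruth for an anchor to be treated as a positive sample
ims_per_batch: [1] number of images fed in per training iteration; currently only one image per iteration is supported
scales: [600] target length of the short side after rescaling
max_size: [1000] maximum length of the long side after rescaling
feat_stride: [16] conv5_3 of VGG is 16 times smaller than the input image, i.e. the stride between two adjacent points is 16
anchors: the 9 base anchors with different aspect ratios and scales
output_width_map: mapping from the width of the input image to the width of conv5_3
output_height_map: mapping from the height of the input image to the height of conv5_3
bg_weight: [1] weight of each negative sample in the loss; the weight of positive samples is always 1
image_means: the image mean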
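How scales and max_size interact is easiest to see with numbers. The following is a sketch of my understanding of the rescaling rule (short side to scales, unless that would push the long side past max_size); the two image sizes are hypothetical and the variable names are my own.

```matlab
% Sketch: rescaling factor derived from scales and max_size.
scales = 600;                    % target length of the short side
max_size = 1000;                 % cap on the long side
im_sizes = [375, 500;            % hypothetical [height, width] pairs
            300, 900];
scaled = zeros(size(im_sizes));
for i = 1:size(im_sizes, 1)
    im_size = im_sizes(i, :);
    im_scale = scales / min(im_size);
    if round(im_scale * max(im_size)) > max_size
        im_scale = max_size / max(im_size);   % the long side would be too large
    end
    scaled(i, :) = round(im_size * im_scale);
    fprintf('%dx%d -> scale %.3f -> %dx%d\n', ...
        im_size(1), im_size(2), im_scale, scaled(i, 1), scaled(i, 2));
end
```

The 375×500 image is simply scaled by 1.6 to 600×800, while for the 300×900 image the cap on the long side wins and the result is 333×1000.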

The configuration file itself is:

% code from $FASTERRCNN/functions/rpn/proposal_config.m
%

function conf = proposal_config(varargin)
% conf = proposal_config(varargin)
% --------------------------------------------------------
% Faster R-CNN
% Copyright (c) 2015, Shaoqing Ren
% Licensed under The MIT License [see LICENSE for details]
% --------------------------------------------------------

ip = inputParser;

%% training
ip.addParamValue('use_gpu', gpuDeviceCount > 0, ...
@islogical);

% whether drop the anchors that has edges outside of the image boundary
ip.addParamValue('drop_boxes_runoff_image', ...
true, @islogical);

% Image scales -- the short edge of input image
ip.addParamValue('scales', 600, @ismatrix);
% Max pixel size of a scaled input image
ip.addParamValue('max_size', 1000, @isscalar);
% Images per batch, only supports ims_per_batch = 1 currently
ip.addParamValue('ims_per_batch', 1, @isscalar);
% Minibatch size
ip.addParamValue('batch_size', 256, @isscalar);
% Fraction of minibatch that is foreground labeled (class > 0)
ip.addParamValue('fg_fraction', 0.5, @isscalar);
% weight of background samples, when weight of foreground samples is
% 1.0
ip.addParamValue('bg_weight', 1.0, @isscalar);
% Overlap threshold for a ROI to be considered foreground (if >= fg_thresh)
ip.addParamValue('fg_thresh', 0.7, @isscalar);
% Overlap threshold for a ROI to be considered background (class = 0 if
% overlap in [bg_thresh_lo, bg_thresh_hi))
ip.addParamValue('bg_thresh_hi', 0.3, @isscalar);
ip.addParamValue('bg_thresh_lo', 0, @isscalar);
% mean image, in RGB order
ip.addParamValue('image_means', 128, @ismatrix);
% Use horizontally-flipped images during training?
ip.addParamValue('use_flipped', true, @islogical);
% Stride in input image pixels at ROI pooling level (network specific)
% 16 is true for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
ip.addParamValue('feat_stride', 16, @isscalar);
% train proposal target only to labled ground-truths or also include
% other proposal results (selective search, etc.)
ip.addParamValue('target_only_gt', true, @islogical);

% random seed
ip.addParamValue('rng_seed', 6, @isscalar);

%% testing
ip.addParamValue('test_scales', 600, @isscalar);
ip.addParamValue('test_max_size', 1000, @isscalar);
ip.addParamValue('test_nms', 0.3, @isscalar);
ip.addParamValue('test_binary', false, @islogical);
ip.addParamValue('test_min_box_size',16, @isscalar);
ip.addParamValue('test_drop_boxes_runoff_image', ...
false, @islogical);

ip.parse(varargin{:});
conf = ip.Results;

assert(conf.ims_per_batch == 1, 'currently rpn only supports ims_per_batch == 1');

% if image_means is a file, load it
if ischar(conf.image_means)
    s = load(conf.image_means);
    s_fieldnames = fieldnames(s);
    assert(length(s_fieldnames) == 1);
    conf.image_means = s.(s_fieldnames{1});
end
end
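The name-value mechanism used by proposal_config can be tried in isolation. The sketch below is my own and keeps only three of the parameters; it uses addParameter, the newer name for addParamValue, and overrides one default the same way the training script passes 'image_means' and 'feat_stride'.

```matlab
% Sketch: defaults plus name-value overrides with inputParser.
ip = inputParser;
ip.addParameter('batch_size', 256, @isscalar);
ip.addParameter('fg_fraction', 0.5, @isscalar);
ip.addParameter('feat_stride', 16, @isscalar);

ip.parse('batch_size', 128);     % override one default, keep the others
conf = ip.Results;

% upper bound on the number of fg samples per image under these settings
num_fg_max = round(conf.batch_size * conf.fg_fraction);
disp(conf);
```

Anything not passed to parse keeps its default, which is why the script only needs to name the two values that depend on the model.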
2 Generating anchors
% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

% generate anchors and pre-calculate output size of rpn network
[conf_proposal.anchors, conf_proposal.output_width_map, conf_proposal.output_height_map] ...
= proposal_prepare_anchors(conf_proposal, model.stage1_rpn.cache_name, model.stage1_rpn.test_net_def_file);
The proposal_prepare_anchors function has two parts. It first builds the maps from the input image size to the conv5_3 size, then generates the 9 base anchors. Finally output_width_map, output_height_map and anchors are stored in the conf_proposal parameter.

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

function [anchors, output_width_map, output_height_map] = proposal_prepare_anchors(conf, cache_name, test_net_def_file)
    % build the maps from input image size to conv5_3 size
    [output_width_map, output_height_map] ...
        = proposal_calc_output_size(conf, test_net_def_file);
    % generate the 9 base anchors
    anchors = proposal_generate_anchors(cache_name, ...
        'scales', 2.^[3:5]);
end
1 Correspondence between input image size and conv5_3 size
First the RPN test network is initialized; then all-zero images of different sizes are generated and passed forward through the network; the conv5_3 size for each input image size is recorded; finally caffe is reset.

% code from $FASTERRCNN/functions/rpn/proposal_calc_output_size.m
%

% initialize the RPN test network
caffe_net = caffe.Net(test_net_def_file, 'test');

% set gpu/cpu
if conf.use_gpu
    caffe.set_mode_gpu();
else
    caffe.set_mode_cpu();
end

% generate all-zero images of different sizes and run a forward pass
input = 100:conf.max_size;
output_w = nan(size(input));
output_h = nan(size(input));
for i = 1:length(input)
    s = input(i);
    im_blob = single(zeros(s, s, 3, 1));
    net_inputs = {im_blob};

    % Reshape net's input blobs
    caffe_net.reshape_as_input(net_inputs);
    caffe_net.forward(net_inputs);

    % record the conv5_3 size for each input image size
    cls_score = caffe_net.blobs('proposal_cls_score').get_data();
    output_w(i) = size(cls_score, 1);
    output_h(i) = size(cls_score, 2);
end

output_width_map = containers.Map(input, output_w);
output_height_map = containers.Map(input, output_h);

% reset caffe
caffe.reset_all();
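The two maps are plain lookup tables from an input side length to an output side length. The sketch below builds the same kind of table without caffe, under my assumption that for VGG the output is ceil(input / 16); the real function measures the sizes by running the network instead.

```matlab
% Sketch: size lookup tables without running caffe.
% Assumption: conv5_3 side length = ceil(input side length / 16).
input = 100:1000;                % same range as 100:conf.max_size with max_size = 1000
output = ceil(input / 16);
output_width_map = containers.Map(input, output);
output_height_map = containers.Map(input, output);

w = output_width_map(800);       % conv5_3 width for an 800-pixel-wide image
h = output_height_map(600);      % conv5_3 height for a 600-pixel-high image
fprintf('600x800 image -> %dx%d feature map\n', h, w);
```

Under this assumption the lookup gives 38 and 50 for 600 and 800, matching the example in section 2.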
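2 Generating the 9 base anchors
The base anchor is set to 16×16; keeping the area fixed, ratio_jitter in this m-file generates anchors with the three aspect ratios (0.5, 1, 2), as shown in the figure below; scale_jitter in the same m-file then enlarges the anchors of each aspect ratio to the three scales (8, 16, 32). 9 anchors are generated in total.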

% code from $FASTERRCNN/functions/rpn/proposal_generate_anchors.m
%

%% inputs
ip = inputParser;
ip.addRequired('cache_name', @isstr);

% the size of the base anchor
ip.addParamValue('base_size', 16, @isscalar);
% ratio list of anchors
ip.addParamValue('ratios', [0.5, 1, 2], @ismatrix);
% scale list of anchors
ip.addParamValue('scales', 2.^[3:5], @ismatrix);
ip.addParamValue('ignore_cache', false, @islogical);
ip.parse(cache_name, varargin{:});
opts = ip.Results;

%%
if ~opts.ignore_cache
    anchor_cache_dir = fullfile(pwd, 'output', 'rpn_cachedir', cache_name);
    mkdir_if_missing(anchor_cache_dir);
    anchor_cache_file = fullfile(anchor_cache_dir, 'anchors');
end
try
    ld = load(anchor_cache_file);
    anchors = ld.anchors;
catch
    % set the base anchor to 16x16
    base_anchor = [1, 1, opts.base_size, opts.base_size];
    % keep the area fixed and generate anchors with different aspect ratios
    ratio_anchors = ratio_jitter(base_anchor, opts.ratios);
    % rescale the anchors of each aspect ratio to the different scales
    anchors = cellfun(@(x) scale_jitter(x, opts.scales), num2cell(ratio_anchors, 2), 'UniformOutput', false);
    anchors = cat(1, anchors{:});
    if ~opts.ignore_cache
        save(anchor_cache_file, 'anchors');
    end
end
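ratio_jitter and scale_jitter are not listed here, so the following is my own sketch of what the two steps amount to, written as two nested loops. The rounding of the widths and heights is an assumption on my part; the base size, ratios and scales are the defaults shown above.

```matlab
% Sketch: 9 base anchors from a 16x16 base anchor (ratios first, then scales).
base_size = 16;
ratios = [0.5, 1, 2];
scales = 2.^(3:5);               % 8, 16, 32

base_anchor = [1, 1, base_size, base_size];
w = base_anchor(3) - base_anchor(1) + 1;
h = base_anchor(4) - base_anchor(2) + 1;
x_ctr = base_anchor(1) + (w - 1) / 2;
y_ctr = base_anchor(2) + (h - 1) / 2;

anchors = zeros(numel(ratios) * numel(scales), 4);
row = 0;
for r = ratios
    % keep the area roughly fixed while changing the height/width ratio
    ws = round(sqrt(w * h / r));
    hs = round(ws * r);
    for s = scales
        row = row + 1;
        half_w = (ws * s - 1) / 2;
        half_h = (hs * s - 1) / 2;
        % [x1 y1 x2 y2], centred on the centre of the base anchor
        anchors(row, :) = [x_ctr - half_w, y_ctr - half_h, x_ctr + half_w, y_ctr + half_h];
    end
end
disp(anchors);
```

The first row (ratio 0.5, scale 8) comes out as [-83 -39 100 56] and the last row (ratio 2, scale 32) as [-167 -343 184 360]. The coordinates are relative to the sliding-window position, which is why they can be negative.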
3 Training stage
Once all parameters are set, training starts.

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

%% stage one proposal
fprintf('\n***************\nstage one proposal \n***************\n');
% train
model.stage1_rpn = Faster_RCNN_Train.do_proposal_train(conf_proposal, dataset, model.stage1_rpn, opts.do_val);
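do_proposal_train directly calls $FASTERRCNN/functions/rpn/proposal_train.m.
Following the flow in the author's comments, $FASTERRCNN/functions/rpn/proposal_train.m has three main stages: init, making tran/val data, and Training.

1 init: initialization
Initialization mainly sets the cache file paths, reads the caffe solver parameters, reads the caffe model structure, loads the pre-trained model, initializes the log file and sets the GPU mode.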

% code from `$FASTERRCNN/functions/rpn/proposal_train.m`
%

%% init
% init caffe solver
imdbs_name = cell2mat(cellfun(@(x) x.name, imdb_train, 'UniformOutput', false));
cache_dir = fullfile(pwd, 'output', 'rpn_cachedir', opts.cache_name, imdbs_name);
mkdir_if_missing(cache_dir);
caffe_log_file_base = fullfile(cache_dir, 'caffe_log');
caffe.init_log(caffe_log_file_base);
caffe_solver = caffe.Solver(opts.solver_def_file);
caffe_solver.net.copy_from(opts.net_file);

% init log
timestamp = datestr(datevec(now()), 'yyyymmdd_HHMMSS');
mkdir_if_missing(fullfile(cache_dir, 'log'));
log_file = fullfile(cache_dir, 'log', ['train_', timestamp, '.txt']);
diary(log_file);

% set random seed
prev_rng = seed_rand(conf.rng_seed);
caffe.set_random_seed(conf.rng_seed);

% set gpu/cpu
if conf.use_gpu
    caffe.set_mode_gpu();
else
    caffe.set_mode_cpu();
end

disp('conf:');
disp(conf);
disp('opts:');
disp(opts);
2 making tran/val data: converting the bbs data into regression data
% code from `$FASTERRCNN/functions/rpn/proposal_train.m`
%

%% making tran/val data
fprintf('Preparing training data...');
[image_roidb_train, bbox_means, bbox_stds]...
    = proposal_prepare_image_roidb(conf, opts.imdb_train, opts.roidb_train);
fprintf('Done.\n');

if opts.do_val
    fprintf('Preparing validation data...');
    [image_roidb_val]...
        = proposal_prepare_image_roidb(conf, opts.imdb_val, opts.roidb_val, bbox_means, bbox_stds);
    fprintf('Done.\n');
end
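After reading the image information from the imdb and roidb, proposal_prepare_image_roidb.m does the following: it converts the groundtruth of the bbx in each image from [x1,y1,x2,y2] to [dx,dy,dw,dh], using formula (2) of the faster-RCNN paper; it then selects the bg and fg samples; finally it computes the mean and standard deviation of the converted [dx,dy,dw,dh].

Step1: Read the image information from the imdb and roidb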

% code from `$FASTERRCNN/functions/rpn/proposal_prepare_image_roidb.m`
%

imdbs = imdbs(:);
roidbs = roidbs(:);

if conf.target_only_gt
    image_roidb = ...
        cellfun(@(x, y) ... // @(imdbs, roidbs)
            arrayfun(@(z) ... //@([1:length(x.image_ids)])
                struct('image_path', x.image_at(z), 'image_id', x.image_ids{z}, 'im_size', x.sizes(z, :), 'imdb_name', x.name, 'num_classes', x.num_classes, ...
                    'boxes', y.rois(z).boxes(y.rois(z).gt, :), 'class', y.rois(z).class(y.rois(z).gt, :), 'image', [], 'bbox_targets', []), ...
                [1:length(x.image_ids)]', 'UniformOutput', true),...
            imdbs, roidbs, 'UniformOutput', false);
else
    image_roidb = ...
        cellfun(@(x, y) ... // @(imdbs, roidbs)
            arrayfun(@(z) ... //@([1:length(x.image_ids)])
                struct('image_path', x.image_at(z), 'image_id', x.image_ids{z}, 'im_size', x.sizes(z, :), 'imdb_name', x.name, ...
                    'boxes', y.rois(z).boxes, 'class', y.rois(z).class, 'image', [], 'bbox_targets', []), ...
                [1:length(x.image_ids)]', 'UniformOutput', true),...
            imdbs, roidbs, 'UniformOutput', false);
end

image_roidb = cat(1, image_roidb{:});
Step2: Converting the bbx groundtruth

% code from `$FASTERRCNN/functions/rpn/proposal_prepare_image_roidb.m`
%
% enhance roidb to contain bounding-box regression targets
[image_roidb, bbox_means, bbox_stds] = append_bbox_regression_targets(conf, image_roidb, bbox_means, bbox_stds);
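The conversion from [x1,y1,x2,y2] to [dx,dy,dw,dh] follows formula (2) of the paper: the centre offsets are divided by the anchor width and height, and the size ratios are taken in log space. The sketch below is my own illustration with one hypothetical anchor and one hypothetical groundtruth box; the +1 in the width and height (inclusive pixel coordinates) is an assumption about the convention used.

```matlab
% Sketch: regression targets between one anchor and one groundtruth box.
anchor = [100, 100, 199, 199];   % hypothetical anchor      [x1 y1 x2 y2]
gt     = [110, 120, 229, 219];   % hypothetical groundtruth [x1 y1 x2 y2]

wh  = @(b) [b(3) - b(1) + 1, b(4) - b(2) + 1];     % [width, height]
ctr = @(b) [b(1), b(2)] + 0.5 * (wh(b) - 1);       % [x centre, y centre]

% [dx dy dw dh]
targets = [(ctr(gt) - ctr(anchor)) ./ wh(anchor), log(wh(gt) ./ wh(anchor))];

% inverse transform: apply the targets to the anchor to get the box back
pred_ctr = targets(1:2) .* wh(anchor) + ctr(anchor);
pred_wh  = exp(targets(3:4)) .* wh(anchor);
recovered = [pred_ctr - 0.5 * (pred_wh - 1), pred_ctr + 0.5 * (pred_wh - 1)];

disp(targets);
disp(recovered);
```

For these boxes the targets are [0.2, 0.2, log(1.2), 0], and applying them back to the anchor recovers the groundtruth box, which is what the network is asked to predict for a positive anchor.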
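The detailed steps in proposal_prepare_image_roidb.m are:
- Read the image information: the image information is read into image_roidb.
- Convert the groundtruth data: done by append_bbox_regression_targets in proposal_prepare_image_roidb.m
  - Get all anchors: proposal_locate_anchors.m returns all anchors of the image and the factor by which the image has to be rescaled
    - Image scale factor: computed from scales and max_size, and the size of the rescaled image is recorded. The short side of the image is rescaled to scales, with the long side capped at max_size.
    - conv5_3 feature size: the conv5_3 size for the rescaled image is obtained by table lookup (output_width_map, output_height_map)
    - Gridding: a grid with spacing feat_stride is laid out according to the conv5_3 size
    - All anchors: the 9 base anchors are placed on every grid node and their coordinates are obtained.
  - Select samples: compute_targets in proposal_prepare_image_roidb.m selects the positive and negative samples
    - Compute overlap: all anchors are stored in the variable ex_rois, and the overlap (IoU) of every anchor with every groundtruth is computed
    - Drop out-of-range anchors: the overlap between out-of-range anchors and the groundtruth is set to 0.
    - Select positive samples: the anchor with the highest IoU and the anchors with IoU of at least fg_thresh are positive samples
    - Select negative samples: anchors with IoU between bg_thresh_lo and bg_thresh_hi are negative samples
    - Compute the regression targets: dx, dy, dw and dh of every positive sample are computed with formula (2) of the paper
    - New groundtruth: the regression targets of a positive sample become its groundtruth (label 1); the regression targets of negative samples are all set to 0 (label -1).
  - Compute mean and standard deviation: the mean and standard deviation of the regression targets of all positive samples are computed, and the targets are normalized (subtract the mean, divide by the standard deviation)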
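The anchor placement and sample selection steps above can be sketched on a toy example. This is a simplified illustration of my own, not the code of proposal_locate_anchors.m or compute_targets: it uses a tiny image, two made-up base anchors instead of nine, a single groundtruth box, and it assumes that anchors crossing the image boundary end up unused.

```matlab
% Sketch: place anchors on the grid, compute IoU, assign labels (toy data).
feat_stride = 16;
fg_thresh = 0.7; bg_thresh_hi = 0.3; bg_thresh_lo = 0;
im_size = [64, 96];                           % rescaled image [height, width]
feat_size = ceil(im_size / feat_stride);      % stands in for the output_*_map lookup
base_anchors = [-7 -7 24 24; -23 -7 40 24];   % two toy anchors instead of nine
gt = [20 10 60 45];                           % one groundtruth box [x1 y1 x2 y2]

% place the base anchors on every grid node
[shift_x, shift_y] = meshgrid((0:feat_size(2) - 1) * feat_stride, ...
                              (0:feat_size(1) - 1) * feat_stride);
shifts = [shift_x(:), shift_y(:), shift_x(:), shift_y(:)];
ex_rois = zeros(size(shifts, 1) * size(base_anchors, 1), 4);
n = 0;
for s = 1:size(shifts, 1)
    for a = 1:size(base_anchors, 1)
        n = n + 1;
        ex_rois(n, :) = base_anchors(a, :) + shifts(s, :);
    end
end

% IoU of every anchor with the groundtruth box
area = @(b) (b(:, 3) - b(:, 1) + 1) .* (b(:, 4) - b(:, 2) + 1);
iw = max(0, min(ex_rois(:, 3), gt(3)) - max(ex_rois(:, 1), gt(1)) + 1);
ih = max(0, min(ex_rois(:, 4), gt(4)) - max(ex_rois(:, 2), gt(2)) + 1);
inter = iw .* ih;
overlap = inter ./ (area(ex_rois) + area(gt) - inter);

% anchors that cross the image boundary get overlap 0
inside = ex_rois(:, 1) >= 1 & ex_rois(:, 2) >= 1 & ...
         ex_rois(:, 3) <= im_size(2) & ex_rois(:, 4) <= im_size(1);
overlap(~inside) = 0;

% labels: 1 = positive, -1 = negative, 0 = unused
labels = zeros(size(overlap));
labels(overlap >= bg_thresh_lo & overlap < bg_thresh_hi) = -1;
labels(overlap >= fg_thresh | overlap == max(overlap)) = 1;
labels(~inside) = 0;             % assumption: out-of-image anchors are not used at all

fprintf('%d anchors: %d positive, %d negative, %d unused\n', numel(labels), ...
    sum(labels == 1), sum(labels == -1), sum(labels == 0));
```

In this toy case no anchor reaches an IoU of 0.7, so the single positive sample comes from the "highest IoU" rule (the anchor [25 9 56 40], IoU about 0.66). That is exactly the situation the rule exists for.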
3 Training
Step1: Shuffle the training data
The generate_random_minibatch function in proposal_train.m shuffles the training data and returns sub_db_inds, the index of the first image after shuffling.
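The idea of drawing one image index per iteration from a shuffled queue can be sketched as follows. This is a simplified illustration of my own, not the body of generate_random_minibatch; the dataset size and the number of iterations are made up.

```matlab
% Sketch: draw one image index per iteration from a shuffled queue.
num_images = 10;                 % hypothetical dataset size
ims_per_batch = 1;
shuffled_inds = [];              % queue of image indices for the current epoch
visited = zeros(1, num_images);
for iter = 1:25
    if isempty(shuffled_inds)
        shuffled_inds = randperm(num_images);     % start a new epoch in random order
    end
    sub_db_inds = shuffled_inds(1:ims_per_batch); % image(s) used in this iteration
    shuffled_inds(1:ims_per_batch) = [];          % remove them from the queue
    visited(sub_db_inds) = visited(sub_db_inds) + 1;
end
disp(visited);
```

Because a new permutation is only drawn once the queue is empty, every image is used once per epoch; after 25 iterations over 10 images each image has been visited 2 or 3 times.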

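Step2: Prepare one training sample
Implemented by proposal_generate_minibatch.m.
- Selecting positive and negative samples and setting their weights: sample_rois in proposal_generate_minibatch.m selects the samples and sets the weights
  - fg_inds: indices of the positive samples; if there are fewer than fg_fraction times batch_size of them, negative samples make up the difference.
  - bg_inds: indices of the negative samples; there are usually many negative samples, so they have to be selected at random.
  - label: the label of every positive sample is set to 1 and that of every negative sample to 0.
  - label_weights: weights of the classification loss. 1 for positive samples, bg_weight for negative samples.
  - bbox_targets: window positions of the positive and negative samples after the data conversion
  - bbox_loss_weights: weights of the localization loss. 1 for positive samples, 0 for negative samples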
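The sampling and weighting described in this list can be sketched like this. It is my own illustration rather than the code of sample_rois; the anchor labels are synthetic (30 positives and 600 negatives out of 1000 anchors), while batch_size, fg_fraction and bg_weight use the defaults from proposal_config.

```matlab
% Sketch: sample fg/bg anchors for one image and set the loss weights.
batch_size = 256; fg_fraction = 0.5; bg_weight = 1.0;
anchor_labels = zeros(1000, 1);  % hypothetical: 1 = fg, -1 = bg, 0 = unused
anchor_labels(1:30) = 1;
anchor_labels(31:630) = -1;

fg_inds = find(anchor_labels > 0);
bg_inds = find(anchor_labels < 0);
fg_num = min(round(batch_size * fg_fraction), numel(fg_inds));
fg_inds = fg_inds(randperm(numel(fg_inds), fg_num));
bg_num = min(batch_size - fg_num, numel(bg_inds));   % negatives fill the rest of the batch
bg_inds = bg_inds(randperm(numel(bg_inds), bg_num));

labels = zeros(size(anchor_labels));
labels(fg_inds) = 1;                     % positives 1, everything else 0
label_weights = zeros(size(anchor_labels));
label_weights(fg_inds) = 1;              % classification loss weight of positives
label_weights(bg_inds) = bg_weight;      % classification loss weight of sampled negatives
bbox_loss_weights = zeros(numel(anchor_labels), 4);
bbox_loss_weights(fg_inds, :) = 1;       % only positives contribute to the box loss

fprintf('fg: %d, bg: %d, anchors with non-zero label weight: %d\n', ...
    fg_num, bg_num, nnz(label_weights));
```

With only 30 positives available, 226 negatives are drawn so that the batch still contains 256 samples. Anchors that were not sampled keep a weight of 0 and therefore do not affect either loss.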

- Assembling the RPN input blobs
  - im_blob input of the RPN: im_blob
  - labels_blob input of the RPN: labels_blob
  - label_weights_blob input of the RPN: label_weights_blob
  - bbox_targets_blob input of the RPN: bbox_targets_blob
  - bbox_loss_blob input of the RPN: bbox_loss_blob

Step3: Iterate
---------------------
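Author: happykew
Source: CSDN
Original: https://blog.csdn.net/happyflyy/article/details/54917514
Copyright notice: this is an original article by the blogger; please include a link to the post when reposting.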
