RefineDet Source Code (2): Network Structure

For background on the RefineDet algorithm, see the blog post: RefineDet Paper Notes.

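RefineDet is an upgraded version of SSD, so most of its code is modified from the open-source SSD code. SSD source code: https://github.com/weiliu89/caffe/tree/ssd

RefineDet consists of three parts: the anchor refinement module (ARM), the object detection module (ODM), and the transfer connection block (TCB). The ARM can use the SSD code directly, except that the number of classes in the classification branch changes from (number of object classes + 1) to 2. This is similar to an RPN, and its purpose is to produce better initial bboxes. The ODM can also be built on the SSD code; the main change is that the default boxes are replaced by the bboxes generated by the ARM, while the classification and regression branches stay the same as in SSD. The TCB can be implemented with a few convolution and deconvolution layers.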
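The idea that the ODM regresses from ARM output rather than from fixed default boxes can be sketched in plain Python. This is a minimal illustration and not the Caffe implementation: it assumes SSD's center-size box encoding with variances (0.1, 0.1, 0.2, 0.2), and the function name `decode` and all numeric values are hypothetical.

```python
import math

def decode(anchor, offsets, variances=(0.1, 0.1, 0.2, 0.2)):
    """Apply predicted offsets to an anchor given as (cx, cy, w, h).

    Assumes SSD-style center-size encoding; the variances are an assumption.
    """
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * variances[0] * aw
    cy = acy + dy * variances[1] * ah
    w = aw * math.exp(dw * variances[2])
    h = ah * math.exp(dh * variances[3])
    return (cx, cy, w, h)

# Step 1 (ARM): refine a fixed default box with the ARM's predicted offsets.
default_box = (0.5, 0.5, 0.2, 0.2)   # hypothetical prior box
arm_offsets = (1.0, -0.5, 0.0, 0.0)  # hypothetical ARM regression output
refined_anchor = decode(default_box, arm_offsets)

# Step 2 (ODM): regress again, but starting from the refined anchor
# instead of the original default box.
odm_offsets = (0.0, 0.0, 0.5, 0.5)   # hypothetical ODM regression output
final_box = decode(refined_anchor, odm_offsets)

print(refined_anchor)
print(final_box)
```

In plain SSD only the first step exists; the second call is what makes the detection a two-step regression.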

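The post "RefineDet Source Code (1): Training Script" introduced the code for training RefineDet. It covered the high-level construction of the network but not the details. This post therefore walks through how the RefineDet network structure is built in detail. The code lives in the CreateRefineDetHead function of the script ~RefineDet/python/caffe/model_libs.py.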

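'''
CreateRefineDetHead is the focus of this post on network construction. It was adapted from
SSD's CreateMultiBoxHead function and can be seen as implementing the body of
CreateMultiBoxHead twice: once for the ARM and once for the ODM.
from_layers and from_layers2 are the two key inputs, corresponding to the feature maps of the
ARM and ODM parts in Figure 1 of the paper. Besides the different inputs, the other difference
between the two passes is that the ARM does RPN-style bbox regression and binary classification,
while the ODM does SSD-style bbox regression and object classification.
'''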
# Excerpt from model_libs.py: L is caffe.layers (imported there with `from caffe import layers as L`);
# ConvBNLayer and ResBody are helper functions defined in the same file.
def CreateRefineDetHead(net, data_layer="data", num_classes=[], from_layers=[], from_layers2=[],
        normalizations=[], use_batchnorm=True, lr_mult=1, min_sizes=[], max_sizes=[],
        prior_variance=[0.1], aspect_ratios=[], steps=[], img_height=0, img_width=0,
        share_location=True, flip=True, clip=True, offset=0.5, inter_layer_depth=[],
        kernel_size=1, pad=0, conf_postfix='', loc_postfix='', **bn_param):
    assert num_classes, "must provide num_classes"
    assert num_classes > 0, "num_classes must be positive number"
    if normalizations:
        assert len(from_layers) == len(normalizations), "from_layers and normalizations should have same length"
    assert len(from_layers) == len(min_sizes), "from_layers and min_sizes should have same length"
    if max_sizes:
        assert len(from_layers) == len(max_sizes), "from_layers and max_sizes should have same length"
    if aspect_ratios:
        assert len(from_layers) == len(aspect_ratios), "from_layers and aspect_ratios should have same length"
    if steps:
        assert len(from_layers) == len(steps), "from_layers and steps should have same length"
    net_layers = net.keys()
    assert data_layer in net_layers, "data_layer is not in net's layers"
    if inter_layer_depth:
        assert len(from_layers) == len(inter_layer_depth), "from_layers and inter_layer_depth should have same length"

    # The code below has two parts: the Anchor Refinement Module (ARM) and the
    # Object Detection Module (ODM). The ARM comes first.
    prefix = 'arm'
    num_classes_rpn = 2
    num = len(from_layers)
    priorbox_layers = []
    loc_layers = []
    conf_layers = []
    # This loop runs once per source layer; the paper uses 4 of them by default.
    for i in range(0, num):
        from_layer = from_layers[i]

        # Get the normalize value.
        if normalizations:
            if normalizations[i] != -1:
                norm_name = "{}_norm".format(from_layer)
                net[norm_name] = L.Normalize(net[from_layer],
                        scale_filler=dict(type="constant", value=normalizations[i]),
                        across_spatial=False, channel_shared=False)
                from_layer = norm_name

        # Add intermediate layers.
        # This part runs by default, with inter_layer_depth=[1,1,1,1], so every source layer is
        # followed by a residual block. Adding extra layers in front of the classification and
        # regression branches like this is common in object detection algorithms.
        if inter_layer_depth:
            if inter_layer_depth[i] > 0:
                inter_name = "{}_inter".format(from_layer)
                ResBody(net, from_layer, inter_name, out2a=256, out2b=256, out2c=1024,
                        stride=1, use_branch1=True)
                # ConvBNLayer(net, from_layer, inter_name, use_bn=use_batchnorm, use_relu=True, lr_mult=lr_mult,
                #       num_output=inter_layer_depth[i], kernel_size=3, pad=1, stride=1, **bn_param)
                from_layer = "res{}".format(inter_name)

        # Estimate number of priors per location given provided parameters.
        min_size = min_sizes[i]
        if type(min_size) is not list:
            min_size = [min_size]
        aspect_ratio = []
        if len(aspect_ratios) > i:
            aspect_ratio = aspect_ratios[i]
            if type(aspect_ratio) is not list:
                aspect_ratio = [aspect_ratio]
        max_size = []
        if len(max_sizes) > i:
            max_size = max_sizes[i]
            if type(max_size) is not list:
                max_size = [max_size]
            if max_size:
                assert len(max_size) == len(min_size), "max_size and min_size should have same length."
        if max_size:
            num_priors_per_location = (2 + len(aspect_ratio)) * len(min_size)
        else:
            num_priors_per_location = (1 + len(aspect_ratio)) * len(min_size)
        if flip:
            num_priors_per_location += len(aspect_ratio) * len(min_size)
        step = []
        if len(steps) > i:
            step = steps[i]

        # Create location prediction layer.
        # This block creates the bbox coordinate regression layer. num_priors_per_location is the
        # number of bboxes generated at each position of the feature map. share_location defaults
        # to True, so the conditional below is skipped. The result is appended to loc_layers, so
        # after all 4 source layers loc_layers holds the bbox regression output of each of them.
        name = "{}_mbox_loc{}".format(from_layer, loc_postfix)
        num_loc_output = num_priors_per_location * 4
        if not share_location:
            num_loc_output *= num_classes_rpn
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                num_output=num_loc_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        loc_layers.append(net[flatten_name])

        # Create confidence prediction layer.
        # This block creates the bbox classification layer, with
        # num_conf_output = num_priors_per_location * num_classes_rpn.
        # Note that num_classes_rpn is set to 2, so each bbox gets a binary classification:
        # foreground vs. background. This classification branch therefore works like an RPN.
        # The result is appended to conf_layers, so after all 4 source layers conf_layers holds
        # the binary classification output of each of them.
        name = "{}_mbox_conf{}".format(from_layer, conf_postfix)
        num_conf_output = num_priors_per_location * num_classes_rpn
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                num_output=num_conf_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        conf_layers.append(net[flatten_name])

        # Create prior generation layer.
        # This block generates the anchors (also called prior boxes). Like RPN anchors, they are
        # fixed once generated. The "bbox" mentioned above refers to predicted boxes, which are
        # not the same thing as these anchors. So why generate them?
        # They are needed to compute the loss. In RefineDet, SSD, and Faster R-CNN alike, the
        # coordinate regression loss pushes the predicted offset as close as possible to the
        # offset between the ground truth and the anchor. Computing that ground-truth-to-anchor
        # offset requires the output of this layer (the anchor coordinates).
        name = "{}_mbox_priorbox".format(from_layer)
        net[name] = L.PriorBox(net[from_layer], net[data_layer], min_size=min_size,
                clip=clip, variance=prior_variance, offset=offset)
        if max_size:
            net.update(name, {'max_size': max_size})
        if aspect_ratio:
            net.update(name, {'aspect_ratio': aspect_ratio, 'flip': flip})
        if step:
            net.update(name, {'step': step})
        if img_height != 0 and img_width != 0:
            if img_height == img_width:
                net.update(name, {'img_size': img_height})
            else:
                net.update(name, {'img_h': img_height, 'img_w': img_width})
        priorbox_layers.append(net[name])

    # Concatenate priorbox, loc, and conf layers.
    # This part merges the outputs of the different layers.
    mbox_layers = []
    name = '{}{}'.format(prefix, "_loc")
    net[name] = L.Concat(*loc_layers, axis=1)
    mbox_layers.append(net[name])
    name = '{}{}'.format(prefix, "_conf")
    net[name] = L.Concat(*conf_layers, axis=1)
    mbox_layers.append(net[name])
    name = '{}{}'.format(prefix, "_priorbox")
    net[name] = L.Concat(*priorbox_layers, axis=2)
    mbox_layers.append(net[name])

    # Next comes the Object Detection Module (ODM). Code that matches the ARM is not explained
    # again; only the differences are pointed out.
    prefix = 'odm'
    num = len(from_layers2)
    loc_layers = []
    conf_layers = []
    for i in range(0, num):
        from_layer = from_layers2[i]

        # Get the normalize value.
        if normalizations:
            if normalizations[i] != -1:
                norm_name = "{}_norm".format(from_layer)
                net[norm_name] = L.Normalize(net[from_layer],
                        scale_filler=dict(type="constant", value=normalizations[i]),
                        across_spatial=False, channel_shared=False)
                from_layer = norm_name

        # Add intermediate layers.
        if inter_layer_depth:
            if inter_layer_depth[i] > 0:
                inter_name = "{}_inter".format(from_layer)
                ResBody(net, from_layer, inter_name, out2a=256, out2b=256, out2c=1024,
                        stride=1, use_branch1=True)
                # ConvBNLayer(net, from_layer, inter_name, use_bn=use_batchnorm, use_relu=True, lr_mult=lr_mult,
                #       num_output=inter_layer_depth[i], kernel_size=3, pad=1, stride=1, **bn_param)
                # from_layer = inter_name
                from_layer = "res{}".format(inter_name)

        # Estimate number of priors per location given provided parameters.
        min_size = min_sizes[i]
        if type(min_size) is not list:
            min_size = [min_size]
        aspect_ratio = []
        if len(aspect_ratios) > i:
            aspect_ratio = aspect_ratios[i]
            if type(aspect_ratio) is not list:
                aspect_ratio = [aspect_ratio]
        max_size = []
        if len(max_sizes) > i:
            max_size = max_sizes[i]
            if type(max_size) is not list:
                max_size = [max_size]
            if max_size:
                assert len(max_size) == len(min_size), "max_size and min_size should have same length."
        if max_size:
            num_priors_per_location = (2 + len(aspect_ratio)) * len(min_size)
        else:
            num_priors_per_location = (1 + len(aspect_ratio)) * len(min_size)
        if flip:
            num_priors_per_location += len(aspect_ratio) * len(min_size)

        # Create location prediction layer.
        name = "{}_mbox_loc{}".format(from_layer, loc_postfix)
        num_loc_output = num_priors_per_location * 4
        if not share_location:
            num_loc_output *= num_classes
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                num_output=num_loc_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        loc_layers.append(net[flatten_name])

        # Create confidence prediction layer.
        # Here num_conf_output = num_priors_per_location * num_classes, where num_classes is
        # the number of object classes plus background. This classification branch is therefore
        # the same as in SSD.
        name = "{}_mbox_conf{}".format(from_layer, conf_postfix)
        num_conf_output = num_priors_per_location * num_classes
        ConvBNLayer(net, from_layer, name, use_bn=use_batchnorm, use_relu=False, lr_mult=lr_mult,
                num_output=num_conf_output, kernel_size=kernel_size, pad=pad, stride=1, **bn_param)
        permute_name = "{}_perm".format(name)
        net[permute_name] = L.Permute(net[name], order=[0, 2, 3, 1])
        flatten_name = "{}_flat".format(name)
        net[flatten_name] = L.Flatten(net[permute_name], axis=1)
        conf_layers.append(net[flatten_name])

    # Concatenate priorbox, loc, and conf layers.
    # Finally, the ODM's regression and classification outputs are appended to the returned list.
    name = '{}{}'.format(prefix, "_loc")
    net[name] = L.Concat(*loc_layers, axis=1)
    mbox_layers.append(net[name])
    name = '{}{}'.format(prefix, "_conf")
    net[name] = L.Concat(*conf_layers, axis=1)
    mbox_layers.append(net[name])

    return mbox_layers
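
Two pieces of arithmetic from the function can be checked without Caffe: the number of priors per location (which fixes the output channels of the ARM and ODM prediction layers) and the regression target computed from an anchor and a ground-truth box. The sketch below is only an illustration. The helper names, the example sizes, the 21-class setting, and the center-size encoding with variances (0.1, 0.1, 0.2, 0.2) are assumptions, not values taken from the post.

```python
import math

def _as_list(value):
    """Wrap a scalar in a list; treat None as an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def num_priors_per_location(min_size, max_size=None, aspect_ratio=None, flip=True):
    """Same counting rule as in both loops of CreateRefineDetHead."""
    min_size = _as_list(min_size)
    max_size = _as_list(max_size)
    aspect_ratio = _as_list(aspect_ratio)
    if max_size:
        n = (2 + len(aspect_ratio)) * len(min_size)
    else:
        n = (1 + len(aspect_ratio)) * len(min_size)
    if flip:
        n += len(aspect_ratio) * len(min_size)
    return n

def head_output_channels(n_priors, n_classes, share_location=True):
    """Channels of the loc and conf prediction layers for one source layer."""
    loc = n_priors * 4
    if not share_location:
        loc *= n_classes
    conf = n_priors * n_classes
    return loc, conf

def encode(anchor, gt, variances=(0.1, 0.1, 0.2, 0.2)):
    """Regression target for a ground-truth box relative to an anchor.

    Boxes are (cx, cy, w, h). Assumes SSD-style center-size encoding.
    """
    acx, acy, aw, ah = anchor
    gcx, gcy, gw, gh = gt
    return ((gcx - acx) / (aw * variances[0]),
            (gcy - acy) / (ah * variances[1]),
            math.log(gw / aw) / variances[2],
            math.log(gh / ah) / variances[3])

# Hypothetical setting: one min_size, aspect ratio 2 with flip, no max_size.
n = num_priors_per_location(32, aspect_ratio=[2], flip=True)
arm_loc, arm_conf = head_output_channels(n, 2)    # ARM: foreground vs. background
odm_loc, odm_conf = head_output_channels(n, 21)   # ODM: e.g. 20 classes + background
print(n, (arm_loc, arm_conf), (odm_loc, odm_conf))  # 3 (12, 6) (12, 63)

# Target that the predicted offsets are trained to match for one anchor.
target = encode((0.5, 0.5, 0.2, 0.2), (0.52, 0.49, 0.2, 0.2))
print(target)
```

The only difference between the two heads in this arithmetic is the class count passed to `head_output_channels`, which mirrors `num_classes_rpn` versus `num_classes` in the function above.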