Understanding the Keras Code for Faster R-CNN (Part 3)

阿新 • Published: 2018-12-18

Computing the target values for the RPN outputs

The function is calc_rpn() in datagenerators.py:

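Inputs: the config object, the annotation data of the augmented image, the original image width and height, the resized width and height, and a function that computes the size of the network's final convolutional feature map from the resized image size.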
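As a rough illustration of the last argument, a size function for a backbone whose total stride is 16 might look like the sketch below. The name get_img_output_length and the plain floor division are assumptions made for illustration; the real function depends on the padding and pooling layers of the backbone in use.

```python
def get_img_output_length(width, height, stride=16):
    """Hypothetical size function: resized image size -> feature-map size."""
    def get_output_length(input_length):
        # Assumes each spatial dimension shrinks by exactly `stride` (floor division).
        return input_length // stride

    return get_output_length(width), get_output_length(height)


output_width, output_height = get_img_output_length(600, 800)
print(output_width, output_height)  # 37 50
```

calc_rpn only needs the returned (width, height) pair, so any callable with this shape can be passed in as img_length_calc_function.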

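Outputs: y_rpn_cls, whose first half flags whether each anchor is valid and whose second half flags whether it contains an object; and y_rpn_regr, whose first half repeats the object flag four times per anchor and whose second half holds the box regression targets.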
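To make the channel layout concrete, the helper below returns the slices along axis 1 for each half of the two arrays. It is only a sketch: rpn_target_layout and its key names are hypothetical, and the 9 anchors in the example assume 3 scales and 3 ratios.

```python
def rpn_target_layout(num_anchors):
    """Return channel slices (axis 1) for the two arrays calc_rpn returns."""
    return {
        # y_rpn_cls has 2 * num_anchors channels: validity flags, then object flags
        "cls_is_valid": slice(0, num_anchors),
        "cls_is_object": slice(num_anchors, 2 * num_anchors),
        # y_rpn_regr has 8 * num_anchors channels: repeated object flags, then targets
        "regr_mask": slice(0, 4 * num_anchors),
        "regr_targets": slice(4 * num_anchors, 8 * num_anchors),
    }


layout = rpn_target_layout(9)  # e.g. 3 scales x 3 ratios (assumed)
print(layout["cls_is_object"])  # slice(9, 18, None)
print(layout["regr_targets"])   # slice(36, 72, None)
```

With these slices, y_rpn_regr[:, layout["regr_targets"]] would select the (tx, ty, tw, th) values and y_rpn_regr[:, layout["regr_mask"]] the flags that mark which of them belong to positive anchors.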

The flow is shown in the figure:
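The central step of that flow, as it can be read off the code, is the rule that labels each anchor from its IoU with the ground-truth boxes. A minimal sketch follows; label_anchor is a hypothetical helper, and the default thresholds 0.3 and 0.7 are taken from the comments in the listing (the real values come from C.rpn_min_overlap and C.rpn_max_overlap).

```python
def label_anchor(ious, min_overlap=0.3, max_overlap=0.7):
    """Classify one anchor from its IoU with every non-background GT box."""
    label = "neg"
    for curr_iou in ious:
        if curr_iou > max_overlap:
            # any sufficiently large overlap makes the anchor positive
            label = "pos"
        elif min_overlap < curr_iou < max_overlap and label != "pos":
            # gray zone: excluded from the objective unless another box makes it positive
            label = "neutral"
    return label


print(label_anchor([0.1, 0.8]))  # pos
print(label_anchor([0.5, 0.2]))  # neutral
print(label_anchor([0.1]))       # neg
```

Negative and positive anchors are marked valid, neutral ones are marked invalid. After this pass, calc_rpn additionally forces the best-matching anchor of any ground-truth box that received no positive anchor to be positive.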

Code walkthrough:

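import random

import numpy as np

# iou(a, b) is assumed to be defined elsewhere in the same module;
# it takes two boxes given as [x1, y1, x2, y2].

def calc_rpn(C, img_data, width, height, resized_width, resized_height, img_length_calc_function):
	# downscale defaults to 16: the factor by which the shared conv layers shrink the image
	downscale = float(C.rpn_stride)
	anchor_sizes = C.anchor_box_scales
	anchor_ratios = C.anchor_box_ratios
	num_anchors = len(anchor_sizes) * len(anchor_ratios)
	# calculate the output map size (the feature map of the last conv layer) based on the network architecture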

	(output_width, output_height) = img_length_calc_function(resized_width, resized_height)

	n_anchratios = len(anchor_ratios)
	
	# initialise empty output objectives
	y_rpn_overlap = np.zeros((output_height, output_width, num_anchors))  # 1 if the anchor overlaps a GT box with IoU > 0.7, else 0
	y_is_box_valid = np.zeros((output_height, output_width, num_anchors))  # 1 if the anchor is valid, else 0 (neutral anchors are invalid)
	y_rpn_regr = np.zeros((output_height, output_width, num_anchors * 4))  # regression targets of the anchors that overlap a GT box

	num_bboxes = len(img_data['bboxes'])

	num_anchors_for_bbox = np.zeros(num_bboxes).astype(int)  # number of positive anchors for each GT box
	best_anchor_for_bbox = -1*np.ones((num_bboxes, 4)).astype(int)  # index [jy, ix, ratio idx, size idx] of the anchor with the highest IoU for each GT box
	best_iou_for_bbox = np.zeros(num_bboxes).astype(np.float32)  # highest IoU found for each GT box
	best_x_for_bbox = np.zeros((num_bboxes, 4)).astype(int)  # coordinates [x1, x2, y1, y2] of that best anchor
	best_dx_for_bbox = np.zeros((num_bboxes, 4)).astype(np.float32)  # regression targets [tx, ty, tw, th] of that best anchor
	
	# get the GT box coordinates, and rescale them to match the resized image
	# (gta stores each box as x1, x2, y1, y2)
	gta = np.zeros((num_bboxes, 4))
	for bbox_num, bbox in enumerate(img_data['bboxes']):
		# get the GT box coordinates, and resize to account for image resizing
		gta[bbox_num, 0] = bbox['x1'] * (resized_width / float(width))
		gta[bbox_num, 1] = bbox['x2'] * (resized_width / float(width))
		gta[bbox_num, 2] = bbox['y1'] * (resized_height / float(height))
		gta[bbox_num, 3] = bbox['y2'] * (resized_height / float(height))
	
	# rpn ground truth
	# loop over every anchor shape at every feature-map position and fill in the targets
	# that are later packed into y_rpn_cls and y_rpn_regr
	for anchor_size_idx in range(len(anchor_sizes)):
		for anchor_ratio_idx in range(n_anchratios):
			anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
			anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]	
			
			for ix in range(output_width):					
				# x-coordinates of the current anchor box	
				x1_anc = downscale * (ix + 0.5) - anchor_x / 2
				x2_anc = downscale * (ix + 0.5) + anchor_x / 2	
				
				# ignore boxes that go across image boundaries					
				if x1_anc < 0 or x2_anc > resized_width:
					continue
					
				for jy in range(output_height):

					# y-coordinates of the current anchor box
					y1_anc = downscale * (jy + 0.5) - anchor_y / 2
					y2_anc = downscale * (jy + 0.5) + anchor_y / 2

					# ignore boxes that go across image boundaries
					if y1_anc < 0 or y2_anc > resized_height:
						continue

					# bbox_type indicates whether an anchor should be a target 
					bbox_type = 'neg'

					# this is the best IOU for the (x,y) coord and the current anchor
					# note that this is different from the best IOU for a GT bbox
					best_iou_for_loc = 0.0
					# for each anchor, find the GT box with the highest IoU
					for bbox_num in range(num_bboxes):
						
						# get IOU of the current GT box and the current anchor box
						curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]], [x1_anc, y1_anc, x2_anc, y2_anc])
						# calculate the regression targets if they will be needed, i.e. this anchor is
						# the best so far for this GT box or its IoU exceeds rpn_max_overlap
						if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
							cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
							cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0
							cxa = (x1_anc + x2_anc)/2.0
							cya = (y1_anc + y2_anc)/2.0

							tx = (cx - cxa) / (x2_anc - x1_anc)
							ty = (cy - cya) / (y2_anc - y1_anc)
							tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
							th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))
						
						if img_data['bboxes'][bbox_num]['class'] != 'bg':
							# all GT boxes should be mapped to an anchor box, so we keep track of which anchor box was best
							# (highest IoU) and remember its regression targets
							if curr_iou > best_iou_for_bbox[bbox_num]:
								best_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]
								best_iou_for_bbox[bbox_num] = curr_iou
								best_x_for_bbox[bbox_num,:] = [x1_anc, x2_anc, y1_anc, y2_anc]
								best_dx_for_bbox[bbox_num,:] = [tx, ty, tw, th]

							# we set the anchor to positive if the IOU is >0.7 (it does not matter if there was another better box, it just indicates overlap)
							if curr_iou > C.rpn_max_overlap:
								bbox_type = 'pos'
								num_anchors_for_bbox[bbox_num] += 1
								# we update the regression layer target if this IOU is the best for the current (x,y) and anchor position
								if curr_iou > best_iou_for_loc:
									best_iou_for_loc = curr_iou
									best_regr = (tx, ty, tw, th)

							# if the IOU is >0.3 and <0.7, it is ambiguous and not included in the objective
							if C.rpn_min_overlap < curr_iou < C.rpn_max_overlap:
								# gray zone between neg and pos
								if bbox_type != 'pos':
									bbox_type = 'neutral'

					# turn on or off outputs depending on IOUs
					if bbox_type == 'neg':
						y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
						y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
					elif bbox_type == 'neutral':
						y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
						y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0
					elif bbox_type == 'pos':
						y_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
						y_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1
						start = 4 * (anchor_ratio_idx + n_anchratios * anchor_size_idx)
						y_rpn_regr[jy, ix, start:start+4] = best_regr

	# we ensure that every bbox has at least one positive RPN region

	for idx in range(num_anchors_for_bbox.shape[0]):
		if num_anchors_for_bbox[idx] == 0:
			# no box with an IOU greater than zero ...
			if best_anchor_for_bbox[idx, 0] == -1:
				continue
			y_is_box_valid[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *
				best_anchor_for_bbox[idx,3]] = 1
			y_rpn_overlap[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *
				best_anchor_for_bbox[idx,3]] = 1
			start = 4 * (best_anchor_for_bbox[idx,2] + n_anchratios * best_anchor_for_bbox[idx,3])
			y_rpn_regr[
				best_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], start:start+4] = best_dx_for_bbox[idx, :]

	y_rpn_overlap = np.transpose(y_rpn_overlap, (2, 0, 1))
	y_rpn_overlap = np.expand_dims(y_rpn_overlap, axis=0)

	y_is_box_valid = np.transpose(y_is_box_valid, (2, 0, 1))
	y_is_box_valid = np.expand_dims(y_is_box_valid, axis=0)

	y_rpn_regr = np.transpose(y_rpn_regr, (2, 0, 1))
	y_rpn_regr = np.expand_dims(y_rpn_regr, axis=0)

	pos_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 1, y_is_box_valid[0, :, :, :] == 1))
	neg_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 0, y_is_box_valid[0, :, :, :] == 1))

	num_pos = len(pos_locs[0])

	# one issue is that the RPN has many more negative than positive regions, so we turn off some of the negative
	# regions. We also limit it to 256 regions.
	num_regions = 256

	if len(pos_locs[0]) > num_regions // 2:
		# integer division: random.sample requires an integer sample size in Python 3
		val_locs = random.sample(range(len(pos_locs[0])), len(pos_locs[0]) - num_regions // 2)
		y_is_box_valid[0, pos_locs[0][val_locs], pos_locs[1][val_locs], pos_locs[2][val_locs]] = 0
		num_pos = num_regions // 2

	if len(neg_locs[0]) + num_pos > num_regions:
		val_locs = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)
		y_is_box_valid[0, neg_locs[0][val_locs], neg_locs[1][val_locs], neg_locs[2][val_locs]] = 0

	y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1)
	y_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1)

	return np.copy(y_rpn_cls), np.copy(y_rpn_regr)
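The listing calls an iou() helper that is not shown, and it computes the regression targets (tx, ty, tw, th) inline. The sketch below gives one plausible pure-Python version of both, assuming boxes are passed as (x1, y1, x2, y2); encode_box and decode_box are hypothetical names, and this iou is not necessarily the project's own implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    if a[0] >= a[2] or a[1] >= a[3] or b[0] >= b[2] or b[1] >= b[3]:
        return 0.0  # degenerate box
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # no overlap
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def encode_box(gt, anchor):
    """Regression targets (tx, ty, tw, th), using the same formulas as calc_rpn."""
    gx1, gy1, gx2, gy2 = gt
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    tx = ((gx1 + gx2) / 2.0 - (ax1 + ax2) / 2.0) / aw
    ty = ((gy1 + gy2) / 2.0 - (ay1 + ay2) / 2.0) / ah
    tw = math.log((gx2 - gx1) / aw)
    th = math.log((gy2 - gy1) / ah)
    return tx, ty, tw, th


def decode_box(targets, anchor):
    """Inverse of encode_box: recover (x1, y1, x2, y2) from targets and anchor."""
    tx, ty, tw, th = targets
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    cx = tx * aw + (ax1 + ax2) / 2.0
    cy = ty * ah + (ay1 + ay2) / 2.0
    w, h = math.exp(tw) * aw, math.exp(th) * ah
    return cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
targets = encode_box((10, 10, 50, 30), (0, 0, 40, 40))
print(targets[0], targets[1], targets[2])  # 0.25 0.0 0.0
```

Because the targets are normalised by the anchor's width and height, the same offset in pixels gives a smaller target for a larger anchor, and decode_box shows how a predicted (tx, ty, tw, th) maps back to image coordinates.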
