YOLO v2 loss function: source code walkthrough

阿新 • Published: 2018-10-31

Preliminaries:

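1. For detailed explanations of YOLO and YOLO v2, see the two links below or read the papers directly. (I had planned to write my own YOLO tutorial, but after thinking it over, the articles at these two links are simply too good _(:з」∠)_)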

yolo: https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

yolo v2: https://zhuanlan.zhihu.com/p/25167153

2. This article only walks through the source code of the YOLO v2 loss function. To get the code, run

git clone https://github.com/pjreddie/darknet

and then open src/region_layer.c.

3. The official YOLO website is: https://pjreddie.com/darknet/yolo/

4. The command I used while debugging the code is:

./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23

Code walkthrough:

region_layer.c
box get_region_box(float *x, float *biases, int n, int index, int i, int j, int w, int h, int stride)
{
    box b;
    b.x = (i + x[index + 0*stride]) / w;
    b.y = (j + x[index + 1*stride]) / h;
    b.w = exp(x[index + 2*stride]) * biases[2*n]   / w;
    b.h = exp(x[index + 3*stride]) * biases[2*n+1] / h;
    //printf("%f/%d/%d - %f/%f/%f/%f\n", x[index + 2*stride], w, h, b.x, b.y, b.w, b.h);
    return b;
}

float delta_region_box(box truth, float *x, float *biases, int n, int index, int i, int j, int w, int h, float *delta, float scale, int stride)
{
    box pred = get_region_box(x, biases, n, index, i, j, w, h, stride);
    float iou = box_iou(pred, truth);

    float tx = (truth.x*w - i);
    float ty = (truth.y*h - j);
    float tw = log(truth.w*w / biases[2*n]);
    float th = log(truth.h*h / biases[2*n + 1]);

    delta[index + 0*stride] = scale * (tx - x[index + 0*stride]);
    delta[index + 1*stride] = scale * (ty - x[index + 1*stride]);
    delta[index + 2*stride] = scale * (tw - x[index + 2*stride]);
    delta[index + 3*stride] = scale * (th - x[index + 3*stride]);
    return iou;
}

void forward_region_layer(const layer l, network net)
{
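    ...

    for (b = 0; b < l.batch; ++b) {
        if(l.softmax_tree){
            // Not executed here
        }
        // The loop below computes the confidence loss for boxes that contain no object:
        // 1. Visit every cell and every box in it, and compute the best_iou between that box and the ground-truth boxes.
        // 2. First, unconditionally treat the box as containing no object and compute its confidence loss.
        // 3. If the box's best_iou > threshold, it overlaps a real object, so the loss from step 2 does not count
        //    and is reset to zero.
        // If the image is divided into a 13 * 13 grid, l.h and l.w are both 13,
        // so visiting every cell takes 13 * 13 iterations.
        for (j = 0; j < l.h; ++j) {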
            for (i = 0; i < l.w; ++i) {
                // Each cell predicts 5 boxes, so this loop runs 5 times
                for (n = 0; n < l.n; ++n) {
                    // Get the index of this box
                    int box_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 0);
                    // Get the box's predicted x, y, w, h; note these are relative values, not pixel coordinates
                    box pred = get_region_box(l.output, l.biases, n, box_index, i, j, l.w, l.h, l.w*l.h);
                    float best_iou = 0;
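                    // My reading of the 30-iteration loop below:
                    //     assume an image contains at most 30 objects, and compute the IoU against each of them.
                    // PS: I searched for a long time without finding what this 30 ties to, so the meaning above is a guess.
                    // (It is consistent with make_region_layer setting l.truths to 30 * 5: 30 boxes of 5 floats per image.)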
                    for(t = 0; t < 30; ++t){
                        // get truth_box's x, y, w, h
                        box truth = float_to_box(net.truth + t*5 + b*l.truths, 1);
                        printf("\ti=%d, j=%d, n=%d, t=%d\n", i, j, n, t);
                        // Stop once every object in the image has been visited (an x of 0 marks the end of the list)
                        if(!truth.x){
                            break;
                        }
                        float iou = box_iou(pred, truth);
                        if (iou > best_iou) {
                            best_iou = iou;
                        }
                    }
                    // Get the index where this prediction's confidence is stored
                    int obj_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 4);
                    avg_anyobj += l.output[obj_index];
                    // First, unconditionally treat this box as containing no object and compute its loss.
                    l.delta[obj_index] = l.noobject_scale * (0 - l.output[obj_index]);
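                    // Then check: if this box's best_iou > threshold, it overlaps a real object, so the loss from the line above does not count and is reset to zero.
                    if (best_iou > l.thresh) {
                        l.delta[obj_index] = 0;
                    }

                    // Per the code, this block runs only while the number of images seen in training is < 12800. Why check this?
                    // Its effect: for the first 12800 images, every anchor's prediction is pulled, with a small weight of .01,
                    // toward the anchor prior itself, centered in its cell (presumably to stabilize early training).
                    if(*(net.seen) < 12800){
                        // Simply use a box centered on the current cell, sized like anchor n, as the truth box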
                        box truth = {0};
                        truth.x = (i + .5)/l.w;
                        truth.y = (j + .5)/l.h;
                        truth.w = l.biases[2*n]/l.w;
                        truth.h = l.biases[2*n+1]/l.h;
                        // Store the difference between the predicted tx, ty, tw, th and the tx', ty', tw', th' computed from this box in l.delta
                        delta_region_box(truth, l.output, l.biases, n, box_index, i, j, l.w, l.h, l.delta, .01, l.w*l.h);
                    }
                }
            }
        }
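        // The 30 here is the same as above; my guess is that it means "at most 30 objects per image".
        // So this loop goes directly over the labeled objects in the image and computes the loss in the cell holding each object's center,
        // instead of visiting all 13*13 cells and checking whether each cell contains an object.
        for(t = 0; t < 30; ++t){
            // get truth_box's x, y, w, h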
            box truth = float_to_box(net.truth + t*5 + b*l.truths, 1);

            // An x of 0 marks the end of the labeled objects for this image, so stop
            if(!truth.x) break;
            float best_iou = 0;
            int best_n = 0;
            // If the image is divided into a 13 * 13 grid, l.w and l.h are both 13.
            // Truncating truth.x * l.w and truth.y * l.h to int gives the column i and row j
            // of the cell that contains the object's center.
            i = (truth.x * l.w);
            j = (truth.y * l.h);
            printf("%d %f %d %f\n", i, truth.x*l.w, j, truth.y*l.h);
            box truth_shift = truth;
            // With the truth box's x, y, w, h in hand, move its x, y to 0, 0 (truth_shift.x, truth_shift.y) so that the IoU below compares only width and height
            truth_shift.x = 0;
            truth_shift.y = 0;
            printf("index %d %d\n",i, j);
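            // Each cell predicts 5 boxes, so this loop runs 5 times
            for(n = 0; n < l.n; ++n){
                // Get the index of this box in the prediction output
                int box_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 0);
                // Get the box's predicted x, y, w, h; again these are relative values, not pixel coordinates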
                box pred = get_region_box(l.output, l.biases, n, box_index, i, j, l.w, l.h, l.w*l.h);
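                // Here the anchor size divided by l.w and l.h is used as the predicted w and h.
                // PS: I printed l.bias_match and its value is 1, so this branch does run. My understanding of why:
                //     the YOLO v2 paper says the predicted w, h are generated from the anchors (obtained by k-means clustering), i.e.
                //         w = exp(tw) * l.biases[2*n]   / l.w
                //         h = exp(th) * l.biases[2*n+1] / l.h
                //     The assignment below is this formula with tw = th = 0 (exp(0) = 1), so with bias_match the best anchor
                //     is chosen by comparing the truth box against the anchor shapes themselves, not the network's current predictions.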
                if(l.bias_match){
                    pred.w = l.biases[2*n]/l.w;
                    pred.h = l.biases[2*n+1]/l.h;
                }
                printf("pred: (%f, %f) %f x %f\n", pred.x, pred.y, pred.w, pred.h);
                // The truth box's x, y were moved to 0, 0 above, so move the predicted box's x, y to 0, 0 as well; the IoU then depends on shape only
                pred.x = 0;
                pred.y = 0;
                float iou = box_iou(pred, truth_shift);
                if (iou > best_iou){
                    best_iou = iou;
                    best_n = n;
                }
            }
            printf("%d %f (%f, %f) %f x %f\n", best_n, best_iou, truth.x, truth.y, truth.w, truth.h);

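            // Use best_n from above to find the box's index
            int box_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 0);
            // Write the x, y, w, h deltas for the best anchor into l.delta and get the IoU between its prediction and the truth box.
            // The scale coord_scale * (2 - truth.w*truth.h) gives a tiny box (w*h near 0) almost twice the weight of a box covering the whole image.
            float iou = delta_region_box(truth, l.output, l.biases, best_n, box_index, i, j, l.w, l.h, l.delta, l.coord_scale *  (2 - truth.w*truth.h), l.w*l.h);
            // If iou > .5, count the object as recalled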
            if(iou > .5) recall += 1;
            avg_iou += iou;

            //l.delta[best_index + 4] = iou - l.output[best_index + 4];
            // Use best_n to find the index of the confidence
            int obj_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 4);
            avg_obj += l.output[obj_index];
            // Reaching this point means the cell contains an object's center, so its target confidence is 1, while the predicted confidence is l.output[obj_index]; by the formula this gives:
            l.delta[obj_index] = l.object_scale * (1 - l.output[obj_index]);
            if (l.rescore) {
                // Use iou instead of the 1 above as the target (in my debugging run l.rescore = 1, so this branch runs)
                l.delta[obj_index] = l.object_scale * (iou - l.output[obj_index]);
            }

            // Get the ground-truth class
            int class = net.truth[t*5 + b*l.truths + 4];
            if (l.map) class = l.map[class];
            // Get the index of the predicted class scores
            int class_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 5);
            // For every class, store scale * (0/1 truth - predicted probability) in l.delta at that class's position
            delta_region_class(l.output, l.delta, class_index, class, l.classes, l.softmax_tree, l.class_scale, l.w*l.h, &avg_cat);
            ++count;
            ++class_count;
        }
    }
    printf("\n");
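    // Now every position of l.delta holds a difference for class, confidence, or x, y, w, h. mag_array walks all positions and
    // returns the square root of the sum of squares; pow(..., 2) then squares it, so l.cost is the sum of squared deltas.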
    *(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2);
    printf("Region Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, Avg Recall: %f,  count: %d\n", avg_iou/count, avg_cat/class_count, avg_obj/count, avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, count);
}
