[Paper Notes] Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification

阿新 • Published: 2019-01-02

Abstract

Person re-identification (ReID) is an important task in computer vision. Recently, deep learning with a metric learning loss has become a common framework for ReID. In this paper, we propose a new metric learning loss with hard sample mining called margin sample mining loss (MSML), which can achieve better accuracy compared with other metric learning losses, such as triplet loss. In experiments, our proposed method outperforms most of the state-of-the-art algorithms on Market1501, MARS, CUHK03 and CUHK-SYSU.

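In short: person re-identification is an important task in computer vision, and deep learning combined with metric learning has become the mainstream approach to ReID. The paper proposes MSML, a new metric learning loss that incorporates hard sample mining. Experiments show that it beats most existing methods and reaches state-of-the-art results on the Market1501, MARS, CUHK03 and CUHK-SYSU datasets.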

Method

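Triplet loss is a very widely used metric learning method, and the quadruplet loss and the TriHard loss are two improved versions of it. Compared with the triplet loss, the quadruplet loss also takes into account the absolute distance between positive and negative pairs, while the TriHard loss introduces the idea of hard sample mining. MSML combines both of these advantages.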

The goal of metric learning is to learn a function $g(x): \mathbb{R}^F \to \mathbb{R}^D$ such that semantic similarity in $\mathbb{R}^F$ is reflected in distances in $\mathbb{R}^D$.
We usually need to define a distance metric function $D(x, y): \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ to express distance in the embedding space, and this distance is also what is used to re-identify pedestrian images.
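As a rough illustration of such a distance function, the sketch below assumes $D$ is the Euclidean distance (the notes do not fix a particular metric) and uses made-up 2-D vectors in place of real embeddings $g(x)$. The helper names `euclidean` and `pairwise_distances` are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pairwise_distances(embeddings):
    """Return the matrix dist[i][j] = D(embeddings[i], embeddings[j])."""
    return [[euclidean(x, y) for y in embeddings] for x in embeddings]


# Toy embeddings (assumed values standing in for the outputs of g(x)).
embeddings = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pairwise_distances(embeddings)
print(dist[0][1], dist[0][2])
```

All of the losses discussed in these notes can be computed from a distance matrix of this kind; only the rule for choosing which entries enter the loss differs.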

The triplet loss, quadruplet loss and TriHard loss introduced in the related-work survey are all typical metric learning methods. Given a triplet $\{a, p, n\}$, the triplet loss is defined as:

$$L_t = (d_{a,p} - d_{a,n} + \alpha)_+$$
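A minimal sketch of the triplet loss on precomputed distances is given below. The function name `triplet_loss`, the margin value 0.3 and the distances are assumptions chosen for illustration, not values from the paper.

```python
def triplet_loss(d_ap, d_an, alpha=0.3):
    """L_t = (d_ap - d_an + alpha)_+ for a single triplet {a, p, n}."""
    return max(d_ap - d_an + alpha, 0.0)


# The negative is already farther than the positive by more than the margin.
easy = triplet_loss(d_ap=0.5, d_an=1.5)
# The negative is closer to the anchor than the positive, so the loss is positive.
hard = triplet_loss(d_ap=1.0, d_an=0.5)
print(easy, hard)
```

Only the difference `d_ap - d_an` matters here, which is why the triplet loss is said to constrain relative distance only.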
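The triplet loss only considers the relative distance between positive and negative pairs. To bring in the absolute distance between them, the quadruplet loss adds one more negative sample to form a quadruplet $\{a, p, n_1, n_2\}$, and the quadruplet loss is defined as: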
$$L_q = (d_{a,p} - d_{a,n_1} + \alpha)_+ + (d_{a,p} - d_{n_1,n_2} + \beta)_+$$
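The two terms of the quadruplet loss can be sketched as follows. The function name `quadruplet_loss`, the margins 0.3 and 0.15, and the distances are hypothetical values picked so that the first term vanishes while the second does not.

```python
def quadruplet_loss(d_ap, d_an1, d_n1n2, alpha=0.3, beta=0.15):
    """Quadruplet loss on precomputed distances for {a, p, n1, n2}."""
    # Relative term: the usual triplet constraint anchored at a.
    relative = max(d_ap - d_an1 + alpha, 0.0)
    # Absolute term: compares the positive pair with a negative pair
    # (n1, n2) that does not involve the anchor.
    absolute = max(d_ap - d_n1n2 + beta, 0.0)
    return relative + absolute


# The triplet constraint is satisfied (1.0 + 0.3 < 2.0), but the two negatives
# are closer to each other than the positive pair is, so the loss is non-zero.
loss = quadruplet_loss(d_ap=1.0, d_an1=2.0, d_n1n2=0.9)
print(loss)
```

In this example the loss comes entirely from the second term, which is the part that compares distances across pairs with no shared anchor.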
If we ignore the difference between the margin parameters $\alpha$ and $\beta$, the quadruplet loss can be written in a more general form:
$$L_{q^\prime} = (d_{a,p} - d_{m,n} + \alpha)_+$$
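where $m$ and $n$ form a negative pair, while $m$ and $a$ may be either a positive pair or a negative pair. For the TriHard loss, each batch contains $P$ identities with $K$ images each. For an image $a$ in the batch, let $A$ be the set of images with the same identity as $a$ and $B$ the set of images with a different identity. The TriHard loss is then: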
$$L_{th} = \frac{1}{P \times K}\sum_{a \in \text{batch}}\left(\max_{p \in A} d_{a,p} - \min_{n \in B} d_{a,n} + \alpha\right)_+$$
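The per-anchor mining in the TriHard formula can be sketched in plain Python as below. This assumes Euclidean distance, a margin of 0.3, and a made-up batch of 1-D embeddings with $P = 2$ identities and $K = 2$ images each; `trihard_loss` and `euclidean` are hypothetical helper names.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def trihard_loss(embeddings, labels, alpha=0.3):
    """Average over anchors of (hardest positive - hardest negative + alpha)_+.

    Assumes the batch contains at least two identities. The anchor itself is
    kept in its positive set; its zero distance cannot change the maximum.
    """
    total = 0.0
    for x_a, id_a in zip(embeddings, labels):
        dists = [euclidean(x_a, x) for x in embeddings]
        hardest_pos = max(d for d, l in zip(dists, labels) if l == id_a)
        hardest_neg = min(d for d, l in zip(dists, labels) if l != id_a)
        total += max(hardest_pos - hardest_neg + alpha, 0.0)
    return total / len(embeddings)


# Toy batch: identity 0 at positions 0.0 and 1.0, identity 1 at 3.0 and 5.0.
embeddings = [[0.0], [1.0], [3.0], [5.0]]
labels = [0, 0, 1, 1]
loss = trihard_loss(embeddings, labels)
print(loss)
```

In this toy batch only the anchor at 3.0 violates the margin (its hardest positive and hardest negative are both at distance 2.0), so one of the four anchors contributes to the averaged loss.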
The TriHard loss, however, likewise only considers the relative distance between positive and negative pairs and ignores their absolute distance. Introducing this hard sample mining idea into $L_{q^\prime}$ therefore gives:
$$L_{msml} = \left(\max_{a,p} d_{a,p} - \min_{m,n} d_{m,n} + \alpha\right)_+$$
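where $a$, $p$, $m$ and $n$ are all images in the batch: $(a, p)$ is the least similar positive pair and $(m, n)$ is the most similar negative pair. $\max_{a,p} d_{a,p}$ can be viewed as an upper bound on the positive-pair distances, and $\min_{m,n} d_{m,n}$ as a lower bound on the negative-pair distances. MSML aims to push apart the boundaries of the positive and negative pairs, hence the name margin sample mining loss. MSML uses only two pairs of samples to compute the loss, which looks like it wastes a lot of training data. However, these two pairs are selected based on the whole batch, so the other images in the batch also influence the final loss indirectly. Moreover, as the number of training epochs grows, almost all of the data ends up taking part in the loss computation. In summary, MSML is a metric learning method that accounts for both relative and absolute distance and incorporates hard sample mining.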
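The batch-level mining in the MSML formula can be sketched as follows. This is an illustrative plain-Python version rather than the authors' implementation: it assumes Euclidean distance, a margin of 0.3, a made-up batch of 1-D embeddings, and a batch that contains at least one positive pair and one negative pair. The names `msml_loss` and `euclidean` are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def msml_loss(embeddings, labels, alpha=0.3):
    """Return (loss, hardest positive pair, hardest negative pair) for a batch."""
    hardest_pos = None  # (distance, i, j): largest same-identity distance
    hardest_neg = None  # (distance, i, j): smallest cross-identity distance
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                if hardest_pos is None or d > hardest_pos[0]:
                    hardest_pos = (d, i, j)
            elif hardest_neg is None or d < hardest_neg[0]:
                hardest_neg = (d, i, j)
    loss = max(hardest_pos[0] - hardest_neg[0] + alpha, 0.0)
    return loss, hardest_pos[1:], hardest_neg[1:]


# Toy batch with three identities; images 2 and 3 are near-duplicate negatives.
embeddings = [[0.0], [2.0], [5.0], [5.5]]
labels = [0, 0, 1, 2]
loss, pos_pair, neg_pair = msml_loss(embeddings, labels)
print(loss, pos_pair, neg_pair)
```

Here the selected positive pair (images 0 and 1) and the selected negative pair (images 2 and 3) share no image, a combination that a triplet anchored at a single image cannot express. That is the sense in which the loss also constrains absolute distance.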

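The relationship among these losses can be summarized in a single figure.

[Figure: relationship among the triplet, quadruplet, TriHard and MSML losses]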

Results

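The paper runs comparison experiments on the Market1501, MARS, CUHK03 and CUHK-SYSU datasets. To reduce the number of experiments, the authors did not train separately on each dataset but trained a single model on the combined training sets of all of them. To make the results more convincing, three networks pre-trained on ImageNet (ResNet50, Inception-v2 and ResNet-Xception) were used as base models, and MSML was compared against four losses: classification, triplet loss, quadruplet loss and TriHard loss. The results are reported in a table in the paper, and MSML performs quite well.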

Remarks

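MSML is a new metric learning method that absorbs the strengths of several existing metric learning methods and can further improve a model's generalization ability. The paper presents this loss in the context of person re-identification, but it is a general metric learning method that applies across image retrieval.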
