
Paper translation: SORT: SIMPLE ONLINE AND REALTIME TRACKING

Pink: key algorithms       Purple: unfamiliar vocabulary       Green: citations & formulas not yet filled in

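Overview:

Understanding the SORT algorithm for multi-object tracking

Before tracking starts, detection has already been run on every target, so the feature-modelling step is complete.
1. When the first frame arrives, a new tracker is created and initialised for each detected target and given an id.
2. For each later frame, the Kalman filter first produces the state prediction and covariance prediction from the boxes of the earlier frames. The IOU between every tracker's predicted state and every box detected in the current frame is computed, the Hungarian assignment algorithm finds the unique one-to-one matching with the largest IOU (the data association step), and matched pairs whose IOU is below iou_threshold are then discarded.
3. Each matched detection box in the current frame is used to update its Kalman tracker: the Kalman gain, state update and covariance update are computed, and the updated state is output as the tracking box for this frame. A new tracker is initialised for every target in the current frame that was not matched.

The Kalman tracker combines the tracking history with the current frame, weighting the residual between the historical box and the current box so that tracking ids are matched more reliably.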
 

ABSTRACT

This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and realtime applications.
To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%.
Despite only using a rudimentary combination of familiar techniques such as the Kalman Filter and Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers.
Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz which is over 20x faster than other state-of-the-art trackers.

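In short: the paper takes a pragmatic approach to multi-object tracking whose main goal is to associate objects efficiently in online, realtime settings.
Detection quality turns out to be the key factor: changing the detector alone improves tracking by up to 18.9%.
Although the tracker only combines the Kalman filter and the Hungarian algorithm, its accuracy is comparable to state-of-the-art online trackers.
Because the method is so simple, it updates at 260 Hz, more than 20x faster than other state-of-the-art trackers.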

1. INTRODUCTION

Keeping in line with Occam’s Razor, appearance features beyond the detection component are ignored in tracking and only the bounding box position and size are used for both motion estimation and data association.

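(Occam's Razor: "entities should not be multiplied beyond necessity", i.e. prefer the simplest approach that works.)
In short: following Occam's Razor, appearance features beyond the detection component are ignored, and only the bounding box position and size are used for motion estimation and data association.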

Furthermore, issues regarding short-term and long-term occlusion are also ignored,
as they occur very rarely and their explicit treatment introduces undesirable complexity into the tracking framework.
We argue that incorporating complexity in the form of object re-identification adds significant overhead into the tracking framework – potentially limiting its use in realtime applications.

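In short: short-term and long-term occlusion are deliberately not handled, because they are rare and handling them explicitly would complicate the tracking framework.
The authors argue that adding object re-identification would add significant overhead and could limit use in realtime applications.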

This design philosophy is in contrast to many proposed visual trackers that incorporate a myriad of components to handle various edge cases and detection errors [9, 10, 11, 12].
This work instead focuses on efficient and reliable handling of the common frame-to-frame associations.
Rather than aiming to be robust to detection errors, we instead exploit recent advances in visual object detection to solve the detection problem directly.

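In short: unlike the many visual trackers that add components to cope with edge cases and detection errors [9, 10, 11, 12], this work concentrates on handling the common frame-to-frame associations efficiently and reliably.
Instead of trying to be robust to detection errors, it relies on recent advances in visual object detection to solve the detection problem directly.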

This is demonstrated by comparing the common ACF pedestrian detector [8] with a recent convolutional neural network (CNN) based detector [13].
Additionally, two classical yet extremely efficient methods, Kalman filter [14] and Hungarian method [15], are employed to handle the motion prediction and data association components of the tracking problem respectively.
This minimalistic formulation of tracking facilitates both efficiency and reliability for online tracking, see Fig. 1.
In this paper, this approach is only applied to tracking pedestrians in various environments, however due to the flexibility of CNN based detectors [13], it naturally can be generalized to other object classes.

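In short: the point is demonstrated by comparing the common ACF pedestrian detector [8] with a recent convolutional neural network (CNN) based detector [13].
The Kalman filter [14] handles motion prediction and the Hungarian method [15] handles data association, two classical and very efficient methods.
This minimal formulation gives online tracking both efficiency and reliability, see Fig. 1.
The paper only tracks pedestrians, but because CNN based detectors [13] are flexible, the approach can be generalised to other object classes.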

Fig. 1

The main contributions of this paper are:
• We leverage the power of CNN based detection in the context of MOT.
• A pragmatic tracking approach based on the Kalman filter and the Hungarian algorithm is presented and evaluated on a recent MOT benchmark.
• Code will be open sourced to help establish a baseline method for research experimentation and uptake in collision avoidance applications.

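In short, the main contributions are:
• CNN based detection is applied in the context of MOT.
• A pragmatic tracker built on the Kalman filter and the Hungarian algorithm is presented and evaluated on a recent MOT benchmark.
• The code is to be open sourced as a baseline for research experimentation and for uptake in collision avoidance applications.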

2. LITERATURE REVIEW

The method by Geiger et al. [20] uses the Hungarian algorithm [15] in a two stage process.
First, tracklets are formed by associating detections across adjacent frames where both geometry and appearance cues are combined to form the affinity matrix.
Then, the tracklets are associated to each other to bridge broken trajectories caused by occlusion, again using both geometry and appearance cues.
This two step association method restricts this approach to batch computation.
 Our approach is inspired by the tracking component of [20], however we simplify the association to a single stage with basic cues as described in the next section.

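In short: the method of Geiger et al. [20] uses the Hungarian algorithm [15] in two stages.
First, detections in adjacent frames are associated into tracklets, using an affinity matrix built from both geometry and appearance cues.
Then the tracklets are associated with each other, again with both cues, to bridge trajectories broken by occlusion.
Because of this two-step association, the method can only run as a batch computation.
SORT is inspired by the tracking component of [20] but simplifies the association to a single stage with basic cues, as described in the next section.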

3. METHODOLOGY

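The proposed method is described by the key components of detection, propagating object states into future frames, associating current detections with existing objects, and managing the lifespan of tracked objects.

That is, the method has four key components:

1. Detection

2. Propagating object states into future frames

3. Associating current detections with existing objects

4. Managing the lifespan of tracked objects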

3.1. Detection

The detector used is Faster R-CNN (FrRCNN).

As we are only interested in pedestrians we ignore all other classes and only pass person detection results with output probabilities greater than 50% to the tracking framework.

In short: only person detections with an output probability above 50% are passed to the tracking framework; all other classes are ignored.
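The filtering rule above can be sketched in a few lines of Python. This is only an illustration: the `Detection` record and its field names are hypothetical and do not come from the paper or from any detector library.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical detector output record (field names are assumptions)."""
    label: str    # class name reported by the detector
    score: float  # output probability in [0, 1]
    box: tuple    # (x1, y1, x2, y2) in pixels


def person_detections(detections, min_score=0.5):
    """Keep only 'person' detections scoring strictly above min_score."""
    return [d for d in detections if d.label == "person" and d.score > min_score]


raw = [
    Detection("person", 0.92, (10, 20, 50, 120)),
    Detection("person", 0.41, (200, 30, 240, 130)),  # too uncertain, dropped
    Detection("car", 0.88, (300, 80, 420, 160)),     # wrong class, dropped
]
kept = person_detections(raw)
```

Only the first detection is handed to the tracker; the low-score person and the car are discarded before tracking starts.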

In our experiments, we found that the detection quality has a significant impact on tracking performance when comparing the FrRCNN detections to ACF detections.
This is demonstrated using a validation set of sequences applied to both an existing online tracker MDP [12] and the tracker proposed here.
Table 1 shows that the best detector (FrRCNN(VGG16)) leads to the best tracking accuracy for both MDP and the proposed method.

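In short: comparing FrRCNN detections with ACF detections shows that detection quality has a significant impact on tracking performance.
This is shown on a validation set of sequences, with both the existing online tracker MDP [12] and the tracker proposed here.
In Table 1 the best detector, FrRCNN(VGG16), gives the best tracking accuracy for both MDP and the proposed method.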

3.2. Estimation Model

Here we describe the object model, i.e. the representation and the motion model used to propagate a target’s identity into the next frame.
We approximate the inter-frame displacements of each object with a linear constant velocity model which is independent of other objects and camera motion.
The state of each target is modelled as:

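In short: this section describes the object model, that is, the representation and the motion model used to propagate a target's identity into the next frame.
The inter-frame displacement of each object is approximated by a linear constant velocity model that is independent of other objects and of camera motion.
The state of each target is modelled as: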

$$\mathbf{x} = [u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^T,$$

where u and v represent the horizontal and vertical pixel location of the centre of the target, while s and r represent the scale (area) and the aspect ratio of the target's bounding box respectively.
 Note that the aspect ratio is considered to be constant.
 When a detection is associated to a target, the detected bounding box is used to update the target state where the velocity components are solved optimally via a Kalman filter framework [14].
 If no detection is associated to the target, its state is simply predicted without correction using the linear velocity model.
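A minimal sketch of the state representation and of the uncorrected prediction step, in plain Python. The function names are hypothetical, and taking the aspect ratio as width divided by height is an assumption; the paper only says "aspect ratio".

```python
def box_to_z(box):
    """(x1, y1, x2, y2) -> measurement [u, v, s, r]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # centre, area, and aspect ratio (assumed here to be w / h)
    return [x1 + w / 2.0, y1 + h / 2.0, w * h, w / h]


def z_to_box(u, v, s, r):
    """[u, v, s, r] -> (x1, y1, x2, y2), using w = sqrt(s * r) and h = s / w."""
    w = (s * r) ** 0.5
    h = s / w
    return (u - w / 2.0, v - h / 2.0, u + w / 2.0, v + h / 2.0)


def predict(x):
    """One constant-velocity step on x = [u, v, s, r, du, dv, ds].

    u, v and s advance by their velocities; r has no velocity term
    and is therefore held constant.
    """
    u, v, s, r, du, dv, ds = x
    return [u + du, v + dv, s + ds, r, du, dv, ds]


z = box_to_z((10.0, 20.0, 50.0, 100.0))  # a 40 x 80 pixel box
x = z + [2.0, -1.0, 0.0]                 # append illustrative velocities
x_next = predict(x)                      # state for the next frame, no correction
```

`predict` is what happens to a target with no associated detection: the box simply drifts along its current velocity estimate.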

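Here u and v are the horizontal and vertical pixel coordinates of the target centre, s is the scale (area) of the target's bounding box and r is its aspect ratio.
Note that the aspect ratio is treated as constant.
1. Associated: when a detection is associated with a target, the detected bounding box is used to update the target state, and the velocity components are solved optimally by the Kalman filter framework [14].
2. Not associated: when no detection is associated with the target, its state is simply predicted with the linear velocity model, without correction.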
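To make the "associated" case concrete, here is a hedged sketch of a Kalman filter for a single position/velocity pair such as (u, u̇). It assumes diagonal noise covariances, under which the 7-dimensional filter separates into independent pairs for u, v and s. The class name and every noise value below are illustrative assumptions, not values from the paper.

```python
class PairKalman:
    """Constant-velocity Kalman filter for one (position, velocity) pair.

    Model: F = [[1, 1], [0, 1]], H = [1, 0]; only the position is measured.
    All variances are illustrative assumptions.
    """

    def __init__(self, position, pos_var=10.0, vel_var=1000.0,
                 q_pos=1.0, q_vel=0.01, r=1.0):
        self.p, self.v = float(position), 0.0      # velocity starts at zero
        self.P = [[pos_var, 0.0], [0.0, vel_var]]  # large velocity uncertainty
        self.q_pos, self.q_vel, self.r = q_pos, q_vel, r

    def predict(self):
        """x <- F x and P <- F P F^T + Q."""
        (a, b), (c, d) = self.P
        self.p += self.v
        self.P = [[a + b + c + d + self.q_pos, b + d],
                  [c + d, d + self.q_vel]]
        return self.p

    def update(self, z):
        """Correct the state with a measured position z."""
        (a, b), (c, d) = self.P
        s = a + self.r             # innovation covariance H P H^T + R
        k0, k1 = a / s, c / s      # Kalman gain for position and velocity
        y = z - self.p             # innovation (residual)
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]  # (I - K H) P


kf = PairKalman(position=0.0)
for t in range(1, 11):       # the target moves 5 pixels per frame
    kf.predict()
    kf.update(5.0 * t)       # noise-free measurements, for illustration only
```

Although the velocity is never measured directly, the filter recovers it from successive positions; after these ten frames `kf.v` is close to 5.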

3.3. Data Association

In assigning detections to existing targets, each target’s bounding box geometry is estimated by predicting its new location in the current frame.
The assignment cost matrix is then computed as the intersection-over-union (IOU) distance between each detection and all predicted bounding boxes from the existing targets. The assignment is solved optimally using the Hungarian algorithm.
Additionally, a minimum IOU is imposed to reject assignments where the detection to target overlap is less than IOU_min.

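In short: to assign detections to existing targets, each target's bounding box geometry is first estimated by predicting its new location in the current frame.
The assignment cost matrix holds the intersection-over-union (IOU) distance between each detection and every predicted bounding box of the existing targets.
The assignment is solved optimally with the Hungarian algorithm.
Any assignment whose detection-to-target overlap is below IOU_min is rejected.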
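The association step can be sketched as follows. To stay within the standard library, a brute-force search over one-to-one assignments stands in for the Hungarian algorithm. It returns the same optimum but is only practical for a handful of boxes; in practice one would use a real solver such as SciPy's `scipy.optimize.linear_sum_assignment`. The function names and the default `iou_min=0.3` are assumptions.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(detections, predictions, iou_min=0.3):
    """Return (matches, unmatched_detections, unmatched_targets) as index lists.

    Maximising total IOU is equivalent to minimising total IOU distance
    (1 - IOU) because every candidate assignment has the same number of pairs.
    """
    n_det, n_trk = len(detections), len(predictions)
    scores = [[iou(d, p) for p in predictions] for d in detections]
    if n_det <= n_trk:   # give each detection a distinct target
        candidates = (list(zip(range(n_det), perm))
                      for perm in permutations(range(n_trk), n_det))
    else:                # give each target a distinct detection
        candidates = (list(zip(perm, range(n_trk)))
                      for perm in permutations(range(n_det), n_trk))
    best = max(candidates, key=lambda pairs: sum(scores[d][t] for d, t in pairs))
    # Reject assigned pairs whose overlap is below the minimum IOU.
    matches = [(d, t) for d, t in best if scores[d][t] >= iou_min]
    matched_d = {d for d, _ in matches}
    matched_t = {t for _, t in matches}
    unmatched_dets = [d for d in range(n_det) if d not in matched_d]
    unmatched_trks = [t for t in range(n_trk) if t not in matched_t]
    return matches, unmatched_dets, unmatched_trks


predictions = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 0, 11, 10), (100, 100, 110, 110)]
matches, unmatched_dets, unmatched_trks = associate(detections, predictions)
```

Here detection 0 pairs with target 1, detection 1 pairs with target 0, and detection 2 overlaps nothing, so it is left over as a candidate for a new track.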

We found that the IOU distance of the bounding boxes implicitly handles short term occlusion caused by passing targets.
Specifically, when a target is covered by an occluding object, only the occluder is detected, since the IOU distance appropriately favours detections with similar scale.
This allows the occluder target to be corrected with the detection while the covered target is unaffected as no assignment is made. (The occluder is the object doing the covering; the occludee is the object being covered.)

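In short: the IOU distance of the bounding boxes implicitly handles short-term occlusion caused by passing targets.
When a target is covered by an occluding object, only the occluder is detected, because the IOU distance favours detections of similar scale.
The occluder is corrected with that detection, while the covered target is left unaffected because nothing is assigned to it.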
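A small worked example shows why scale matters here. The box coordinates and the 0.3 threshold are made-up illustration values: a small covered target lies entirely inside the occluder's detection, yet its IOU with that detection stays low.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


occluder_pred = (30.0, 20.0, 90.0, 140.0)  # large box passing in front
covered_pred = (40.0, 40.0, 60.0, 100.0)   # small box hidden behind it
detection = (32.0, 22.0, 92.0, 142.0)      # the only detection: the occluder

iou_occluder = iou(detection, occluder_pred)  # similar scale, high overlap
iou_covered = iou(detection, covered_pred)    # 1200 / 7200, about 0.167
```

With a minimum IOU of, say, 0.3, the detection is assigned to the occluder and the covered target receives no assignment, so its state is not dragged towards the occluder.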

3.4. Creation and Deletion of Track Identities

When objects enter and leave the image, unique identities need to be created or destroyed accordingly.
For creating trackers, we consider any detection with an overlap less than IOU_min to signify the existence of an untracked object.
 The tracker is initialised using the geometry of the bounding box with the velocity set to zero.
Since the velocity is unobserved at this point the covariance of the velocity component is initialised with large values, reflecting this uncertainty.
Additionally, the new tracker then undergoes a probationary period where the target needs to be associated with detections to accumulate enough evidence in order to prevent tracking of false positives.

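When objects enter and leave the image, unique identities have to be created or destroyed accordingly.
Creating trackers: any detection whose overlap with the existing targets is below IOU_min is taken to indicate an untracked object.
Initialising trackers: the tracker is initialised from the bounding box geometry, with the velocity set to zero.
Because the velocity has not been observed at this point, the covariance of the velocity component is initialised with large values to reflect the uncertainty.
The new tracker then goes through a probationary period in which the target must be associated with detections to accumulate enough evidence, which prevents tracking of false positives.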
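Track creation and the probationary period might look like the sketch below. The class, the hit counter, the variance value and the threshold `min_hits=3` are all assumptions for illustration; the paper does not state how long the probationary period is.

```python
class Track:
    """Hypothetical track record created from an unmatched detection."""

    _next_id = 0  # class-level counter so that every track gets a unique id

    def __init__(self, box):
        self.id = Track._next_id
        Track._next_id += 1
        self.box = box
        self.velocity = (0.0, 0.0, 0.0)  # du, dv, ds are initialised to zero
        self.velocity_var = 1000.0       # large: velocity not yet observed (assumed value)
        self.hits = 1                    # the creating detection counts as the first hit

    def mark_matched(self, box):
        """Record that a detection was associated with this track."""
        self.box = box
        self.hits += 1

    def confirmed(self, min_hits=3):
        """A track is reported only after enough associated detections."""
        return self.hits >= min_hits


trk = Track((10, 20, 50, 120))
status = [trk.confirmed()]               # still on probation
for box in [(12, 20, 52, 120), (14, 20, 54, 120)]:
    trk.mark_matched(box)
    status.append(trk.confirmed())
```

A single spurious detection therefore creates a track but never produces output, because it never leaves probation.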

Tracks are terminated if they are not detected for T_Lost frames.
This prevents an unbounded growth in the number of trackers and localisation errors caused by predictions over long durations without corrections from the detector.
In all experiments T_Lost is set to 1 for two reasons:
1. Firstly, the constant velocity model is a poor predictor of the true dynamics, and
2. Secondly, we are primarily concerned with frame-to-frame tracking where object re-identification is beyond the scope of this work.
Additionally, early deletion of lost targets aids efficiency.
Should an object reappear, tracking will implicitly resume under a new identity.

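Terminating tracks: a track is terminated if it is not detected for T_Lost frames.
This keeps the number of trackers from growing without bound and avoids the localisation errors that build up when predictions run for a long time without correction from the detector.
T_Lost is set to 1 in all experiments, for two reasons:
1. The constant velocity model is a poor predictor of the true dynamics.
2. The focus is frame-to-frame tracking; object re-identification is beyond the scope of this work.
Deleting lost targets early also improves efficiency.
If an object reappears, tracking implicitly resumes under a new identity.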
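The deletion rule can be sketched with a per-track miss counter. The dictionary layout and the function name are hypothetical, and the sketch reads "not detected for T_Lost frames" as "delete once the miss count reaches T_Lost".

```python
def prune(tracks, matched_ids, t_lost=1):
    """Return the tracks that survive this frame.

    A matched track has its miss counter reset; an unmatched track has it
    incremented and is dropped once the counter reaches t_lost.
    """
    survivors = []
    for trk in tracks:
        misses = 0 if trk["id"] in matched_ids else trk["misses"] + 1
        if misses < t_lost:
            survivors.append({"id": trk["id"], "misses": misses})
    return survivors


tracks = [{"id": 0, "misses": 0}, {"id": 1, "misses": 0}]
after = prune(tracks, matched_ids={0})              # T_Lost = 1, as in the paper
lenient = prune(tracks, matched_ids={0}, t_lost=3)  # a more tolerant setting
```

With T_Lost = 1, track 1 disappears after a single missed frame; if the same object is detected again later, it goes through track creation and gets a new id.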