YOLO3D End-to-end 3D Object Detection (Paper Notes)
YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud
Paper link
This paper applies YOLO to 3D object detection, reaching 40 fps on the KITTI dataset with a Titan X GPU.
The main contributions of this paper are:
1- Extending YOLO V2[3] to include orientation of the OBB as a direct regression task.
2- Extending YOLO V2[3] to include the height and 3D OBB center coordinates (x,y,z) as a direct regression task.
3- Real-time performance evaluation and experimentation with Titan X GPU, on the challenging KITTI benchmark, with recommendations of the best grid-map resolution, and operating IoU threshold that balances speed and accuracy.
Point Cloud Representation
First, the point cloud is projected into 2D bird's-eye-view grid maps. Two maps are created: in one, the value of each cell (pixel) is the maximum height of the points falling in it; in the other, the value of each cell (pixel) is the point density, which grows with the number of points in the cell. The density is computed in the same way as in the MV3D paper.
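A minimal NumPy sketch of this two-channel encoding (the grid extent and the 0.1 m/pixel resolution are illustrative assumptions; the density formula min(1, log(n + 1) / log(64)) is the one used in MV3D):

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), res=0.1):
    """Project an (N, 3) point cloud onto a 2-channel bird's-eye-view grid.

    Channel 0: maximum point height per cell (0 for empty cells).
    Channel 1: point density per cell, min(1, log(n + 1) / log(64)) as in MV3D.
    """
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    height_map = np.full((h, w), -np.inf, dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.int64)

    # Keep only points that fall inside the grid extent.
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]

    rows = ((pts[:, 0] - x_range[0]) / res).astype(np.int64)
    cols = ((pts[:, 1] - y_range[0]) / res).astype(np.int64)
    np.maximum.at(height_map, (rows, cols), pts[:, 2])  # per-cell max height
    np.add.at(counts, (rows, cols), 1)                  # per-cell point count

    height_map[counts == 0] = 0.0
    density_map = np.minimum(1.0, np.log(counts + 1) / np.log(64))
    return np.stack([height_map, density_map], axis=0)
```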
Yaw Angle Regression
The yaw angle of the predicted box ranges from -π to π and is normalized to [-1, 1]; the loss is computed with mean squared error.
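A minimal sketch of this normalization and loss (function names are my own):

```python
import math

def normalize_yaw(yaw):
    """Map a yaw angle in [-pi, pi] to [-1, 1] for regression."""
    return yaw / math.pi

def yaw_loss(pred, target_yaw):
    """Squared error between the predicted (already normalized) yaw
    and the normalized ground-truth yaw angle."""
    return (pred - normalize_yaw(target_yaw)) ** 2
```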
3D Bounding Box Regression
This part is the same as in YOLO v2, just extended to three dimensions. The only thing to note is that the height z is mapped to just one grid cell rather than to every cell the way x and y are, because object heights differ little and their variability is very small.
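The YOLO v2 decoding extended to 3D can be sketched like this (variable names and the exact parameterization of z are my assumptions, not taken verbatim from the paper):

```python
import math

def decode_3d_box(t, grid_x, grid_y, anchor_w, anchor_l, anchor_h, cell_size):
    """Decode raw network outputs t = (tx, ty, tz, tw, tl, th) into a 3D box.

    x, y follow YOLO v2: a sigmoid offset inside the cell plus the cell index.
    z is regressed directly (a single mapping), since object heights vary little.
    w, l, h scale the mean-size anchor exponentially, as in YOLO v2.
    """
    tx, ty, tz, tw, tl, th = t
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    x = (grid_x + sigmoid(tx)) * cell_size
    y = (grid_y + sigmoid(ty)) * cell_size
    z = tz  # direct regression, no per-cell offset
    w = anchor_w * math.exp(tw)
    l = anchor_l * math.exp(tl)
    h = anchor_h * math.exp(th)
    return x, y, z, w, l, h
```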
Anchors Calculation
In YOLO v2, k-means clustering produces anchors of many different sizes; such priors cover the full range of boxes that may appear in the data, so boxes of different sizes can detect objects of different sizes. However, car sizes are relatively fixed, so this paper does not use k-means clustering to generate priors of different sizes; instead, the mean of the 3D box dimensions is used as the anchor size.
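Computing this single mean-size prior is straightforward (the (w, l, h) array layout is an assumption):

```python
import numpy as np

def mean_anchor(boxes):
    """Given an (N, 3) array of ground-truth (w, l, h) box dimensions,
    return their mean as the single 3D anchor size."""
    return np.asarray(boxes, dtype=np.float64).mean(axis=0)
```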
Combined Loss for 3D OBB
The overall loss adds a few extra terms for the new dimensions; everything else is handled the same way.
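A simplified sketch of how the extra regression terms extend the localization part of the loss for one matched box (the key names, the shared lambda_coord weight, and summing plain squared errors are my own assumptions; confidence and class terms are omitted):

```python
def obb_regression_loss(pred, target, lambda_coord=5.0):
    """Localization terms of the extended loss for one matched box.

    pred/target: dicts with keys x, y, z, w, l, h, yaw, where yaw is
    already normalized to [-1, 1]. lambda_coord follows the YOLO
    convention of up-weighting localization terms.
    """
    center = sum((pred[k] - target[k]) ** 2 for k in ("x", "y", "z"))
    size = sum((pred[k] - target[k]) ** 2 for k in ("w", "l", "h"))
    yaw = (pred["yaw"] - target["yaw"]) ** 2
    return lambda_coord * (center + size + yaw)
```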
Network Architecture and Hyper Parameters
Some changes to the network architecture compared with YOLO v2:
- We modified one max-pooling layer to change the down-sampling from 32 to 16 so we can have a larger grid at the end; this contributes to detecting small objects like pedestrians and cyclists.
- We removed the skip connection from the model, as we found it resulted in less accurate results.
- We added terms in the loss function for yaw, z center coordinate, and height regressions to facilitate the 3D oriented bounding box detection.
- Our input consists of 2 channels, one representing the maximum height and the other representing the density of points in the point cloud, computed as shown in Eq. (1).
KITTI Results and Error Analysis
For cars, performance is very good at an IoU threshold of 0.5; above 0.5, performance drops sharply as the threshold increases, showing that it is hard to align boxes perfectly with objects, a problem common to YOLO models.
Inference time grows significantly as image resolution increases: for example, going from 0.15 m/pixel to 0.1 m/pixel roughly doubles the inference time.