YOLO3D: End-to-End 3D Object Detection (Paper Notes)

YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud

This paper applies YOLO to 3D object detection and reaches 40 fps on the KITTI dataset using a Titan X GPU.

The main contributions of the paper are:

1- Extending YOLO V2[3] to include orientation of the OBB as a direct regression task.

2- Extending YOLO V2[3] to include the height and 3D OBB center coordinates (x,y,z) as a direct regression task.

3- Real-time performance evaluation and experimentation with Titan X GPU, on the challenging KITTI benchmark, with recommendations of the best grid-map resolution, and operating IoU threshold that balances speed and accuracy.

Point Cloud Representation

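First, the point cloud is projected onto a 2D bird's-eye-view grid map. Two maps are built. In the first, the value of each cell (pixel) is the maximum height of the points that fall into it. In the second, the value of each cell is the point density: the more points in the cell, the larger the value. The density is computed as in the MV3D paper, with N the number of points in the cell: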

\min\left(1.0, \frac{\log(N+1)}{\log(64)}\right)
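
The projection described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `bev_maps`, the range and cell-size parameters, and the choice to leave empty cells at 0 are all assumptions made for the example. It uses plain Python lists to stay dependency-free; a real pipeline would more likely vectorize this.

```python
import math

def bev_maps(points, x_range, y_range, cell_size):
    """Project (x, y, z) points onto a bird's-eye-view grid.

    Returns two row-major 2D lists (rows index y, columns index x):
    the maximum height per cell and the normalized density per cell.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = int(math.ceil((x_max - x_min) / cell_size))
    rows = int(math.ceil((y_max - y_min) / cell_size))
    height = [[0.0] * cols for _ in range(rows)]  # empty cells stay 0 (assumption)
    counts = [[0] * cols for _ in range(rows)]

    for x, y, z in points:
        # Skip points outside the region covered by the grid.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        c = min(cols - 1, int((x - x_min) / cell_size))
        r = min(rows - 1, int((y - y_min) / cell_size))
        if counts[r][c] == 0 or z > height[r][c]:
            height[r][c] = z
        counts[r][c] += 1

    # Density as in the formula above: min(1, log(N + 1) / log(64)).
    density = [[min(1.0, math.log(n + 1) / math.log(64)) for n in row]
               for row in counts]
    return height, density
```

With this formula a cell saturates at density 1.0 once it holds 63 or more points, since log(63 + 1) / log(64) = 1.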

Yaw Angle Regression

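The yaw angle of the predicted box ranges from -π to π. It is normalized to the range -1 to 1, and the loss is computed as the mean squared error: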

\sum_{i=0}^{s^{2}}\sum_{j=0}^{B}L_{ij}^{obj}\left( \phi_{i} - \hat{\phi}_{i} \right)^{2}
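
As a rough sketch of the normalization and the loss term above: the helper names `normalize_yaw` and `yaw_loss` are hypothetical, and the flat-list layout with one entry per (cell, anchor) slot is an assumption made to keep the example short. The wrapping step maps an angle of exactly π to -π, which is also a choice made here, not something the notes specify.

```python
import math

def normalize_yaw(yaw):
    """Wrap a yaw angle in radians to [-pi, pi) and scale it to [-1, 1)."""
    wrapped = (yaw + math.pi) % (2 * math.pi) - math.pi
    return wrapped / math.pi

def yaw_loss(targets, predictions, obj_mask):
    """Sum of squared yaw errors over slots that contain an object.

    All three arguments are flat lists with one entry per (cell, anchor)
    slot; obj_mask holds 1 where an object is assigned and 0 elsewhere,
    playing the role of L_ij^obj in the formula above.
    """
    return sum(m * (t - p) ** 2
               for t, p, m in zip(targets, predictions, obj_mask))
```

For example, `yaw_loss([0.5, 0.0], [0.25, 1.0], [1, 0])` only counts the first slot, because the mask zeroes out the second.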

3D Bounding Box Regression

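This part is the same as in YOLOv2, only extended to three dimensions. The one thing to note is that the z coordinate is mapped to a single grid cell only, instead of across all grid cells as x and y are, because object heights differ little and the variability is very small.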

b_{x}=\sigma (t_{x})+c_{x}

b_{y}=\sigma (t_{y})+c_{y}

b_{z}=\sigma (t_{z})+c_{z}

b_{w}=p_{w}e^{t_{w}}

b_{l}=p_{l}e^{t_{l}}

b_{h}=p_{h}e^{t_{h}}
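
The six equations above translate directly into a small decoding routine. The sketch below is illustrative only: `decode_box` and its argument layout are hypothetical, and the prior dimensions used in the usage note are made-up car-like values, not numbers from the paper.

```python
import math

def sigmoid(t):
    """Logistic function, written sigma in the equations above."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(t, cell, prior):
    """Turn raw network outputs into a 3D box in grid units.

    t     -- (t_x, t_y, t_z, t_w, t_l, t_h), raw regression outputs
    cell  -- (c_x, c_y, c_z), offset of the responsible grid cell
    prior -- (p_w, p_l, p_h), anchor (prior box) dimensions
    """
    tx, ty, tz, tw, tl, th = t
    cx, cy, cz = cell
    pw, pl, ph = prior
    return (
        sigmoid(tx) + cx,   # b_x
        sigmoid(ty) + cy,   # b_y
        sigmoid(tz) + cz,   # b_z
        pw * math.exp(tw),  # b_w
        pl * math.exp(tl),  # b_l
        ph * math.exp(th),  # b_h
    )
```

With all-zero outputs, `decode_box((0, 0, 0, 0, 0, 0), (3, 5, 0), (1.6, 3.9, 1.5))` places the center in the middle of the cell and returns the prior dimensions unchanged, because the sigmoid of 0 is 0.5 and e^0 is 1.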

Anchors Calculation

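YOLOv2 uses k-means clustering to obtain many anchors of different sizes. This prior knowledge covers the full range of boxes that may appear in the data, so boxes of different sizes can detect objects of different sizes. Cars, however, are relatively fixed in size, so this paper does not use k-means clustering to generate priors of different sizes. Instead, it takes the mean dimensions of the 3D boxes as the prior box size.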
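
A minimal sketch of the mean-based prior, in place of k-means: the function name `mean_anchors` and the input layout are hypothetical. Grouping the boxes by class label is an assumption of this example; the notes only say that the mean of the 3D boxes is used.

```python
from collections import defaultdict

def mean_anchors(boxes):
    """Average box dimensions per class.

    boxes -- iterable of (class_name, width, length, height) tuples
             taken from the ground-truth labels.
    Returns {class_name: (mean_width, mean_length, mean_height)}.
    Grouping by class is an assumption of this sketch.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for name, w, l, h in boxes:
        s = sums[name]
        s[0] += w
        s[1] += l
        s[2] += h
        s[3] += 1
    return {name: (s[0] / s[3], s[1] / s[3], s[2] / s[3])
            for name, s in sums.items()}
```

The result can be passed as the prior dimensions (p_w, p_l, p_h) when decoding box sizes.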

Combined Loss for 3D OBB

The overall loss adds terms for the extra dimensions; everything else is handled the same way as in YOLOv2.

Network Architecture and Hyper Parameters

Changes to the network architecture compared with YOLOv2:

  1. We modified one max-pooling layer to change the down-sampling from 32 to 16 so we can have a larger grid at the end; this has a contribution in detecting small objects like pedestrians and cyclists.

  2. We removed the skip connection from the model as we found it resulting in less accurate results.

  3. We added terms in the loss function for yaw, z center coordinate, and height regressions to facilitate the 3D oriented bounding box detection.

  4. Our input consists of 2 channels, one representing the maximum height, and the other one representing the density of points in the point cloud, computed as shown in Eq. (1)
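
To make the effect of the first change concrete, the output grid size can be worked out from the down-sampling factor. The 608 x 608 input size below is only an assumed example value, not a figure taken from the paper, and `output_grid` is a hypothetical helper.

```python
def output_grid(input_size, stride):
    """Grid size at the network output for a given total down-sampling factor."""
    height, width = input_size
    return height // stride, width // stride

# Assumed example input of 608 x 608 pixels (not from the paper).
grid_32 = output_grid((608, 608), 32)  # (19, 19)
grid_16 = output_grid((608, 608), 16)  # (38, 38)
```

Halving the down-sampling factor doubles the grid along each axis, so each cell covers a quarter of the area, which is why the change helps with small objects.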

KITTI Results and Error Analysis

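For the Car class, the model performs well at an IoU threshold of 0.5. Above 0.5, performance drops significantly as the IoU threshold increases. This shows that it is hard to align the boxes perfectly with the objects, which is a common problem of YOLO models.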

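As the grid-map resolution gets finer, inference time increases significantly: going from 0.15 m/pixel to 0.1 m/pixel roughly doubles the inference time. This is roughly in line with the number of pixels covering the same area growing by a factor of (0.15/0.1)^2 = 2.25.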