
CVPR2018 Paper Notes (3): PPFNet, Part 3

This week I focused on Section 4 of the paper, which gives a full description of PPFNet.

4. PPFNet

(1) Overview

The overview lays out what the rest of the section covers. First, it explains how the input is prepared. Second, it introduces the PPFNet architecture. Third, the authors describe a training method built on a new loss function, which tackles the combinatorial correspondence problem in a global manner.
【Loss function】A loss function measures the discrepancy between a model's prediction f(x) and the ground truth Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the more robust the model. (Definition adapted from https://blog.csdn.net/hk121/article/details/71465469)
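As a minimal illustration of this definition (my own example, not from the paper), the squared-error loss is non-negative and shrinks as the prediction approaches the ground truth:

```python
def squared_error(y, y_pred):
    """L(Y, f(x)) = (Y - f(x))^2: non-negative, zero only on a perfect prediction."""
    return (y - y_pred) ** 2

print(squared_error(3.0, 2.5))  # 0.25
```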

(2) Encoding of Local Geometry

1. Given a reference point x_r ∈ X lying on a point cloud, define a local region Ω ⊂ X, collect a set of points {m_i} ∈ Ω in this neighborhood, and compute the normals of the point set;
2. Align the patch with the canonical axes of the associated local reference frame [41]; the oriented {x_r ∪ {x_i}} represents a local geometry, which we call a local patch;
3. Pair each neighboring point i with the reference point r and compute the PPFs;
4. Note that, in terms of complexity, this is no more costly than using the points themselves: because the central reference point x_r is fixed, the quadratic pairing is avoided.
As shown in Fig. 3, the final local geometry description fed into PPFNet is the combined set of point normals and PPFs:
【Local reference frame】In theoretical physics, a local reference frame (local frame) refers to a coordinate system or frame of reference that is only expected to function over a small region or a restricted region of space or spacetime.(From Wikipedia)
(Equation from the paper.)


(Figure from the paper.)
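Step 3 above can be sketched in a few lines. The ordering of the four components (distance, then three angles) follows the standard point pair feature; this is my reading of the construction, not the paper's exact code:

```python
import numpy as np

def angle(v1, v2):
    """Unsigned angle between two vectors, in [0, pi]."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def ppf(x_r, n_r, x_i, n_i):
    """Point pair feature between reference (x_r, n_r) and neighbor (x_i, n_i)."""
    d = x_i - x_r  # pairing is always against the fixed reference point x_r
    return np.array([np.linalg.norm(d),
                     angle(n_r, d), angle(n_i, d), angle(n_r, n_i)])
```

Because every neighbor is paired only with x_r, a patch of k points yields k features, linear rather than quadratic in the patch size.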

(3) Network architecture

The overall architecture of PPFNet is shown in Fig. 2. The input consists of N local patches uniformly sampled from a fragment. Thanks to the sparsity of the point-wise data representation and PointNet's efficient use of the GPU, PPFNet can absorb all N patches simultaneously. The first module of PPFNet is a group of mini-PointNets that extract features from the local patches; during training, all the PointNets share weights and gradients. A max-pooling layer then aggregates all the local features into a single global feature, summarizing the distinct local information into a global context for the whole fragment. This global feature is concatenated onto every local feature, and a group of MLPs further fuses the global and local features into the final global-context-aware local descriptors.
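The data flow just described can be traced with a toy numpy sketch. All dimensions and the two-layer MLPs are my assumptions for illustration, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Shared point-wise MLP: the same dense+ReLU layers applied to every row."""
    for W in weights:
        x = np.maximum(x @ W, 0.0)
    return x

# Toy sizes: N patches, P points per patch, 10-D input per point
# (point + normal + 4-D PPF), 64-D features.
N, P, D_in, D_feat = 8, 32, 10, 64
patches = rng.normal(size=(N, P, D_in))

W1 = [rng.normal(size=(D_in, 32)) * 0.1, rng.normal(size=(32, D_feat)) * 0.1]
# 1) mini-PointNets with shared weights: per-point MLP, then max-pool over points.
local = np.stack([mlp(p, W1).max(axis=0) for p in patches])    # (N, D_feat)
# 2) max-pool over all patches -> one global feature for the fragment.
global_feat = local.max(axis=0)                                # (D_feat,)
# 3) concatenate the global feature onto every local feature.
fused_in = np.concatenate([local, np.tile(global_feat, (N, 1))], axis=1)
# 4) final MLPs -> global-context-aware local descriptors.
W2 = [rng.normal(size=(2 * D_feat, D_feat)) * 0.1]
descriptors = mlp(fused_in, W2)                                # (N, D_feat)
```

The key design point survives even in this sketch: every local descriptor sees the whole fragment through the concatenated global feature.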

(4) N-tuple loss

Our goal is to use PPFNet to extract features for local patches, a process of mapping from a high dimensional non-linear data space into a low dimensional linear feature space. The distinctiveness of the resulting features is closely related to the separability in the embedded space. Ideally, the proximity of neighboring patches in the data space should be preserved in the feature space, as shown in Fig. 4.
(Figure from the paper.)
Figure 4. Illustration of N-tuple sampling in feature space. Green lines link similar pairs, which are coerced to keep close. Red lines connect non-similar pairs, pushed further apart. Without the N-tuple loss, there remain some non-similar patches that are close in the feature space and some similar patches that are distant. Our novel N-tuple method pairs each patch with all the others, guaranteeing that all the similar patches remain close and non-similar ones distant.
To this end, the state of the art seems to adopt two loss functions: contrastive [48] and triplet [23], which try to consider pairs and triplets respectively. Yet, a fragment consists of more than 3 patches and in that case the widely followed practice trains networks by randomly retrieving 2/3-tuples of patches from the dataset. However, networks trained in such manner only learn to differentiate maximum 3 patches, preventing them from uncovering the true matching, which is combinatorial in the patch count.
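For reference, the pair and triplet losses mentioned above can be sketched as follows; the margin form is a common convention and an assumption on my part (see [48] and [23] for the exact formulations):

```python
def contrastive_loss(d, is_match, theta=1.0):
    """Pairwise: pull matching pairs together, push non-matching beyond margin theta."""
    return d ** 2 if is_match else max(theta - d, 0.0) ** 2

def triplet_loss(d_pos, d_neg, theta=1.0):
    """Triplet: anchor-positive distance should beat anchor-negative by margin theta."""
    return max(d_pos - d_neg + theta, 0.0)
```

Each evaluation touches at most 3 patches, which is exactly the limitation the N-tuple loss below is designed to remove.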
Generalizing these losses to N-patches, we propose N-tuple loss, an N-to-N contrastive loss, to correctly learn to solve this combinatorial problem by catering for the many-to-many relations as depicted in Fig. 4. Given the ground truth transformation T, N-tuple loss operates by constructing a correspondence matrix M ∈ RN×N on the points of the aligned fragments. M = (mij) where:
(Equation from the paper.)
1 is an indicator function. Likewise, we compute a feature space distance matrix D ∈ RN×N and D = (dij) where
(Equation from the web.)
The N-tuple loss then operates on the two distance matrices to solve the correspondence problem. For simplicity of expression, we define an operation ∑*(·) that sums all the elements of a matrix. The N-tuple loss can be written as:
公式來自論文
Here ◦ stands for Hadamard Product - element-wise multiplication. α is a hyper-parameter balancing the weight between matching and non-matching pairs and θ is the lower bound on the expected distance between non-correspondent pairs. We train PPFNet via N-tuple loss, as shown in Fig.5, by drawing random pairs of fragments instead of patches. This also eases the preparation of training data.
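Putting the pieces together, here is a numpy sketch of the loss as I understand it: binarize ground-truth point distances into M, compute the feature distance matrix D, then pull correspondent pairs together and push the rest beyond the margin θ. The threshold tau and the normalization terms are assumptions; consult the paper for the exact formulation:

```python
import numpy as np

def n_tuple_loss(features, points, T, tau=0.1, alpha=1.0, theta=1.0):
    """Sketch of the N-tuple loss between two fragments' patches.

    features: (F_a, F_b), each an (N, D) array of descriptors.
    points:   (P_a, P_b), each an (N, 3) array of patch centers.
    T:        (4, 4) ground-truth rigid transform taking fragment b onto a.
    """
    F_a, F_b = features
    P_a, P_b = points
    # Align fragment b with the ground-truth pose, then binarize point
    # distances into the correspondence matrix M (1 = matching pair).
    P_b_aligned = P_b @ T[:3, :3].T + T[:3, 3]
    M = (np.linalg.norm(P_a[:, None, :] - P_b_aligned[None, :, :], axis=-1) < tau)
    M = M.astype(float)
    # Feature-space distance matrix D.
    D = np.linalg.norm(F_a[:, None, :] - F_b[None, :, :], axis=-1)
    # Hadamard products select the two populations; alpha balances them and
    # theta is the margin on non-correspondent pairs.
    pos = (M * D).sum() / max(M.sum(), 1.0)
    neg = ((1 - M) * np.maximum(theta - D, 0.0)).sum() / max((1 - M).sum(), 1.0)
    return pos + alpha * neg
```

Note that the inputs are whole fragments, matching the training setup in Fig. 5 where random pairs of fragments, not patches, are drawn.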
(Figure from the paper.)
Figure 5. Overall training pipeline of PPFNet. Local patches are sampled from a pair of fragments respectively, and fed into PPFNet to get local features. Based on these features, a feature distance matrix is computed for all the patch pairs. Meanwhile, a distance matrix of local patches is formed based on the ground-truth rigid pose between the fragments. By binarizing the distance matrix, we get a correspondence matrix that indicates all the matching and non-matching relationships between patches. The N-tuple loss is then calculated by coupling the feature distance matrix and the correspondence matrix to guide PPFNet toward an optimal feature space.

A large part of this post went to the N-tuple loss, but its inner workings will still take more time to fully understand. Work on all fronts got on track this week.