
3D Convolution: Spherical CNNs for Panoramic Images (Code)

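Convolutional neural networks (CNNs) handle problems on 2D planar images very well. However, there is a growing need to process spherical images, for example for omnidirectional vision in drones, robots and self-driving cars, for molecular regression problems, and for global weather and climate models.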

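Naively feeding a planar projection of a spherical signal into a CNN is doomed to fail. Much of the success of CNNs comes from weight sharing across local receptive fields: thanks to the layered structure, the same object produces the same response in whichever rectangular region of the image it appears. On a spherical image, however, an object is deformed differently depending on where it lies, so for direct weight sharing to work, the local receptive fields would have to model this deformation. As shown in the figure, the spatial distortion introduced by the planar projection prevents the CNN from sharing weights.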
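To put a number on this distortion, here is a small sketch that assumes an equirectangular (latitude-longitude) projection, one common way of flattening a sphere; the figure in the original post may use a different projection. It computes the exact solid angle covered by one pixel in each row of the grid. Pixels next to the poles cover far less of the sphere than pixels at the equator, so a fixed planar kernel sees physically very different regions depending on latitude. The helper name is hypothetical.

```python
import numpy as np


def equirect_pixel_solid_angles(n_lat: int, n_lon: int) -> np.ndarray:
    """Solid angle (steradians) of one pixel in each row of an n_lat x n_lon
    equirectangular grid (hypothetical helper for illustration)."""
    # Row boundaries in latitude, from the south pole to the north pole
    lat_edges = np.linspace(-np.pi / 2, np.pi / 2, n_lat + 1)
    dlon = 2 * np.pi / n_lon  # every pixel spans the same longitude interval
    # Exact area of a latitude-longitude cell on the unit sphere
    return dlon * (np.sin(lat_edges[1:]) - np.sin(lat_edges[:-1]))


areas = equirect_pixel_solid_angles(180, 360)
print(f"equator / pole pixel area ratio: {areas.max() / areas.min():.1f}")
```

On a 1-degree grid the ratio exceeds 100, even though every pixel looks the same size to a planar CNN.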

        

     We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

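So how can a spherical signal, which a 2D image only captures through a distorting projection, be processed without this position-dependent deformation? The classic FFT and Lie group theory provide the bridge.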

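The key is a subtle difference between the sphere and the plane. The sphere itself is a 2D manifold, but the space of its rotations, SO(3), is a 3D Lie group. A spherical CNN therefore works on a discretized SO(3) and represents signals on SO(3) in its higher layers.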

      As shown in Figure 1, there is no good way to use translational convolution or cross-correlation to analyze spherical signals. The most obvious approach, then, is to change the definition of cross-correlation by replacing filter translations by rotations. Doing so, we run into a subtle but important difference between the plane and the sphere: whereas the space of moves for the plane (2D translations) is itself isomorphic to the plane, the space of moves for the sphere (3D rotations) is a different, three-dimensional manifold called SO(3). It follows that the result of a spherical correlation (the output feature map) is to be considered a signal on SO(3), not a signal on the sphere, $S^2$. For this reason, we deploy SO(3) group correlation in the higher layers of a spherical CNN (Cohen and Welling, 2016).

       The implementation of a spherical CNN (S2-CNN) involves two major challenges. Whereas a square grid of pixels has discrete translation symmetries, no perfectly symmetrical grids for the sphere exist. This means that there is no simple way to define the rotation of a spherical filter by one pixel. Instead, in order to rotate a filter we would need to perform some kind of interpolation. The other challenge is computational efficiency; SO(3) is a three-dimensional manifold, so a naive implementation of SO(3) correlation is $O(n^6)$.

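The two difficulties of spherical CNNs are therefore: (1) discretizing the sphere: there is no perfectly symmetric pixel grid, so rotating a filter requires interpolation, and the grid must be fine enough to keep reconstruction accurate; (2) computational cost: SO(3) is a three-dimensional manifold, so a naive correlation over it takes $O(n^6)$ time.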
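To make the $O(n^6)$ figure concrete, the sketch below only counts multiply-adds for a naive SO(3) correlation. It assumes $n$ samples along each of the three rotation angles: one output value per sampled rotation ($n^3$ outputs), each a sum over all $n^3$ sampled rotations. The function name is hypothetical.

```python
def naive_so3_correlation_cost(n: int) -> int:
    """Multiply-adds for a naive SO(3) correlation on an n x n x n grid of
    rotations (hypothetical helper for illustration)."""
    output_points = n ** 3      # one output value per sampled rotation (alpha, beta, gamma)
    terms_per_output = n ** 3   # each output sums over all n^3 sampled rotations
    return output_points * terms_per_output


for n in (16, 32, 64):
    print(n, f"{naive_so3_correlation_cost(n):,}")
```

Doubling the resolution multiplies the cost by $2^6 = 64$; at $n = 64$ the count is already about $6.9 \times 10^{10}$, which is why a fast Fourier-based method is needed.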

........................................

Key points:

      Fast correlation with the generalized FFT (G-FFT). It is well known that correlations and convolutions can be computed efficiently using the Fast Fourier Transform (FFT). This is a result of the Fourier theorem, which states that $\widehat{f * \psi} = \hat{f} \odot \hat{\psi}$. Since the FFT can be computed in $O(n \log n)$ time and the product $\odot$ has linear complexity, implementing the correlation using FFTs is asymptotically faster than the naive $O(n^2)$ spatial implementation.
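A minimal 1D illustration of this point, assuming circular (periodic) signals, a filter zero-padded to the signal length, and NumPy's `numpy.fft`. For correlation of real signals the theorem carries a complex conjugate on the filter's spectrum, $\widehat{f \star \psi} = \hat{f} \odot \overline{\hat{\psi}}$. This is only the planar analogue; the spherical case replaces the ordinary FFT with the generalized FFT on $S^2$ and SO(3), which is not shown here.

```python
import numpy as np


def naive_circular_correlation(f: np.ndarray, psi: np.ndarray) -> np.ndarray:
    """out[k] = sum_x psi[x] * f[(x + k) mod n]; O(n^2) operations."""
    n = len(f)
    out = np.zeros(n)
    for k in range(n):          # every shift of the filter
        for x in range(n):      # inner product at that shift
            out[k] += psi[x] * f[(x + k) % n]
    return out


def fft_circular_correlation(f: np.ndarray, psi: np.ndarray) -> np.ndarray:
    """Same result in O(n log n): multiply spectra, with psi's spectrum conjugated."""
    spectrum = np.fft.fft(f) * np.conj(np.fft.fft(psi))
    return np.fft.ifft(spectrum).real  # inputs are real, so the imaginary part is round-off


rng = np.random.default_rng(0)
f, psi = rng.standard_normal(64), rng.standard_normal(64)
print(np.allclose(naive_circular_correlation(f, psi), fft_circular_correlation(f, psi)))
```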

       .................

   

    .......................................

Experimental results:

       Results: We evaluate by RMSE and compare our results to Montavon et al. (2012) and Raj et al. (2016) (see table 3). Our learned representation outperforms all kernel-based approaches and an MLP trained on sorted Coulomb matrices. Superior performance could only be achieved for an MLP trained on randomly permuted Coulomb matrices. However, sufficient sampling of random permutations grows exponentially with N, so this method is unlikely to scale to large molecules.

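The paper defines cross-correlation on $S^2$ and SO(3), analyzes their properties, and implements a generalized FFT algorithm to compute them. Numerical experiments confirm the algorithm's stability and accuracy, which hold even in deep networks.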

In short, weighing accuracy, scalability and related factors together, it is one of the most promising 3D networks.

Further improvements:

      For intrinsically volumetric tasks like 3D model recognition, we believe that further improvements can be attained by generalizing further beyond SO(3) to the roto-translation group SE(3). The development of Spherical CNNs is an important first step in this direction. Another interesting generalization is the development of a Steerable CNN for the sphere (Cohen and Welling, 2017), which would make it possible to analyze vector fields such as global wind directions, as well as other sections of vector bundles over the sphere.

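Moving the computation from SO(3) to SE(3), i.e. extending rotational correlation with translations so that filters are both rotated and shifted, is the natural next step and may also open the way to further speedups.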

Appendix:

Lie groups and Lie algebras

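3D rotation matrices form the special orthogonal group SO(3), and 4×4 homogeneous transformation matrices (a rotation plus a translation) form the special Euclidean group SE(3):

$$SO(3) = \{R \in \mathbb{R}^{3 \times 3} \mid RR^T = I,\ \det(R) = 1\}$$

$$SE(3) = \left\{T = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \in \mathbb{R}^{4 \times 4} \;\middle|\; R \in SO(3),\ t \in \mathbb{R}^3\right\}$$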
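A small NumPy sketch of these two groups: a matrix belongs to SO(3) when $RR^T = I$ and $\det R = 1$, and an SE(3) element packs a rotation and a translation into one 4×4 homogeneous matrix that acts on points written as $(x, y, z, 1)$. The helper names `rot_z`, `is_so3` and `make_se3` are our own.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def is_so3(R: np.ndarray, tol: float = 1e-9) -> bool:
    """R is in SO(3) iff R R^T = I and det(R) = 1."""
    return bool(R.shape == (3, 3)
                and np.allclose(R @ R.T, np.eye(3), atol=tol)
                and np.isclose(np.linalg.det(R), 1.0, atol=tol))


def make_se3(R: np.ndarray, t) -> np.ndarray:
    """Pack rotation R and translation t into the 4x4 matrix [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


T = make_se3(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
p = np.array([1.0, 0.0, 0.0, 1.0])   # point (1, 0, 0) in homogeneous coordinates
print(T @ p)                          # rotate 90 degrees about z, then translate by (1, 0, 0)
```

A reflection such as `np.diag([1.0, 1.0, -1.0])` is orthogonal but has determinant $-1$, so `is_so3` rejects it.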

 

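Neither SO(3) nor SE(3) is closed under addition: the sum of two rotation matrices is generally no longer a rotation matrix. Both are, however, closed under multiplication. A set with a single operation that satisfies the axioms below is called a group.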

A group is written $G = (A, \cdot)$, where $A$ is a set and $\cdot$ is the operation. The operation must satisfy the following conditions:

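(1) Closure: $\forall a_1, a_2 \in A,\ a_1 \cdot a_2 \in A$.

(2) Associativity: $\forall a_1, a_2, a_3 \in A,\ (a_1 \cdot a_2) \cdot a_3 = a_1 \cdot (a_2 \cdot a_3)$.

(3) Identity: there is a special element $a_0 \in A$ such that $\forall a \in A,\ a_0 \cdot a = a \cdot a_0 = a$.

(4) Inverse: $\forall a \in A,\ \exists a^{-1} \in A$ such that $a \cdot a^{-1} = a_0$.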

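It can be shown that the rotation matrices with matrix multiplication form a group, and so do the transformation matrices with matrix multiplication.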
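A numerical spot check (not a proof) of the four group axioms for SO(3) under matrix multiplication, and of the failure of closure under addition. The sample rotations are arbitrary, and the helpers `rot_z`, `rot_x` and `is_so3` are our own.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta: float) -> np.ndarray:
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def is_so3(R: np.ndarray) -> bool:
    """Membership test: orthogonal with determinant +1."""
    return bool(np.allclose(R @ R.T, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0))


A, B, C = rot_z(0.4), rot_x(1.1), rot_z(-0.7) @ rot_x(0.2)
eye = np.eye(3)

checks = {
    "closure under multiplication": is_so3(A @ B),
    "associativity": np.allclose((A @ B) @ C, A @ (B @ C)),
    "identity": np.allclose(eye @ A, A) and np.allclose(A @ eye, A),
    "inverse (R^-1 = R^T)": is_so3(A.T) and np.allclose(A @ A.T, eye),
    "closure under addition": is_so3(A + B),  # expected to be False
}
for name, ok in checks.items():
    print(f"{name}: {bool(ok)}")
```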

Having introduced groups, what is a Lie group?

A Lie group is a group that is continuous (smooth). The motion of a rigid body is continuous, so SO(3) and SE(3) are Lie groups.

Every Lie group has a corresponding Lie algebra. So what is a Lie algebra?

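The Lie algebra describes the local structure of a Lie group near its identity (it is the tangent space there). For SO(3) it is $\mathfrak{so}(3)$: 3D vectors $\phi$, each identified with the skew-symmetric matrix

$$\phi^\wedge = \begin{bmatrix} 0 & -\phi_3 & \phi_2 \\ \phi_3 & 0 & -\phi_1 \\ -\phi_2 & \phi_1 & 0 \end{bmatrix}$$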
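A NumPy sketch of the hat map $\phi \mapsto \phi^\wedge$ (3D vector to skew-symmetric matrix) and its inverse, the vee map; the function names `hat` and `vee` are our own. The checks use the property $\phi^\wedge v = \phi \times v$, which is what makes $\phi^\wedge$ the natural matrix form of a rotation vector.

```python
import numpy as np


def hat(phi) -> np.ndarray:
    """Map a 3D vector to its skew-symmetric matrix phi^ (an element of so(3))."""
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def vee(M: np.ndarray) -> np.ndarray:
    """Inverse of hat: read the vector back out of a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])


phi = np.array([0.3, -1.2, 0.5])
v = np.array([2.0, 0.1, -0.4])
print(np.allclose(hat(phi), -hat(phi).T))            # skew-symmetric
print(np.allclose(hat(phi) @ v, np.cross(phi, v)))   # phi^ v = phi x v
print(np.allclose(vee(hat(phi)), phi))               # vee undoes hat
```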

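The algebraic relationship between the Lie group and its Lie algebra is:

$$R = \exp(\phi^\wedge), \qquad \phi = \ln(R)^\vee$$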

So the two are related through the exponential and the logarithm.

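How is $\exp(\phi^\wedge)$ computed? It is the exponential of a matrix, called the exponential map in Lie group theory. The exponential of any square matrix can be written as a Taylor series, which converges for every square matrix, and the result is again a matrix.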

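In the same way, for any element $\phi$ we define its exponential map:

$$\exp(\phi^\wedge) = \sum_{n=0}^{\infty} \frac{1}{n!} (\phi^\wedge)^n$$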

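Since $\phi$ is a 3D vector, we can write $\phi = \theta a$, where $\theta = \|\phi\|$ is its length and $a$ is a unit direction vector. Two identities then follow for $a^\wedge$: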

Let $a = (\cos\alpha, \cos\beta, \cos\gamma)^T$; since $a$ is a unit vector, $\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$.

(1) $a^\wedge a^\wedge = aa^T - I$

(2) $a^\wedge a^\wedge a^\wedge = -a^\wedge$
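A quick numerical check of identities (1) and (2) for an arbitrary rotation vector; both rely on the axis $a$ having unit length. The `hat` helper is our own.

```python
import numpy as np


def hat(phi) -> np.ndarray:
    """3D vector -> skew-symmetric matrix phi^."""
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


phi = np.array([0.3, -1.2, 0.5])
theta = np.linalg.norm(phi)   # rotation angle
a = phi / theta               # unit rotation axis
A = hat(a)

print(np.allclose(A @ A, np.outer(a, a) - np.eye(3)))  # identity (1)
print(np.allclose(A @ A @ A, -A))                      # identity (2)
```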

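These identities express the square and the cube of $a^\wedge$ in closed form, so every power of $a^\wedge$ reduces to $a^\wedge$ or $a^\wedge a^\wedge$, and the series collapses: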

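$$\begin{aligned}
\exp(\phi^\wedge) = \exp(\theta a^\wedge) &= \sum_{n=0}^{\infty} \frac{1}{n!} (\theta a^\wedge)^n = \cdots \\
&= a^\wedge a^\wedge + I + \sin\theta\, a^\wedge - \cos\theta\, a^\wedge a^\wedge \\
&= (1 - \cos\theta)\, a^\wedge a^\wedge + I + \sin\theta\, a^\wedge \\
&= \cos\theta\, I + (1 - \cos\theta)\, aa^T + \sin\theta\, a^\wedge
\end{aligned}$$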

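Recall from the previous post that this is exactly Rodrigues' rotation formula. This shows that $\mathfrak{so}(3)$ is in fact the space of rotation vectors, and that the exponential map is Rodrigues' formula: through it, any vector in $\mathfrak{so}(3)$ is mapped to a rotation matrix in SO(3). Conversely, by defining the logarithmic map, we can map elements of SO(3) back to $\mathfrak{so}(3)$:

$$\phi = \ln(R)^\vee = \left(\sum_{n=0}^{\infty} \frac{(-1)^n}{n+1} (R - I)^{n+1}\right)^\vee$$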

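In practice, however, we usually recover the rotation angle and the rotation axis separately using the trace, which is simpler: $\theta = \arccos\frac{\operatorname{tr}(R) - 1}{2}$, and the axis $a$ satisfies $Ra = a$.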
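A sketch of a trace-based logarithm, with Rodrigues' formula re-implemented so the block runs on its own (`hat`, `rodrigues` and `log_so3` are our own names). The angle comes from $\operatorname{tr}(R) = 1 + 2\cos\theta$. For the axis, this sketch reads $a$ off the skew-symmetric part $R - R^T = 2\sin\theta\, a^\wedge$ instead of solving the eigenvector equation $Ra = a$; the two agree for $0 < \theta < \pi$, but this shortcut loses precision as $\theta \to \pi$, where the eigenvector route is needed.

```python
import numpy as np


def hat(phi) -> np.ndarray:
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def rodrigues(phi) -> np.ndarray:
    """exp(phi^) via Rodrigues' formula."""
    phi = np.asarray(phi, dtype=float)
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    return (np.cos(theta) * np.eye(3) + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * hat(a))


def log_so3(R: np.ndarray) -> np.ndarray:
    """Rotation vector phi with exp(phi^) = R, for rotation angles in [0, pi)."""
    # Angle from the trace: tr(R) = 1 + 2 cos(theta); clip guards against round-off
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)           # identity rotation
    # Axis from the skew part: R - R^T = 2 sin(theta) a^
    a = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * a


phi = np.array([0.3, -1.2, 0.5])
print(np.allclose(log_so3(rodrigues(phi)), phi))   # exp followed by log round-trips
```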

OK, now that we have the correspondence between Lie groups and Lie algebras, what is it good for?

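Mainly, it lets us optimize over a Lie group by working in its Lie algebra. For example, when we combine two elements of a Lie group, how do the corresponding Lie algebra elements change?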

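First: when two Lie group elements are multiplied, how does the corresponding Lie algebra element change? Do the exponents simply add, as they do for scalars? Note that Lie group elements are matrices, not scalars, so in general $\ln(\exp(A)\exp(B)) \neq A + B$ unless $A$ and $B$ commute. What, then, is the correspondence?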
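A numerical illustration, with hand-rolled `hat`, `rodrigues` and `log_so3` helpers (our own names, defined so the block runs alone). For two rotations about different axes, $\ln(\exp(\phi_1^\wedge)\exp(\phi_2^\wedge))^\vee$ differs from $\phi_1 + \phi_2$. Adding the first Baker-Campbell-Hausdorff (BCH) correction $\tfrac{1}{2}[\phi_1^\wedge, \phi_2^\wedge]$, which for $\mathfrak{so}(3)$ equals $(\tfrac{1}{2}\phi_1 \times \phi_2)^\wedge$, brings the result much closer. For rotations about the same axis the matrices commute and the vectors do add.

```python
import numpy as np


def hat(phi) -> np.ndarray:
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def rodrigues(phi) -> np.ndarray:
    """exp(phi^) via Rodrigues' formula."""
    phi = np.asarray(phi, dtype=float)
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    a = phi / theta
    return (np.cos(theta) * np.eye(3) + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * hat(a))


def log_so3(R: np.ndarray) -> np.ndarray:
    """Rotation vector of R (angle from the trace, axis from R - R^T), for angles < pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    a = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * a


phi1 = np.array([0.3, 0.0, 0.0])
phi2 = np.array([0.0, 0.4, 0.0])
product = log_so3(rodrigues(phi1) @ rodrigues(phi2))

naive = phi1 + phi2                                # what scalar intuition suggests
bch1 = phi1 + phi2 + 0.5 * np.cross(phi1, phi2)    # first-order BCH correction
err_naive = np.linalg.norm(product - naive)
err_bch = np.linalg.norm(product - bch1)
print(err_naive, err_bch)

# Rotations about the same axis commute, so their rotation vectors add exactly
same_axis = log_so3(rodrigues([0.3, 0.0, 0.0]) @ rodrigues([0.5, 0.0, 0.0]))
print(np.allclose(same_axis, [0.8, 0.0, 0.0]))
```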