1. 程式人生 > >#讀原始碼+論文# 三維點雲分割Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM

#讀原始碼+論文# 三維點雲分割Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM

from Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM

  1. 超體素方法進行預分割,將點雲根據相似性變成表層面片(surface patches)降低計算複雜度。

         將場景分割問題轉換為圖分割問題(graph partitioning problem)

  • Method 1:Mean-shift聚類演算法 計算node之間的距離
  1. node指的是每個patch,連線node之間的line就是相鄰patch的共邊;

  2. 距離可以是歐氏距離,也可以是馬氏距離;
  3. Mean-shift演算法可見簡單介紹及Python實現或者簡單的機器學習演算法Mean-shift演算法

缺點:計算量太大

  • Method 2:利用面片的法向量方法  聚類  法向量可以表示出區域性凸性資訊。

缺點:當noise太多的時候可靠性降低。

  • 最終使用method 2 結合可靠性平面來做分割 最後使用圖割法分割

關於2D Object Detection and Semantic Segmentation

An essential component to get semantic information is object detection

, which can localize object instances in images. Girshick et al. [21] presented R-CNN, which proposed to apply CNN to object detection. Other similar methods have been proposed in recent years, like Fast R-CNN [22], Faster RCNN [23], Mask-RCNN [24] and YOLO [25-26]. R-CNN uses selective search algorithm for generating region proposals, which runs very slow. Faster R-CNN replaces the slow selective search algorithm with a fast neural net. Mask R-CNN
improves the region of interest (ROI) pooling layer and extends Faster R-CNN to pixel-level image segmentation
Semantic segmentation is to understand an image at a pixel level, which can label each pixel with a class identity. Similar to object detection, state-of-the-art semantic segmentation approaches also rely on CNN. FCN [4] by Long et al. is the first end-to-end system, which popularizes CNN architecture for semantic segmentation. U-Net [5] is a popular encoder-decoder architecture which can make use of annotated samples more efficiently and have a higher accuracy. SegNet [6] is a similar encoderdecoder architecture. SegNet copies indices from max-pooling for up-sampling, which makes it more memory efficient. RefineNet [7] proposes a method called RefineNet block which fuses both high resolution and low resolution features. It solves the problem of significant decrease in image resolution when we repeat the sub-sampling operation. PSPNet [8] introduces a pyramid pooling method to aggregate the context. DeepLab [9-11] utilizes dilated convolutions to increase the field of view.