ROI Pooling Layer Explained: How the Code Works and Why the Layer Exists

阿新 • Published: 2019-01-06

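As the name suggests, ROI Pooling is a kind of pooling layer, specifically one that pools over ROIs.

The whole ROI pooling step amounts to cropping each proposal out of the feature map and producing feature maps of one uniform size.

What is an ROI?
ROI is short for Region of Interest. In the Faster R-CNN architecture it refers to the box of a proposal produced by the RPN layer.

So an ROI is simply a rectangle. The RPN usually outputs more than one rectangle, so here we pool over multiple ROIs.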

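Inputs to ROI Pooling

There are two inputs:
1. data: the feature map of the conv layer that feeds the RPN layer, usually called "share_conv";
2. rois: the output of the RPN layer, a set of boxes. Each ROI is a 1x5x1x1 entry (the index of the image in the batch plus 4 coordinates), so a blob holding N ROIs has shape Nx5x1x1. Note that the coordinates are expressed relative to the original image (the very first input of the network), not relative to the feature map.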
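
The rois layout can be sketched in plain C++. This is only an illustration, not Caffe code: `Roi` and `ReadRoi` are hypothetical names, the flat `std::vector<float>` stands in for the rois blob, and the row order [batch_index, x1, y1, x2, y2] is taken from the comment in Forward_cpu further down.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration; not part of Caffe.
struct Roi {
  int batch_index;       // which image of the batch the box belongs to
  float x1, y1, x2, y2;  // box corners in input-image pixels
};

// Reads the n-th ROI from a flat buffer laid out as rows of 5 values:
// [batch_index, x1, y1, x2, y2].
inline Roi ReadRoi(const std::vector<float>& rois, std::size_t n) {
  assert((n + 1) * 5 <= rois.size());
  const float* r = rois.data() + n * 5;
  return Roi{static_cast<int>(r[0]), r[1], r[2], r[3], r[4]};
}
```

Stepping through the buffer five values at a time plays the same role as the `bottom_rois += bottom[1]->offset(1)` line at the end of the ROI loop.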

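Output of ROI Pooling

The output is a batch of vectors, where the batch size equals the number of ROIs and each vector has size channel x w x h. In other words, ROI Pooling maps boxes of differing sizes to boxes of one fixed size, w x h.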
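
As a small sketch of that output size, assuming the NCHW ordering used by the Reshape code below: `RoiPoolOutputShape` and `ElementCount` are hypothetical helpers, not Caffe functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: output shape in NCHW order, where N is the number
// of ROIs rather than the number of input images.
inline std::array<int, 4> RoiPoolOutputShape(int num_rois, int channels,
                                             int pooled_h, int pooled_w) {
  assert(num_rois >= 0 && channels > 0 && pooled_h > 0 && pooled_w > 0);
  return {{num_rois, channels, pooled_h, pooled_w}};
}

// Total number of values in a tensor of the given shape.
inline std::size_t ElementCount(const std::array<int, 4>& shape) {
  std::size_t count = 1;
  for (int d : shape) count *= static_cast<std::size_t>(d);
  return count;
}
```

With illustrative numbers (not taken from the article), 300 ROIs over a 512-channel feature map pooled to 7x7 give 300 x 512 x 7 x 7 = 7,526,400 output values, whatever the sizes of the individual boxes.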

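First, the ROI coordinates are mapped onto the feature map. The mapping rule is simple: divide each coordinate by the ratio of the input image size to the feature map size (equivalently, multiply by spatial_scale, for example 1/16). Once the box coordinates on the feature map are known, pooling produces the output. Because the inputs vary in size, SPP-style pooling is used: for every output cell, the layer computes the region of the feature map that the cell covers, delimited by its two corner points, and then takes the max or the average inside that region. The Caffe implementation below takes the max.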
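
The mapping and the per-cell region can be sketched as follows. `MapToFeatureMap`, `BinRange`, and `ComputeBin` are hypothetical names; the arithmetic (round, then floor/ceil of the bin boundaries, then clamping to the feature map) mirrors what the Forward_cpu listing below does along one axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Maps an input-image coordinate onto the feature map.
// spatial_scale is feature-map size / image size (for example 1/16).
inline int MapToFeatureMap(float image_coord, float spatial_scale) {
  return static_cast<int>(std::round(image_coord * spatial_scale));
}

struct BinRange {
  int start;  // inclusive
  int end;    // exclusive
};

// Range of feature-map rows (or columns) covered by output cell p, for an
// ROI that starts at roi_start and spans roi_len cells on the feature map.
inline BinRange ComputeBin(int p, int roi_start, int roi_len, int pooled_len,
                           int feature_len) {
  assert(pooled_len > 0 && roi_len > 0);
  const float bin_size = static_cast<float>(roi_len) / pooled_len;
  int start = static_cast<int>(std::floor(p * bin_size)) + roi_start;
  int end = static_cast<int>(std::ceil((p + 1) * bin_size)) + roi_start;
  // Clamp both ends so the range never leaves the feature map.
  start = std::min(std::max(start, 0), feature_len);
  end = std::min(std::max(end, 0), feature_len);
  return BinRange{start, end};
}
```

Because the start is floored and the end is ceiled, neighbouring bins can overlap by one cell when the ROI length is not a multiple of the pooled length.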

Source Walkthrough of Caffe ROI Pooling

1. LayerSetUp

template <typename Dtype>
void ROIPoolingLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  ROIPoolingParameter roi_pool_param = this->layer_param_.roi_pooling_param();
  // Height of the feature map after pooling
  pooled_height_ = roi_pool_param.pooled_h();
  // Width of the feature map after pooling
  pooled_width_ = roi_pool_param.pooled_w();
  // Ratio of the feature-map size to the input-image size (e.g. 1/16);
  // the feature map here is the input of the ROI pooling layer
  spatial_scale_ = roi_pool_param.spatial_scale();
}

2. Reshape

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // Number of channels of the input feature map
  channels_ = bottom[0]->channels();
  // Height of the input feature map
  height_ = bottom[0]->height();
  // Width of the input feature map
  width_ = bottom[0]->width();
  // Set the output shape in NCHW order: N = number of ROIs, C = channels_,
  // H = pooled_height_, W = pooled_width_
  top[0]->Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
  // max_idx_ has the same shape as top
  max_idx_.Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
}
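
The NCHW shapes set in Reshape determine how the forward pass steps through memory with calls such as `bottom[0]->offset(roi_batch_ind)` and `offset(0, 1)`. The sketch below shows the flat-index arithmetic for a contiguous NCHW buffer; `Shape4` and `Offset` are hypothetical stand-ins, not Caffe's Blob class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4-D NCHW shape.
struct Shape4 {
  int num, channels, height, width;
};

// Flat index of element (n, c, h, w) in a contiguous NCHW buffer.
inline std::size_t Offset(const Shape4& s, int n, int c = 0, int h = 0,
                          int w = 0) {
  assert(n >= 0 && c >= 0 && h >= 0 && w >= 0);
  assert(c <= s.channels && h <= s.height && w <= s.width);
  return ((static_cast<std::size_t>(n) * s.channels + c) * s.height + h) *
             s.width + w;
}
```

Under this layout, `Offset(s, 0, 1)` is the size of one channel plane (height x width), which is why adding it to a data pointer moves to the next channel.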

3. Forward

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // There are two inputs: data and rois
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_rois = bottom[1]->cpu_data();
  // Number of ROIs
  int num_rois = bottom[1]->num();
  int batch_size = bottom[0]->num();
  int top_count = top[0]->count();
  Dtype* top_data = top[0]->mutable_cpu_data();
  caffe_set(top_count, Dtype(-FLT_MAX), top_data);
  int* argmax_data = max_idx_.mutable_cpu_data();
  caffe_set(top_count, -1, argmax_data);

  // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
  for (int n = 0; n < num_rois; ++n) {
    int roi_batch_ind = bottom_rois[0];
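    // Map the original-image coordinates onto the feature map
    // Multiplying the ROI's x1, y1, x2, y2 by the scale factor gives the matching coordinates on this layer;
    // for Fast R-CNN the factor is 1/16 because conv5 comes after four pooling layers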
    int roi_start_w = round(bottom_rois[1] * spatial_scale_);
    int roi_start_h = round(bottom_rois[2] * spatial_scale_);
    int roi_end_w = round(bottom_rois[3] * spatial_scale_);
    int roi_end_h = round(bottom_rois[4] * spatial_scale_);
    // Reject ROIs whose batch index does not refer to an image in the batch
    CHECK_GE(roi_batch_ind, 0);
    CHECK_LT(roi_batch_ind, batch_size);
    // Size of this ROI on the feature map
    int roi_height = max(roi_end_h - roi_start_h + 1, 1);
    int roi_width = max(roi_end_w - roi_start_w + 1, 1);
    // Size of the region on the pre-pooling feature map that one pooled value covers
    // Note: ROIs differ in size, so this is recomputed for every ROI
    // It is the ROI's h, w on the feature map divided by the output h, w
    const Dtype bin_size_h = static_cast<Dtype>(roi_height)
                             / static_cast<Dtype>(pooled_height_);
    const Dtype bin_size_w = static_cast<Dtype>(roi_width)
                             / static_cast<Dtype>(pooled_width_);
    // Locate the feature map of the image this ROI belongs to;
    // if the input batch size is 1, roi_batch_ind is 0
    const Dtype* batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);
    // Pooling is done per channel, so loop over the channels
    for (int c = 0; c < channels_; ++c) {
      // Visit every output position and compute its value
      for (int ph = 0; ph < pooled_height_; ++ph) {
        for (int pw = 0; pw < pooled_width_; ++pw) {
          // Compute pooling region for this output unit:
          //  start (included) = floor(ph * roi_height / pooled_height_)
          //  end (excluded) = ceil((ph + 1) * roi_height / pooled_height_)
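          // Compute the region [hstart, wstart, hend, wend] that this output position covers inside the ROI
          // by scaling the output index with the bin height and width
          int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)
                                              * bin_size_h));
          int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)   // mapped position of the next output row
                                           * bin_size_h));
          int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
                                              * bin_size_w));
          int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
                                           * bin_size_w));
          // Shift the region by the ROI's start offset to get [hstart, wstart, hend, wend] on the feature map
          // The inner max and the outer min clamp each coordinate to [0, height_] (or [0, width_]),
          // so a region that extends past the feature map is cut off at its border
          // The results are absolute feature-map coordinates measured from the top-left origin,
          // i.e. original-image coordinates scaled by spatial_scale (1/16 for Fast R-CNN)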
          hstart = min(max(hstart + roi_start_h, 0), height_);
          hend = min(max(hend + roi_start_h, 0), height_);
          wstart = min(max(wstart + roi_start_w, 0), width_);
          wend = min(max(wend + roi_start_w, 0), width_);
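          // The mapped region is empty if it has no extent in either direction
          bool is_empty = (hend <= hstart) || (wend <= wstart);
          // pool_index is the position in the output of the value being computed
          const int pool_index = ph * pooled_width_ + pw;
          // For an empty region, set the output value to 0 and the argmax index to -1
          if (is_empty) {
            top_data[pool_index] = 0;
            argmax_data[pool_index] = -1;
          }
          // Iterate over the input region that this output value covers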
          for (int h = hstart; h < hend; ++h) {
            for (int w = wstart; w < wend; ++w) {
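              // Position in the input feature map
              const int index = h * width_ + w;
              // Keep the maximum of the region at the corresponding output position
              // E.g. if the ROI spans 20x20 on the feature map and the output is 4x4, each output value
              // covers 25 input values (a 5x5 bin) and keeps only the largest of them
              // The index of the maximum is recorded as well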
              if (batch_data[index] > top_data[pool_index]) {
                top_data[pool_index] = batch_data[index];
                argmax_data[pool_index] = index;
              }
            }
          }
        }
      }
      // Increment all data pointers by one channel
      batch_data += bottom[0]->offset(0, 1);
      top_data += top[0]->offset(0, 1);
      argmax_data += max_idx_.offset(0, 1);
    }
    // Increment ROI data pointer
    bottom_rois += bottom[1]->offset(1);
  }
}
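
To try the same logic outside Caffe, the forward pass for a single ROI and a single channel can be condensed into a self-contained sketch. `PooledChannel`, `ClampTo`, and `RoiMaxPoolChannel` are hypothetical names, and the feature map is assumed to be a row-major `std::vector<float>`; the steps follow Forward_cpu above but this is not the Caffe implementation itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

struct PooledChannel {
  std::vector<float> values;  // pooled_h * pooled_w maxima
  std::vector<int> argmax;    // flat input index of each maximum, -1 if empty
};

// Clamps v to the closed interval [0, hi].
inline int ClampTo(int v, int hi) { return std::min(std::max(v, 0), hi); }

// Max-pools one ROI over one channel of a height x width feature map.
// ROI corners (x1, y1, x2, y2) are given in input-image coordinates.
inline PooledChannel RoiMaxPoolChannel(const std::vector<float>& feature,
                                       int height, int width, float x1,
                                       float y1, float x2, float y2,
                                       float spatial_scale, int pooled_h,
                                       int pooled_w) {
  assert(height > 0 && width > 0 && pooled_h > 0 && pooled_w > 0);
  assert(feature.size() == static_cast<std::size_t>(height) * width);
  // Map the ROI from image coordinates onto the feature map.
  const int roi_start_w = static_cast<int>(std::round(x1 * spatial_scale));
  const int roi_start_h = static_cast<int>(std::round(y1 * spatial_scale));
  const int roi_end_w = static_cast<int>(std::round(x2 * spatial_scale));
  const int roi_end_h = static_cast<int>(std::round(y2 * spatial_scale));
  const int roi_h = std::max(roi_end_h - roi_start_h + 1, 1);
  const int roi_w = std::max(roi_end_w - roi_start_w + 1, 1);
  const float bin_h = static_cast<float>(roi_h) / pooled_h;
  const float bin_w = static_cast<float>(roi_w) / pooled_w;

  const std::size_t count = static_cast<std::size_t>(pooled_h) * pooled_w;
  PooledChannel out;
  out.values = std::vector<float>(count, -FLT_MAX);
  out.argmax = std::vector<int>(count, -1);

  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      // Region of the feature map covered by this output cell.
      const int hstart = ClampTo(
          static_cast<int>(std::floor(ph * bin_h)) + roi_start_h, height);
      const int hend = ClampTo(
          static_cast<int>(std::ceil((ph + 1) * bin_h)) + roi_start_h, height);
      const int wstart = ClampTo(
          static_cast<int>(std::floor(pw * bin_w)) + roi_start_w, width);
      const int wend = ClampTo(
          static_cast<int>(std::ceil((pw + 1) * bin_w)) + roi_start_w, width);
      const int pool_index = ph * pooled_w + pw;
      if (hend <= hstart || wend <= wstart) {
        out.values[pool_index] = 0.0f;  // empty region: output 0, argmax -1
        continue;
      }
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          const int index = h * width + w;
          if (feature[index] > out.values[pool_index]) {
            out.values[pool_index] = feature[index];
            out.argmax[pool_index] = index;
          }
        }
      }
    }
  }
  return out;
}
```

For a 4x4 feature map holding the values 0 to 15 in row-major order, an ROI covering the whole map with spatial_scale 1 and a 2x2 output keeps the maximum of each 2x2 quadrant: 5, 7, 13, and 15. Handling all channels and all ROIs is a matter of wrapping this function in the two outer loops shown in the Caffe listing.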
