
Tuning Faster R-CNN / SSD model parameters in the TensorFlow Object Detection API

For setting up the TensorFlow Object Detection API, you can refer to the earlier article: https://becominghuman.ai/tensorflow-object-detection-api-tutorial-training-and-evaluating-custom-object-detector-ed2594afcf73

In this article, I will discuss how to change the configuration of a pre-trained model. The goal is that you can configure TensorFlow/models for your own application, and the API will no longer be a black box!

Overview of this article:

  • Understand protocol buffers and proto files.
  • Use the knowledge of proto files to understand a model's configuration file.
  • Follow 3 steps to update a model's parameters.
  • Additional examples:
  1. Changing the weight initializer
  2. Changing the weight optimizer
  3. Evaluating a pre-trained model

Protocol Buffers

To modify a model, we need to understand its inner mechanisms. The TensorFlow Object Detection API uses Protocol Buffers, a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is like XML, but smaller, faster, and simpler. The API uses the proto2 version of the protocol buffer language. I will try to explain the parts of the language needed to update a pre-configured model. For more details on the protocol buffer language, refer to this documentation and the Python tutorial.

Working with protocol buffers can be broken down into the following three steps:

  • Define message formats in a .proto file. This file acts as a blueprint for all messages: it shows what parameters a message accepts, what the data type of each parameter should be, whether a parameter is required or optional, what its tag number is, what its default value is, and so on. The API's proto files can be found here. For illustration, I use the grid_anchor_generator.proto file.
  • syntax = "proto2";
    
    package object_detection.protos;
    
    // Configuration proto for GridAnchorGenerator. See
    // anchor_generators/grid_anchor_generator.py for details.
    message GridAnchorGenerator {
       // Anchor height in pixels.
      optional int32 height = 1 [default = 256];
    
      // Anchor width in pixels.
      optional int32 width = 2 [default = 256];
    
      // Anchor stride in height dimension in pixels.
      optional int32 height_stride = 3 [default = 16];
    
      // Anchor stride in width dimension in pixels.
      optional int32 width_stride = 4 [default = 16];
    
      // Anchor height offset in pixels.
      optional int32 height_offset = 5 [default = 0];
    
      // Anchor width offset in pixels.
      optional int32 width_offset = 6 [default = 0];
    
      // At any given location, len(scales) * len(aspect_ratios) anchors are
      // generated with all possible combinations of scales and aspect ratios.
    
      // List of scales for the anchors.
      repeated float scales = 7;
    
      // List of aspect ratios for the anchors.
      repeated float aspect_ratios = 8;
    }

    From lines 30-33 it is clear that the parameters scales and aspect_ratios are mandatory for the GridAnchorGenerator message, while the remaining parameters are optional and take their default values if they are not supplied.

    • After defining the message format, we need to compile the protocol buffers. The compiler generates classes from the .proto files. During installation of the API, we ran the following command, which compiles the protocol buffers:
    • # From tensorflow/models/research/
      protoc object_detection/protos/*.proto --python_out=.
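      As a quick illustration (a minimal sketch, assuming the API is installed and the protos have been compiled as above), the generated module grid_anchor_generator_pb2 exposes a GridAnchorGenerator class whose fields mirror the .proto definition:

        from object_detection.protos import grid_anchor_generator_pb2

        # Build a GridAnchorGenerator message from the generated class.
        anchor_cfg = grid_anchor_generator_pb2.GridAnchorGenerator()
        anchor_cfg.scales.extend([0.25, 0.5, 1.0, 2.0])   # repeated float field
        anchor_cfg.aspect_ratios.extend([0.5, 1.0, 2.0])  # repeated float field

        print(anchor_cfg.height)         # 256 -- default value declared in the proto
        print(anchor_cfg.height_stride)  # 16  -- default value declared in the proto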
      • After defining and compiling the protocol buffers, we need to use the Python protocol buffer API to write and read messages. In our case, the configuration file acts as that read/write interface: it lets us write and read messages without worrying about the internal mechanics of the TensorFlow API. In other words, we can update the parameters of a pre-trained model simply by changing the configuration file appropriately.
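        For example (a minimal sketch, assuming the config file discussed below is saved locally as faster_rcnn_resnet50_pets.config), the whole configuration can be read and written through the generated pipeline_pb2 classes together with google.protobuf.text_format:

        from google.protobuf import text_format
        from object_detection.protos import pipeline_pb2

        # Parse the text-format config file into a TrainEvalPipelineConfig message.
        pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
        with open('faster_rcnn_resnet50_pets.config', 'r') as f:
            text_format.Merge(f.read(), pipeline_config)

        # Fields can now be read or updated programmatically ...
        pipeline_config.model.faster_rcnn.num_classes = 37

        # ... and written back out as a text-format config file.
        with open('updated_pipeline.config', 'w') as f:
            f.write(text_format.MessageToString(pipeline_config))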
      Understanding the configuration file

        Clearly, the configuration file is what lets us change the model's parameters as needed. The next question is: how do we actually change a parameter? This section and the next answer that question, and this is where the knowledge of proto files comes in handy. For demonstration purposes, I am using the faster_rcnn_resnet50_pets.config file.

      • # Faster R-CNN with Resnet-50 (v1), configured for Oxford-IIIT Pets Dataset.
        # Users should configure the fine_tune_checkpoint field in the train config as
        # well as the label_map_path and input_path fields in the train_input_reader and
        # eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
        # should be configured.
        
        model {
          faster_rcnn {
            num_classes: 37
            image_resizer {
              keep_aspect_ratio_resizer {
                min_dimension: 600
                max_dimension: 1024
              }
            }
            feature_extractor {
              type: 'faster_rcnn_resnet50'
              first_stage_features_stride: 16
            }
            first_stage_anchor_generator {
              grid_anchor_generator {
                scales: [0.25, 0.5, 1.0, 2.0]
                aspect_ratios: [0.5, 1.0, 2.0]
                height_stride: 16
                width_stride: 16
              }
            }
            first_stage_box_predictor_conv_hyperparams {
              op: CONV
              regularizer {
                l2_regularizer {
                  weight: 0.0
                }
              }
              initializer {
                truncated_normal_initializer {
                  stddev: 0.01
                }
              }
            }
            first_stage_nms_score_threshold: 0.0
            first_stage_nms_iou_threshold: 0.7
            first_stage_max_proposals: 300
            first_stage_localization_loss_weight: 2.0
            first_stage_objectness_loss_weight: 1.0
            initial_crop_size: 14
            maxpool_kernel_size: 2
            maxpool_stride: 2
            second_stage_box_predictor {
              mask_rcnn_box_predictor {
                use_dropout: false
                dropout_keep_probability: 1.0
                fc_hyperparams {
                  op: FC
                  regularizer {
                    l2_regularizer {
                      weight: 0.0
                    }
                  }
                  initializer {
                    variance_scaling_initializer {
                      factor: 1.0
                      uniform: true
                      mode: FAN_AVG
                    }
                  }
                }
              }
            }
            second_stage_post_processing {
              batch_non_max_suppression {
                score_threshold: 0.0
                iou_threshold: 0.6
                max_detections_per_class: 100
                max_total_detections: 300
              }
              score_converter: SOFTMAX
            }
            second_stage_localization_loss_weight: 2.0
            second_stage_classification_loss_weight: 1.0
          }
        }
        
        train_config: {
          batch_size: 1
          optimizer {
            momentum_optimizer: {
              learning_rate: {
                manual_step_learning_rate {
                  initial_learning_rate: 0.0003
                  schedule {
                    step: 900000
                    learning_rate: .00003
                  }
                  schedule {
                    step: 1200000
                    learning_rate: .000003
                  }
                }
              }
              momentum_optimizer_value: 0.9
            }
            use_moving_average: false
          }
          gradient_clipping_by_norm: 10.0
          fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
          from_detection_checkpoint: true
          # Note: The below line limits the training process to 200K steps, which we
          # empirically found to be sufficient enough to train the pets dataset. This
          # effectively bypasses the learning rate schedule (the learning rate will
          # never decay). Remove the below line to train indefinitely.
          num_steps: 200000
          data_augmentation_options {
            random_horizontal_flip {
            }
          }
          max_number_of_boxes: 50
        }
        
        train_input_reader: {
          tf_record_input_reader {
            input_path: "PATH_TO_BE_CONFIGURED/pet_train.record"
          }
          label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
        }
        
        eval_config: {
          num_examples: 2000
          # Note: The below line limits the evaluation process to 10 evaluations.
          # Remove the below line to evaluate indefinitely.
          max_evals: 10
        }
        
        eval_input_reader: {
          tf_record_input_reader {
            input_path: "PATH_TO_BE_CONFIGURED/pet_val.record"
          }
          label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
          shuffle: false
          num_readers: 1
        }

        Lines 7 to 10 indicate that num_classes is one of the parameters of the faster_rcnn message, which in turn is a parameter of the model message. Similarly, optimizer is a child message of the parent train_config message, and batch_size is another parameter of the train_config message. We can verify this by checking the corresponding protos files.

      • syntax = "proto2";
        
        package object_detection.protos;
        
        import "object_detection/protos/anchor_generator.proto";
        import "object_detection/protos/box_predictor.proto";
        import "object_detection/protos/hyperparams.proto";
        import "object_detection/protos/image_resizer.proto";
        import "object_detection/protos/losses.proto";
        import "object_detection/protos/post_processing.proto";
        
        // Configuration for Faster R-CNN models.
        // See meta_architectures/faster_rcnn_meta_arch.py and models/model_builder.py
        //
        // Naming conventions:
        // Faster R-CNN models have two stages: a first stage region proposal network
        // (or RPN) and a second stage box classifier.  We thus use the prefixes
        // `first_stage_` and `second_stage_` to indicate the stage to which each
        // parameter pertains when relevant.
        message FasterRcnn {
        
          // Whether to construct only the Region Proposal Network (RPN).
          optional int32 number_of_stages = 1 [default=2];
        
          // Number of classes to predict.
          optional int32 num_classes = 3;
          
          // Image resizer for preprocessing the input image.
          optional ImageResizer image_resizer = 4;

        From lines 20 and 26 it is clear that num_classes is an optional parameter of the faster_rcnn message. I hope the discussion so far has helped in understanding how the configuration file is organized. Now it is time to actually update one of the model's parameters.

      Step 1: Decide which parameter to update

        Suppose we need to update the image_resizer parameter mentioned on line 10 of the faster_rcnn_resnet50_pets.config file.

        Step 2: Search the repository for the given parameter

        The goal is to find the proto file for the parameter. For this, we need to search the repository.

         We need to search for the following:

      • parameter_name path:research/object_detection/protos
        #in our case parameter_name="image_resizer" thus,
        image_resizer path:research/object_detection/protos

        Here, path:research/object_detection/protos restricts the search scope. More information on how to search on GitHub can be found here. The output of the search image_resizer path:research/object_detection/protos looks like the following:

        [Screenshot: GitHub search results for image_resizer, pointing to image_resizer.proto]

        From the output it is clear that in order to update the image_resizer parameter, we need to analyze the image_resizer.proto file.

        Step 3: Analyze the proto file

         

        syntax = "proto2";
        
        package object_detection.protos;
        
        // Configuration proto for image resizing operations.
        // See builders/image_resizer_builder.py for details.
        message ImageResizer {
          oneof image_resizer_oneof {
            KeepAspectRatioResizer keep_aspect_ratio_resizer = 1;
            FixedShapeResizer fixed_shape_resizer = 2;
          }
        }
        
        // Enumeration type for image resizing methods provided in TensorFlow.
        enum ResizeType {
          BILINEAR = 0; // Corresponds to tf.image.ResizeMethod.BILINEAR
          NEAREST_NEIGHBOR = 1; // Corresponds to tf.image.ResizeMethod.NEAREST_NEIGHBOR
          BICUBIC = 2; // Corresponds to tf.image.ResizeMethod.BICUBIC
          AREA = 3; // Corresponds to tf.image.ResizeMethod.AREA
        }
        
        // Configuration proto for image resizer that keeps aspect ratio.
        message KeepAspectRatioResizer {
          // Desired size of the smaller image dimension in pixels.
          optional int32 min_dimension = 1 [default = 600];
        
          // Desired size of the larger image dimension in pixels.
          optional int32 max_dimension = 2 [default = 1024];
        
          // Desired method when resizing image.
          optional ResizeType resize_method = 3 [default = BILINEAR];
        
          // Whether to pad the image with zeros so the output spatial size is
          // [max_dimension, max_dimension]. Note that the zeros are padded to the
          // bottom and the right of the resized image.
          optional bool pad_to_max_dimension = 4 [default = false];
        
          // Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
          optional bool convert_to_grayscale = 5 [default = false];
        
          // Per-channel pad value. This is only used when pad_to_max_dimension is True.
          // If unspecified, a default pad value of 0 is applied to all channels.
          repeated float per_channel_pad_value = 6;
        }
        
        // Configuration proto for image resizer that resizes to a fixed shape.
        message FixedShapeResizer {
          // Desired height of image in pixels.
          optional int32 height = 1 [default = 300];
        
          // Desired width of image in pixels.
          optional int32 width = 2 [default = 300];
        
          // Desired method when resizing image.
          optional ResizeType resize_method = 3 [default = BILINEAR];
        
          // Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
          optional bool convert_to_grayscale = 4 [default = false];
        }

        From lines 8-10 we can see that images can be resized using either keep_aspect_ratio_resizer or fixed_shape_resizer. Analyzing lines 23-44, we can observe that the keep_aspect_ratio_resizer message has the parameters min_dimension, max_dimension, resize_method, pad_to_max_dimension, convert_to_grayscale, and per_channel_pad_value. Similarly, fixed_shape_resizer has the parameters height, width, resize_method, and convert_to_grayscale. The data types of all the parameters are given in the proto file. Thus, to change the image_resizer type, we can change the following lines in the configuration file.

      • #before
        image_resizer {
          keep_aspect_ratio_resizer {
            min_dimension: 600
            max_dimension: 1024
          }
        }
        #after
        image_resizer {
          fixed_shape_resizer {
            height: 600
            width: 500
            resize_method: AREA
          }
        }

        The code above resizes images to a fixed 500 × 600 (width × height) using the AREA resize method. The various resize methods available in TensorFlow can be found here.
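        As a rough illustration (a minimal sketch using TensorFlow 2.x image ops, not code from the API itself), this is approximately what such a fixed-shape resize corresponds to:

        import tensorflow as tf

        # A dummy RGB image; in the API the input actually comes from the TFRecord pipeline.
        image = tf.zeros([800, 1200, 3], dtype=tf.float32)

        # fixed_shape_resizer with height: 600, width: 500 and resize_method: AREA
        # corresponds roughly to:
        resized = tf.image.resize(image, size=[600, 500],
                                  method=tf.image.ResizeMethod.AREA)
        print(resized.shape)  # (600, 500, 3)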

      Additional examples

        We can update/add any parameter using the steps discussed in the previous section. Here I demonstrate a few frequently used examples, but the steps discussed above should help in updating/adding any parameter of the model.

        Changing the weight initializer

        • Decide to change the initializer parameter on line 35 of the faster_rcnn_resnet50_pets.config file.
        • Search the repository for initializer path:research/object_detection/protos. From the search results, it is clear that we need to analyze the hyperparams.proto file.
          • Lines 68-74 of the hyperparams.proto file describe the initializer configuration.
          • message Initializer {
              oneof initializer_oneof {
                TruncatedNormalInitializer truncated_normal_initializer = 1;
                VarianceScalingInitializer variance_scaling_initializer = 2;
                RandomNormalInitializer random_normal_initializer = 3;
              }
            }

            We can use random_normal_initializer in place of truncated_normal_initializer; for that, we need to analyze lines 99-102 of the hyperparams.proto file.

          • message RandomNormalInitializer {
            optional float mean = 1 [default = 0.0];
            optional float stddev = 2 [default = 1.0];
            }
          • Clearly, random_normal_initializer has two parameters: mean and stddev. We can change the following lines in the configuration file to use random_normal_initializer:
          • #before
            initializer {
              truncated_normal_initializer {
                stddev: 0.01
              }
            }
            #after
            initializer {
              random_normal_initializer {
                mean: 1
                stddev: 0.5
              }
            }
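            As an aside (a minimal sketch using the tf.keras initializer classes, which is not how the API builds its layers internally, but the parameters are the same), the two choices above correspond roughly to:

            import tensorflow as tf

            # For intuition only: the Object Detection API constructs initializers from the
            # config through its own builders rather than directly like this.
            truncated = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.01)
            random_normal = tf.keras.initializers.RandomNormal(mean=1.0, stddev=0.5)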

            Changing the weight optimizer

            • Decide to change the momentum_optimizer parameter on line 87 of the faster_rcnn_resnet50_pets.config file; its parent message is optimizer.
            • Search the repository for optimizer path:research/object_detection/protos. From the search results, it is clear that we need to analyze the optimizer.proto file.
              • Lines 9-14 of the optimizer.proto file describe the optimizer configuration.

               

              message Optimizer {
                oneof optimizer {
                  RMSPropOptimizer rms_prop_optimizer = 1;
                  MomentumOptimizer momentum_optimizer = 2;
                  AdamOptimizer adam_optimizer = 3;
                }

              Clearly, in place of momentum_optimizer we can use adam_optimizer, which has proven to be a good optimizer. To do so, we need to make the following changes in the faster_rcnn_resnet50_pets.config file.

           

          #before
          optimizer {
            momentum_optimizer: {
              learning_rate: {
                manual_step_learning_rate {
                  initial_learning_rate: 0.0003
                  schedule {
                    step: 900000
                    learning_rate: .00003
                  }
                  schedule {
                    step: 1200000
                    learning_rate: .000003
                  }
                }
              }
              momentum_optimizer_value: 0.9
            }
          }
          #after
          optimizer {
            adam_optimizer: {
              learning_rate: {
                manual_step_learning_rate {
                  initial_learning_rate: 0.0003
                  schedule {
                    step: 900000
                    learning_rate: .00003
                  }
                  schedule {
                    step: 1200000
                    learning_rate: .000003
                  }
                }
              }
            }
          }
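          To double-check which optimizer a given config ends up using (a minimal sketch, reusing the text_format parsing shown earlier and assuming the edited file is saved as faster_rcnn_resnet50_pets.config), the oneof can be inspected programmatically:

          from google.protobuf import text_format
          from object_detection.protos import pipeline_pb2

          pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
          with open('faster_rcnn_resnet50_pets.config', 'r') as f:
              text_format.Merge(f.read(), pipeline_config)

          # The oneof in optimizer.proto is named `optimizer`; this prints whichever branch is set.
          print(pipeline_config.train_config.optimizer.WhichOneof('optimizer'))  # e.g. 'adam_optimizer'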

          Evaluating the pre-trained model

          Eval waits for 300 seconds to check whether the trained model has been updated! If your GPU is good enough, you can train and evaluate simultaneously! Usually, though, resources get exhausted. To overcome this, we can train the model first, save it in a directory, and evaluate it afterwards. To run the evaluation later, we need to make the following changes in the configuration file:

        • #before
          eval_config: {
            num_examples: 2000
            # Note: The below line limits the evaluation process to 10 evaluations.
            # Remove the below line to evaluate indefinitely.
            max_evals: 10
          }
          #after
          eval_config: {
            num_examples: 10
            num_visualizations: 10
            eval_interval_secs: 0
          }

          num_visualizations should be equal to the number of examples you want to evaluate! The more visualizations there are, the more time evaluation takes. If your GPU is powerful enough to train and evaluate simultaneously, you can keep eval_interval_secs: 300; this parameter decides how often evaluation is run. I arrived at these settings by following the 3 steps discussed above.
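          The same change can also be made programmatically instead of hand-editing (a minimal sketch, reusing the parsing approach from earlier; the file names are placeholders):

          from google.protobuf import text_format
          from object_detection.protos import pipeline_pb2

          pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
          with open('faster_rcnn_resnet50_pets.config', 'r') as f:
              text_format.Merge(f.read(), pipeline_config)

          # Evaluate 10 examples, visualize all of them, and do not wait between evaluation runs.
          pipeline_config.eval_config.num_examples = 10
          pipeline_config.eval_config.num_visualizations = 10
          pipeline_config.eval_config.eval_interval_secs = 0

          with open('eval_later.config', 'w') as f:
              f.write(text_format.MessageToString(pipeline_config))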

          In a nutshell, knowledge of protocol buffers helped us understand that model parameters are passed as messages and that any parameter can be updated by referring to its .proto file. We also discussed 3 simple steps to locate the correct .proto file for the parameter we want to update.

          Please mention in the comments any parameter of the configuration file that you would like to update/add.
