
GluonCV: Training YOLO v3 on Pascal VOC Data (Part 2: Training)

This tutorial walks through the basic steps of training the YOLOv3 object-detection model provided by GluonCV.

Specifically, it shows how to build a state-of-the-art YOLOv3 model by stacking GluonCV components.

First, three notes on training:

(1) The default initial learning rate is 0.001; my training loss became nan, and lowering it to 0.0001 fixed the problem.

(2) With eight 1080Ti GPUs the default batch-size=64 is fine; with four GPUs change it to 32, and scale down proportionally for fewer GPUs.

(3) Regarding the following warning:

[07:40:55] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)

# Solution
# before running train, execute
export MXNET_CUDNN_AUTOTUNE_DEFAULT=0
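Alternatively, the same variable can be set from inside Python. A minimal sketch, assuming it is set before mxnet is imported:

import os
# must be set before mxnet is imported, otherwise it may not take effect
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
import mxnet as mx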

1. Dataset

Please read the previous tutorial first and prepare the Pascal VOC dataset on disk.

Then we are ready to load the training and validation images.

import gluoncv as gcv
from gluoncv.data import VOCDetection
# typically we use 2007+2012 trainval splits for training data
train_dataset = VOCDetection(splits=[(2007, 'trainval'), (2012, 'trainval')])
# and use 2007 test as validation data
val_dataset = VOCDetection(splits=[(2007, 'test')])

print('Training images:', len(train_dataset))
print('Validation images:', len(val_dataset))


Out:

Training images: 16551
Validation images: 4952
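As a quick sanity check (a small sketch), the 20 Pascal VOC class names are exposed on the dataset object:

print('classes:', train_dataset.classes)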

2. Data Transform

We can read an image-label pair from the training dataset:

train_image, train_label = train_dataset[80]
bboxes = train_label[:, :4]
cids = train_label[:, 4:5]
print('image:', train_image.shape)
print('bboxes:', bboxes.shape, 'class ids:', cids.shape)

Out:
image: (375, 500, 3)
bboxes: (2, 4) class ids: (2, 1)
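Each row of the label array describes one object: the first four columns are the corner coordinates (xmin, ymin, xmax, ymax) and the fifth is the class id, which is why they are sliced out as above; the remaining column carries the VOC difficult flag. A minimal sketch of inspecting the first object:

# [xmin, ymin, xmax, ymax, class_id, difficult]
print('first object:', train_label[0])
print('class name:', train_dataset.classes[int(cids[0, 0])])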

Plot the image together with its bounding-box labels using matplotlib:

from matplotlib import pyplot as plt
from gluoncv.utils import viz

ax = viz.plot_bbox(train_image.asnumpy(), bboxes, labels=cids, class_names=train_dataset.classes)
plt.show()

Validation images are very similar to the training ones, since they are essentially split randomly into different sets:

val_image, val_label = val_dataset[120]
bboxes = val_label[:, :4]
cids = val_label[:, 4:5]
ax = viz.plot_bbox(val_image.asnumpy(), bboxes, labels=cids, class_names=train_dataset.classes)
plt.show()

Transform

from gluoncv.data.transforms import presets
from gluoncv import utils
from mxnet import nd

width, height = 416, 416  # resize image to 416x416 after all data augmentation
train_transform = presets.yolo.YOLO3DefaultTrainTransform(width, height)
val_transform = presets.yolo.YOLO3DefaultValTransform(width, height)

utils.random.seed(123)  # fix seed in this tutorial

Apply the transform to a training image:

train_image2, train_label2 = train_transform(train_image, train_label)
print('tensor shape:', train_image2.shape)


Out:

tensor shape: (3, 416, 416)

The images in the tensors look distorted because they no longer lie in the (0, 255) range. Let's convert them back so we can see them clearly.

train_image2 = train_image2.transpose((1, 2, 0)) * nd.array((0.229, 0.224, 0.225)) + nd.array((0.485, 0.456, 0.406))
train_image2 = (train_image2 * 255).clip(0, 255)
ax = viz.plot_bbox(train_image2.asnumpy(), train_label2[:, :4],
                   labels=train_label2[:, 4:5],
                   class_names=train_dataset.classes)
plt.show()

The transforms used in training include random color distortion, random expand/crop, random flipping, resizing, and a fixed color normalization. In contrast, validation only involves resizing and color normalization.
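The validation transform can be applied in the same way; a minimal sketch (the reverse normalization mirrors the training case above):

val_image2, val_label2 = val_transform(val_image, val_label)
val_image2 = val_image2.transpose((1, 2, 0)) * nd.array((0.229, 0.224, 0.225)) + nd.array((0.485, 0.456, 0.406))
val_image2 = (val_image2 * 255).clip(0, 255)
ax = viz.plot_bbox(val_image2.asnumpy(), val_label2[:, :4],
                   labels=val_label2[:, 4:5],
                   class_names=train_dataset.classes)
plt.show()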

3. Data Loader

We will iterate over the entire dataset many times during training. Keep in mind that raw images must be converted to tensors before they are fed into the neural network (mxnet uses the BCHW format).

A handy DataLoader makes it easy to apply the different transforms and aggregate the data into mini-batches.

Because the number of objects varies a lot from image to image, the label arrays also have varying sizes, so we need to pad them to a common size. To handle this, GluonCV provides gluoncv.data.batchify.Pad, which takes care of the padding automatically. gluoncv.data.batchify.Stack stacks NDArrays with consistent shapes, and gluoncv.data.batchify.Tuple applies different behaviors to the multiple outputs of the transform function.

from gluoncv.data.batchify import Tuple, Stack, Pad
from mxnet.gluon.data import DataLoader

batch_size = 2  # for tutorial, we use smaller batch-size
num_workers = 0  # you can make it larger (if your CPU has more cores) to accelerate data loading

# behavior of batchify_fn: stack images, and pad labels
batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
train_loader = DataLoader(train_dataset.transform(train_transform), batch_size, shuffle=True,
                          batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)
val_loader = DataLoader(val_dataset.transform(val_transform), batch_size, shuffle=False,
                        batchify_fn=batchify_fn, last_batch='keep', num_workers=num_workers)

for ib, batch in enumerate(train_loader):
    if ib > 3:
        break
    print('data 0:', batch[0][0].shape, 'label 0:', batch[1][0].shape)
    print('data 1:', batch[0][1].shape, 'label 1:', batch[1][1].shape)


Out:

data 0: (3, 416, 416) label 0: (6, 6)
data 1: (3, 416, 416) label 1: (6, 6)
data 0: (3, 416, 416) label 0: (3, 6)
data 1: (3, 416, 416) label 1: (3, 6)
data 0: (3, 416, 416) label 0: (2, 6)
data 1: (3, 416, 416) label 1: (2, 6)
data 0: (3, 416, 416) label 0: (2, 6)
data 1: (3, 416, 416) label 1: (2, 6)
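Within each batch, Pad fills the shorter label arrays with -1 up to the largest object count in that batch, which is why the two labels of a batch share the same shape. A small sketch of dropping the padded rows again (assuming the last batch from the loop above is still in scope):

import numpy as np
label0 = batch[1][0].asnumpy()
valid = label0[label0[:, 4] >= 0]  # padded rows have class id -1
print('real objects in first image:', valid.shape[0])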

4. YOLOv3 Network

GluonCV's YOLOv3 implementation is a composite Gluon HybridBlock. Structurally, the YOLOv3 network is composed of a base feature-extraction network, convolutional transition layers, upsampling layers, and specially designed YOLOv3 output layers.

The Gluon Model Zoo has some built-in YOLO networks that can be loaded with a single line of code:

(To avoid downloading the model in this tutorial, we set pretrained_base=False; in practice we usually want to load an ImageNet-pretrained base by setting pretrained_base=True.)

from gluoncv import model_zoo
net = model_zoo.get_model('yolo3_darknet53_voc', pretrained_base=False)
print(net)


Out:

YOLOV3(
  (_target_generator): YOLOV3TargetMerger(
    (_dynamic_target): YOLOV3DynamicTargetGeneratorSimple(
      (_batch_iou): BBoxBatchIOU(
        (_pre): BBoxSplit(
        
        )
      )
    )
  )
  (_loss): YOLOV3Loss(batch_axis=0, w=None)
  (stages): HybridSequential(
    (0): HybridSequential(
      (0): HybridSequential(
        (0): Conv2D(None -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
      (1): HybridSequential(
        (0): Conv2D(None -> 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
      (2): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (3): HybridSequential(
        (0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
      (4): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (5): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (6): HybridSequential(
        (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
      (7): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (8): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (9): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (10): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (11): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (12): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (13): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (14): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
    )
    (1): HybridSequential(
      (0): HybridSequential(
        (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
      (1): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (2): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (3): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (4): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (5): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (6): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (7): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (8): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
    )
    (2): HybridSequential(
      (0): HybridSequential(
        (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
      (1): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (2): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (3): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
      (4): DarknetBasicBlockV3(
        (body): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
            (2): LeakyReLU(0.1)
          )
        )
      )
    )
  )
  (transitions): HybridSequential(
    (0): HybridSequential(
      (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
      (2): LeakyReLU(0.1)
    )
    (1): HybridSequential(
      (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
      (2): LeakyReLU(0.1)
    )
  )
  (yolo_blocks): HybridSequential(
    (0): YOLODetectionBlockV3(
      (body): HybridSequential(
        (0): HybridSequential(
          (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (1): HybridSequential(
          (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (2): HybridSequential(
          (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (3): HybridSequential(
          (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (4): HybridSequential(
          (0): Conv2D(None -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
      )
      (tip): HybridSequential(
        (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
    )
    (1): YOLODetectionBlockV3(
      (body): HybridSequential(
        (0): HybridSequential(
          (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (1): HybridSequential(
          (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (2): HybridSequential(
          (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (3): HybridSequential(
          (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (4): HybridSequential(
          (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
      )
      (tip): HybridSequential(
        (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
    )
    (2): YOLODetectionBlockV3(
      (body): HybridSequential(
        (0): HybridSequential(
          (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (1): HybridSequential(
          (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (2): HybridSequential(
          (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (3): HybridSequential(
          (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
        (4): HybridSequential(
          (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
          (2): LeakyReLU(0.1)
        )
      )
      (tip): HybridSequential(
        (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=None)
        (2): LeakyReLU(0.1)
      )
    )
  )
  (yolo_outputs): HybridSequential(
    (0): YOLOOutputV3(
      (prediction): Conv2D(None -> 75, kernel_size=(1, 1), stride=(1, 1))
    )
    (1): YOLOOutputV3(
      (prediction): Conv2D(None -> 75, kernel_size=(1, 1), stride=(1, 1))
    )
    (2): YOLOOutputV3(
      (prediction): Conv2D(None -> 75, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)

The YOLOv3 network can be called with an image tensor:

import mxnet as mx
x = mx.nd.zeros(shape=(1, 3, 416, 416))
net.initialize()
cids, scores, bboxes = net(x)

YOLOv3 returns three values: cids are the class labels, scores are the confidence scores of each prediction, and bboxes are the absolute coordinates of the corresponding bounding boxes.
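A minimal sketch of how these outputs are typically consumed at inference time, thresholding on the confidence score (the network here is untrained, so the values themselves are meaningless):

print('cids:', cids.shape, 'scores:', scores.shape, 'bboxes:', bboxes.shape)
# keep only detections above a confidence threshold; empty slots are filled with -1
scores_np = scores[0].asnumpy().ravel()
keep = scores_np > 0.5
print('detections above 0.5:', int(keep.sum()))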

5. Training Targets

End-to-end YOLOv3 training involves four losses. The losses penalize incorrect class/box predictions and are defined in gluoncv.loss.YOLOV3Loss:

loss = gcv.loss.YOLOV3Loss()
# which is already included in YOLOv3 network
print(net._loss)
Out:

YOLOV3Loss(batch_axis=0, w=None)

To speed up training, we let the CPU pre-compute some of the training targets. This is especially useful when your CPU is powerful and you can use -j num_workers to take advantage of multiple cores.

If we provide the network to the training transform function, it will compute part of the training targets:

from mxnet import autograd
train_transform = presets.yolo.YOLO3DefaultTrainTransform(width, height, net)
# return stacked images, center_targets, scale_targets, gradient weights, objectness_targets, class_targets
# additionally, return padded ground truth bboxes, so there are 7 components returned by dataloader
batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))
train_loader = DataLoader(train_dataset.transform(train_transform), batch_size, shuffle=True,
                          batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)

for ib, batch in enumerate(train_loader):
    if ib > 0:
        break
    print('data:', batch[0][0].shape)
    print('label:', batch[6][0].shape)
    with autograd.record():
        input_order = [0, 6, 1, 2, 3, 4, 5]
        obj_loss, center_loss, scale_loss, cls_loss = net(*[batch[o] for o in input_order])
        # sum up the losses
        # some standard gluon training steps:
        # autograd.backward(sum_loss)
        # trainer.step(batch_size)


Out:

data: (3, 416, 416)
label: (4, 4)

We can see that the data loader is actually returning the training targets for us. From here, it naturally becomes a standard Gluon training loop: load the data, compute the losses, and let a Trainer update the weights.
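For completeness, a minimal training-loop sketch under the usual Gluon conventions (the learning rate, momentum, and weight decay below are illustrative placeholders, not the values used by the official training script):

from mxnet import gluon

ctx = mx.cpu()  # or mx.gpu(0)
net.collect_params().reset_ctx(ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.0001, 'momentum': 0.9, 'wd': 0.0005})

for ib, batch in enumerate(train_loader):
    if ib > 0:
        break  # one step only for this tutorial
    batch = [b.as_in_context(ctx) for b in batch]
    with autograd.record():
        input_order = [0, 6, 1, 2, 3, 4, 5]
        obj_loss, center_loss, scale_loss, cls_loss = net(*[batch[o] for o in input_order])
        sum_loss = obj_loss + center_loss + scale_loss + cls_loss
    autograd.backward(sum_loss)
    trainer.step(batch_size)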

References

  1. Redmon, Joseph, and Ali Farhadi. “Yolov3: An incremental improvement.” arXiv preprint arXiv:1804.02767 (2018).