Caffe Learning Notes 2: Fine-tuning a Pretrained Network for Style Recognition

Fine-tuning a Pretrained Network for Style Recognition

The original English version of this tutorial is available here.

In this example, we explore a common approach that is very useful in real-world applications: take a pre-trained Caffe network and fine-tune its parameters on your custom data.
The advantage of this approach is that the pre-trained network was learned on a very large image dataset, so its intermediate layers capture the semantics of general visual appearance. Think of it as a very powerful, generic visual feature that you can treat as a black box. On top of that, only a relatively small amount of data is needed to achieve good performance on the target task.
First, we need to prepare the data. This includes the following steps:
1. Get the ImageNet ilsvrc pre-trained model using the provided shell scripts.
2. Download a subset of the full Flickr style dataset for this demo.
3. Compile the downloaded Flickr data into the database format used by Caffe.

caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)

import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

import numpy as np
from pylab import *
%matplotlib inline
import tempfile

# Helper function for deprocessing preprocessed images, e.g., for display.
def deprocess_net_image(image):
    image = image.copy()              # don't modify destructively
    image = image[::-1]               # BGR -> RGB
    image = image.transpose(1, 2, 0)  # CHW -> HWC
    image += [123, 117, 104]          # (approximately) undo mean subtraction

    # clamp values in [0, 255]
    image[image < 0], image[image > 255] = 0, 255

    # round and cast from float32 to uint8
    image = np.round(image)
    image = np.require(image, dtype=np.uint8)

    return image

1. Setup and dataset download

Downloading the data requires running the following scripts:

  • get_ilsvrc_aux.sh to download the ImageNet data mean, labels, etc.
  • download_model_binary.py to download the pre-trained reference model
  • finetune_flickr_style/assemble_data.py to download the style training and testing data

Running these will download only a small subset of the full dataset for this exercise: just 2000 of the 80K total images, from 5 of the 20 style categories. (To download the entire dataset, set full_dataset = True in the cell below.)

# Download just a small subset of the data for this exercise.
# (2000 of 80K images, 5 of 20 labels.)
# To download the entire dataset, set `full_dataset = True`.
full_dataset = False
if full_dataset:
    NUM_STYLE_IMAGES = NUM_STYLE_LABELS = -1
else:
    NUM_STYLE_IMAGES = 2000
    NUM_STYLE_LABELS = 5

# This downloads the ilsvrc auxiliary data (mean file, etc),
# and a subset of 2000 images for the style recognition task.
import os
os.chdir(caffe_root)  # run scripts from caffe root
!data/ilsvrc12/get_ilsvrc_aux.sh
!scripts/download_model_binary.py models/bvlc_reference_caffenet
!python examples/finetune_flickr_style/assemble_data.py \
    --workers=-1  --seed=1701 \
    --images=$NUM_STYLE_IMAGES  --label=$NUM_STYLE_LABELS
# back to examples
os.chdir('examples')

Downloading...
--2016-02-24 00:28:36--  http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
Resolving dl.caffe.berkeleyvision.org (dl.caffe.berkeleyvision.org)... 169.229.222.251
Connecting to dl.caffe.berkeleyvision.org (dl.caffe.berkeleyvision.org)|169.229.222.251|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17858008 (17M) [application/octet-stream]
Saving to: 'caffe_ilsvrc12.tar.gz'
100%[======================================>] 17,858,008  112MB/s   in 0.2s
2016-02-24 00:28:36 (112 MB/s) - 'caffe_ilsvrc12.tar.gz' saved [17858008/17858008]
Unzipping...
Done.
Model already exists.
Downloading 2000 images with 7 workers...
Writing train/val for 1996 successfully downloaded images.

Define weights, the path to the ImageNet-pretrained weights we just downloaded, and make sure the file exists.

import os
weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
assert os.path.exists(weights)

Load the 1000 ImageNet labels from ilsvrc12/synset_words.txt, and the 5 style labels from finetune_flickr_style/style_names.txt.

# Load ImageNet labels to imagenet_labels
imagenet_label_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
imagenet_labels = list(np.loadtxt(imagenet_label_file, str, delimiter='\t'))
assert len(imagenet_labels) == 1000
print 'Loaded ImageNet labels:\n', '\n'.join(imagenet_labels[:10] + ['...'])

# Load style labels to style_labels
style_label_file = caffe_root + 'examples/finetune_flickr_style/style_names.txt'
style_labels = list(np.loadtxt(style_label_file, str, delimiter='\n'))
if NUM_STYLE_LABELS > 0:
    style_labels = style_labels[:NUM_STYLE_LABELS]
print '\nLoaded style labels:\n', ', '.join(style_labels)

2. Defining and running the nets

We'll start by defining caffenet, a function that initializes the CaffeNet architecture (a minor variant of AlexNet), taking arguments specifying the data and the number of output classes.

from caffe import layers as L
from caffe import params as P

weight_param = dict(lr_mult=1, decay_mult=1)
bias_param   = dict(lr_mult=2, decay_mult=0)
learned_param = [weight_param, bias_param]

frozen_param = [dict(lr_mult=0)] * 2

def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1,
              param=learned_param,
              weight_filler=dict(type='gaussian', std=0.01),
              bias_filler=dict(type='constant', value=0.1)):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, group=group,
                         param=param, weight_filler=weight_filler,
                         bias_filler=bias_filler)
    return conv, L.ReLU(conv, in_place=True)

def fc_relu(bottom, nout, param=learned_param,
            weight_filler=dict(type='gaussian', std=0.005),
            bias_filler=dict(type='constant', value=0.1)):
    fc = L.InnerProduct(bottom, num_output=nout, param=param,
                        weight_filler=weight_filler,
                        bias_filler=bias_filler)
    return fc, L.ReLU(fc, in_place=True)

def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

def caffenet(data, label=None, train=True, num_classes=1000,
             classifier_name='fc8', learn_all=False):
    """Returns a NetSpec specifying CaffeNet, following the original proto text
       specification (./models/bvlc_reference_caffenet/train_val.prototxt)."""
    n = caffe.NetSpec()
    n.data = data
    param = learned_param if learn_all else frozen_param
    n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4, param=param)
    n.pool1 = max_pool(n.relu1, 3, stride=2)
    n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
    n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2, param=param)
    n.pool2 = max_pool(n.relu2, 3, stride=2)
    n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
    n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1, param=param)
    n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2, param=param)
    n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2, param=param)
    n.pool5 = max_pool(n.relu5, 3, stride=2)
    n.fc6, n.relu6 = fc_relu(n.pool5, 4096, param=param)
    if train:
        n.drop6 = fc7input = L.Dropout(n.relu6, in_place=True)
    else:
        fc7input = n.relu6
    n.fc7, n.relu7 = fc_relu(fc7input, 4096, param=param)
    if train:
        n.drop7 = fc8input = L.Dropout(n.relu7, in_place=True)
    else:
        fc8input = n.relu7
    # always learn fc8 (param=learned_param)
    fc8 = L.InnerProduct(fc8input, num_output=num_classes, param=learned_param)
    # give fc8 the name specified by argument `classifier_name`
    n.__setattr__(classifier_name, fc8)
    if not train:
        n.probs = L.Softmax(fc8)
    if label is not None:
        n.label = label
        n.loss = L.SoftmaxWithLoss(fc8, n.label)
        n.acc = L.Accuracy(fc8, n.label)
    # write the net to a temporary file and return its filename
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(n.to_proto()))
        return f.name
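
Note how the learn_all argument controls what actually gets fine-tuned: with the default learn_all=False, every layer up to fc7 receives frozen_param (lr_mult=0) and is left untouched during training, while the classifier layer always uses learned_param. As a minimal sketch (not part of the original example; the print_lr_mults helper is introduced here purely for illustration), you can confirm this by parsing the prototxt file that caffenet writes out:

# Sketch: print the learning-rate multipliers of every learnable layer in a
# prototxt generated by caffenet(), to see which layers are frozen (lr_mult=0).
from caffe.proto import caffe_pb2
from google.protobuf import text_format

def print_lr_mults(prototxt_filename):
    net_param = caffe_pb2.NetParameter()
    with open(prototxt_filename) as f:
        text_format.Merge(f.read(), net_param)
    for layer in net_param.layer:
        if len(layer.param) > 0:  # only layers with learnable parameters
            print layer.name, [p.lr_mult for p in layer.param]

Calling print_lr_mults on a filename returned by caffenet with learn_all=False should show lr_mult 0 for conv1 through fc7 and a nonzero multiplier only for the final classifier layer.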

Now, let's create a CaffeNet that takes unlabeled "dummy data" as input, allowing us to set its input images externally and see what ImageNet classes it predicts.

dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
imagenet_net_filename = caffenet(data=dummy_data, train=False)
imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)
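
As a quick check that the pre-trained weights were loaded correctly, we can push an image through imagenet_net and look at its top ImageNet predictions. The sketch below is not part of the original notebook: the cat.jpg path is just a convenient image that ships with Caffe (any image will do), and the preprocessing mirrors the mean subtraction that deprocess_net_image undoes above.

# Sketch: classify one image with the ImageNet-pretrained net.
# The image path is only a placeholder -- substitute any image you like.
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')

# Preprocess: HWC -> CHW, RGB -> BGR, scale to [0, 255], then subtract the
# (approximate) per-channel ImageNet mean (BGR order).
transformer = caffe.io.Transformer({'data': imagenet_net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255)
transformer.set_mean('data', np.array([104, 117, 123]))

imagenet_net.blobs['data'].data[0, ...] = transformer.preprocess('data', image)
probs = imagenet_net.forward()['probs'][0]

# Print the top-5 predicted ImageNet classes.
for k in probs.argsort()[::-1][:5]:
    print '%.4f  %s' % (probs[k], imagenet_labels[k])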

Next, define a function style_net which calls caffenet on data from the Flickr style dataset.

The new network will also have the CaffeNet architecture, with differences in the input and output:

  • the input is the Flickr style data we downloaded, provided by an ImageData layer
  • the output is a distribution over 20 style classes (only 5 in this reduced demo) rather than the original 1000 ImageNet classes
  • the classification layer is renamed from fc8 to fc8_flickr to tell Caffe not to load the original classifier (fc8) weights from the ImageNet-pretrained model

def style_net(train=True, learn_all=False, subset=None):
    if subset is None:
        subset = 'train' if train else 'test'
    source = caffe_root + 'data/flickr_style/%s.txt' % subset
    transform_param = dict(mirror=train, crop_size=227,
        mean_file=caffe_root + 'data/ilsvrc12/imagenet_mean.binaryproto')
    style_data, style_label = L.ImageData(
        transform_param=transform_param, source=source,
        batch_size=50, new_height=256, new_width=256, ntop=2)
    return caffenet(data=style_data, label=style_label, train=train,
                    num_classes=NUM_STYLE_LABELS,
                    classifier_name='fc8_flickr',
                    learn_all=learn_all)
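
From here, training proceeds as in the original notebook. As a first sanity check (this sketch follows the notebook's next cell), the style net can be instantiated with the ImageNet-pretrained weights and run forward on one batch of style data; the new fc8_flickr classifier is still randomly initialized at this point, so its predictions are not yet meaningful.

untrained_style_net = caffe.Net(style_net(train=False, subset='train'),
                                weights, caffe.TEST)
untrained_style_net.forward()
# Grab one batch of style images and labels from the net's ImageData layer.
style_data_batch = untrained_style_net.blobs['data'].data.copy()
style_label_batch = np.array(untrained_style_net.blobs['label'].data, dtype=np.int32)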