Convolutional Neural Networks: Part 1

For the first section, check out: Python Deep Learning

Training a Convolutional Network from Scratch (pun totally intended)

Having to train an image-classification model using very little data is a common situation, which you’ll likely encounter in practice if you ever do computer vision in a professional context. A “few” samples can mean anywhere from a few hundred to a few tens of thousands of images. As a practical example, we’ll focus on classifying images as dogs or cats, in a dataset containing 4,000 pictures of cats and dogs (2,000 cats, 2,000 dogs). We’ll use 2,000 pictures for training, 1,000 for validation, and 1,000 for testing.

In this part, we’ll review one basic strategy to tackle this problem: training a new model from scratch using what little data you have. You’ll start by naively training a small CNN on the 2,000 training samples, without any regularization, to set a baseline for what can be achieved. This will get you to a classification accuracy of ~71%. At that point, the main issue will be overfitting. Then we’ll introduce data augmentation, a powerful technique for mitigating overfitting in computer vision. By using data augmentation, you’ll improve the network to reach an accuracy of ~82%.
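As a preview, data augmentation in Keras is typically done with ImageDataGenerator, which applies random transformations to training images on the fly. Here is a minimal sketch; the transformation ranges are illustrative, not tuned:

from keras.preprocessing.image import ImageDataGenerator

# Randomly transform training images at read time; ranges are illustrative
datagen = ImageDataGenerator(
    rotation_range=40,         # random rotations up to 40 degrees
    width_shift_range=0.2,     # horizontal shifts up to 20% of image width
    height_shift_range=0.2,    # vertical shifts up to 20% of image height
    shear_range=0.2,           # random shearing transformations
    zoom_range=0.2,            # random zooms
    horizontal_flip=True)      # randomly flip images left to right

Because the transformed images are generated on the fly, the model never sees the exact same picture twice, which is what makes this an effective regularizer.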

In the next part, we’ll review two more essential techniques for applying deep learning to small datasets: feature extraction with a pretrained network (which will get you to an accuracy of ~90% to ~96%) and fine-tuning a pretrained network (which will get you to a final accuracy of ~97%). Together, these three strategies (training a small model from scratch, doing feature extraction using a pretrained model, and fine-tuning a pretrained model) will constitute your future toolbox for tackling the problem of performing image classification with small datasets.

The Relevance of Deep Learning for Small-Data Problems

You’ll sometimes hear that deep learning only works when lots of data is available. This is partly true: one fundamental characteristic of deep learning is that it can find interesting features in the training data on its own, without any need for manual feature engineering, and this can only be achieved when lots of training examples are available. This is especially true for problems where the input samples are very high-dimensional, like images.

But what constitutes lots of samples is relative — relative to the size and depth of the network you’re trying to train, for starters. It isn’t possible to train a CNN to solve a complex problem with just a few tens of samples, but a few hundred can potentially suffice if the model is small and well regularized and the task is simple. Because CNNs learn local, translation-invariant features, they’re highly data efficient on perceptual problems. Training a CNN from scratch on a very small image dataset will still yield reasonable results despite a relative lack of data, without the need for any custom feature engineering. You’ll see this in action in this section.

What’s more, deep-learning models are by nature highly repurposable: you can take, say, an image-classification or speech-to-text model trained on a large-scale dataset and reuse it on a significantly different problem with only minor changes. Specifically, in the case of computer vision, many pretrained models (usually trained on the ImageNet dataset) are now publicly available for download and can be used to bootstrap powerful vision models out of very little data.
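For instance, Keras ships several ImageNet-pretrained models in its applications module, and loading one takes a couple of lines. VGG16 is used here purely as an illustration, and the input shape is our choice:

from keras.applications import VGG16

# Convolutional base pretrained on ImageNet, without the classifier head
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))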

Downloading the Data

The Dogs vs. Cats dataset that you’ll use isn’t packaged with Keras. It was made available by Kaggle as part of a computer-vision competition in late 2013, back when CNNs weren’t mainstream. You can download the original dataset from www.kaggle.com/c/dogs-vs-cats/data.

You’ll need to create a Kaggle account if you don’t already have one.

In your Downloads folder, make a new directory called kaggle_data and extract all the images into it, removing the subdirectories they’re stored in within the Kaggle archive.
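If you’d rather do that flattening step in Python, here is a minimal sketch; it assumes the extracted images sit in a subdirectory named train inside kaggle_data, which may not match your archive’s exact layout:

import os, shutil

kaggle_dir = '/home/jon/Downloads/kaggle_data'
subdir = os.path.join(kaggle_dir, 'train')   # assumed subdirectory name

# Move every image up into kaggle_data itself, then remove the empty folder
for fname in os.listdir(subdir):
    shutil.move(os.path.join(subdir, fname), os.path.join(kaggle_dir, fname))
os.rmdir(subdir)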

Unsurprisingly, the dogs-versus-cats Kaggle competition in 2013 was won by entrants who used CNNs. The best entries achieved up to 95% accuracy. In this example, you’ll get fairly close to this accuracy (in the next section), even though you’ll train your models on less than 10% of the data that was available to the competitors.

This dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 812 MB (compressed). After downloading and uncompressing it, you’ll create a new dataset containing three subsets: a training set with 1,000 samples of each class, a validation set with 500 samples of each class, and a test set with 500 samples of each class.

Following is the code to do this.

import os, shutil

# Path to the directory where you extracted the original dataset
original_dataset_dir = '/home/jon/Downloads/kaggle_data/'

# Directory where you'll store the smaller dataset
base_dir = '/home/jon/Downloads/cats_and_dogs_small'
os.makedirs(base_dir)

# Directories for the training, validation, and test splits
train_dir = os.path.join(base_dir, 'train')
os.makedirs(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.makedirs(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.makedirs(test_dir)

# Per-class subdirectories within each split
train_cats_dir = os.path.join(train_dir, 'cats')
os.makedirs(train_cats_dir)
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.makedirs(train_dogs_dir)
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.makedirs(validation_cats_dir)
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.makedirs(validation_dogs_dir)
test_cats_dir = os.path.join(test_dir, 'cats')
os.makedirs(test_cats_dir)
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.makedirs(test_dogs_dir)

# Copy the first 1,000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 500 cat images to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 500 cat images to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy the first 1,000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 500 dog images to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 500 dog images to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Sanity checks
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
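If everything went well, the sanity checks print:

total training cat images: 1000
total training dog images: 1000
total validation cat images: 500
total validation dog images: 500
total test cat images: 500
total test dog images: 500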

So you do indeed have 2,000 training images, 1,000 validation images, and 1,000 test images. Each split contains the same number of samples from each class: this is a balanced binary-classification problem, which means classification accuracy will be an appropriate measure of success.

Building Your Network

You built a small CNN for MNIST in the previous example, so you should be familiar with the pattern. You’ll reuse the same general structure: the CNN will be a stack of alternating Conv2D (with relu activation) and MaxPooling2D layers.

But because you’re dealing with bigger images and a more complex problem, you’ll make your network correspondingly larger: it will have one more Conv2D + MaxPooling2D stage. This serves both to increase the capacity of the network and to further reduce the size of the feature maps so they aren’t overly large when you reach the Flatten layer. Here, because you start from inputs of size 150 × 150 (a somewhat arbitrary choice), you end up with feature maps of size 7 × 7 just before the Flatten layer.
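Here is a minimal sketch of such a network in Keras. It mirrors the structure just described; the specific filter counts (32, 64, 128, 128) and the Dense layer size are one reasonable choice rather than the only option:

from keras import layers, models

model = models.Sequential()
# Four Conv2D + MaxPooling2D stages progressively shrink the feature maps:
# 150x150 -> 148 -> 74 -> 72 -> 36 -> 34 -> 17 -> 15 -> 7
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Feature maps are 7 x 7 x 128 here, small enough to flatten
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  # binary output: dog vs. cat

The comments trace how each 3 × 3 convolution and 2 × 2 max-pooling step shrinks the 150 × 150 input down to the 7 × 7 feature maps mentioned above; the single sigmoid unit at the end is the standard choice for binary classification.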