
Implementing a CNN for MNIST Recognition in TensorFlow

This post walks step by step through building a CNN with TensorFlow and applying it to MNIST handwritten digit recognition; the focus is on understanding what each step does. The corresponding exercise code and Jupyter notebook can be found on my GitHub.

1. Set up the architecture

This post implements the simple architecture below:

  • (Input) -> [batch_size, 28, 28, 1] >> Apply 32 filters of [5x5]
  • (Convolutional layer 1) -> [batch_size, 28, 28, 32]
  • (ReLU 1) -> [?, 28, 28, 32]
  • (Max pooling 1) -> [?, 14, 14, 32]
  • (Convolutional layer 2) -> [?, 14, 14, 64]
  • (ReLU 2) -> [?, 14, 14, 64]
  • (Max pooling 2) -> [?, 7, 7, 64]
  • [fully connected layer 3] -> [1x1024]
  • [ReLU 3] -> [1x1024]
  • [Dropout] -> [1x1024]

2. Create an interactive session

import tensorflow as tf
sess = tf.InteractiveSession()

3. Load the MNIST data

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot = True)
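
As a quick sanity check (a minimal sketch; the counts below assume the standard MNIST split used by this loader), you can inspect the shapes of the loaded arrays:

print(mnist.train.images.shape)  # (55000, 784) -- each image is flattened to 28*28 = 784
print(mnist.test.images.shape)   # (10000, 784)
print(mnist.train.labels.shape)  # (55000, 10), one-hot because one_hot=True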

4. Initialize parameters

width = 28 # width of the image in pixels 
height = 28 # height of the image in pixels
flat = width * height # number of pixels in one image 
class_output = 10 # number of possible classifications for the problem

5. Create input and output placeholders

x = tf.placeholder(tf.float32, shape = [None, flat])
y_ = tf.placeholder(tf.float32, shape = [None, class_output])
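
Here None leaves the batch dimension unspecified, so the same graph can be fed batches of any size, e.g. mini-batches of 50 during training and the full test set during evaluation.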

6. Reshape the images into tensors

The input images are 28x28 pixels with one channel. The first dimension is the batch size, which can be any size; the second and third dimensions are height and width, and the last dimension is the number of image channels.

x_image = tf.reshape(x, [-1, 28, 28, 1])
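
You can confirm the reshape did what we expect (-1 lets TensorFlow infer the batch dimension):

print(x_image.get_shape())  # (?, 28, 28, 1) -- the batch size stays unspecified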

7. Convolutional layer 1

Define the kernel weights and bias

Here we define a 5x5 kernel with 1 input channel;
32 different kernels are applied to each image,
so the output of this convolutional layer is 28x28x32.
The kernel tensor has shape [filter_height, filter_width, in_channels, out_channels].

W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev = 0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32])) # 32 biases for the 32 outputs

Convolve with the weights and add the bias

tf.nn.conv2d creates the convolutional layer; it computes a 2-D convolution given a 4-D input and a filter tensor.
Inputs:
an input tensor of shape [batch, in_height, in_width, in_channels]; x_image is of shape [batch_size, 28, 28, 1]
a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]; W_conv1 is of size [5, 5, 1, 32]
strides of [1, 1, 1, 1]
Processing:
the filter is flattened to a 2-D matrix of shape [5*5*1, 32]
image patches are extracted from the input tensor to form a virtual tensor of shape [batch, 28, 28, 5*5*1]
for each patch, the patch vector is right-multiplied by the filter matrix
Output:
a tensor of shape (?, 28, 28, 32), i.e. 32 images of [28x28]; 32 is the depth of the output.
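
With 'SAME' padding the output spatial size is ceil(in_size / stride), which is why a stride-1 convolution keeps 28x28 and the stride-2 pooling used later halves it to 14x14. A quick check of that formula:

import math
def same_out_size(in_size, stride):
    # output size under 'SAME' padding: ceil(in / stride)
    return int(math.ceil(in_size / float(stride)))
print(same_out_size(28, 1))  # 28 -- a stride-1 convolution keeps the size
print(same_out_size(28, 2))  # 14 -- stride-2 pooling halves it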

convolve1 = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1


Apply the ReLU activation function

h_conv1 = tf.nn.relu(convolve1)

8. Max pooling

Max pooling is a non-linear down-sampling method: it partitions the input image into a set of rectangles and keeps the maximum value of each rectangle, as the small NumPy sketch below illustrates.
We use tf.nn.max_pool with a kernel size of 2x2 and
a stride of 2 pixels per slide, so there is no overlapping. The input here has size [28x28x32] and the output has size [14x14x32].
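A minimal NumPy sketch of 2x2 non-overlapping max pooling on a toy 4x4 input (for illustration only; not part of the model):

import numpy as np
a = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]], dtype=np.float32)
# split into 2x2 blocks and take the max of each block
pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[ 4.  8.] [12. 16.]]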

conv1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

Output:
<tf.Tensor 'MaxPool:0' shape=(?, 14, 14, 32) dtype=float32>

9. Convolutional layer 2

Kernel weights and bias

The kernel of the second layer:

  • Filter/kernel: 5x5 (25 pixels)
  • Input channels: 32 (from the 1st Conv layer, we had 32 feature maps)
  • 64 output feature maps
    The input image has size [14x14x32], the kernel size is [5x5x32], and with 64 kernels the output is [14x14x64].
w_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))

Convolve the image with the weights and add the bias

convolve2 = tf.nn.conv2d(conv1, w_conv2, strides=[1, 1, 1, 1], padding="SAME") + b_conv2

Apply ReLU

h_conv2 = tf.nn.relu(convolve2)

10. Max pooling

conv2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

The output conv2 is:
<tf.Tensor 'MaxPool_1:0' shape=(?, 7, 7, 64) dtype=float32>

11. Fully connected layer

We need the fully connected layer so that we can apply softmax at the end and obtain probability outputs.
It takes the deep image features extracted by the previous layers, i.e. the 64 matrices output by the last max-pooling step, and flattens them into one column.
Each [7x7] matrix is turned into a [49x1] vector, and concatenating the 64 [49x1] vectors gives a [3136x1] vector.
We connect it to a layer of size [1024x1], so the weight matrix between the two layers has size [3136x1024].

Flatten the output of the previous layer

layer2_matrix = tf.reshape(conv2, [-1, 7 * 7 * 64])
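
A quick shape check (7 * 7 * 64 = 3136; the batch dimension stays unspecified):

print(layer2_matrix.get_shape())  # (?, 3136)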

Weights and bias between the 2nd and 3rd layers

w_fcl = tf.Variable(tf.truncated_normal(shape=[3136, 1024], stddev=0.1))
b_fcl = tf.Variable(tf.constant(0.1, shape=[1024]))

Matrix multiply and add the bias

fcl = tf.matmul(layer2_matrix, w_fcl) + b_fcl

Apply ReLU

h_fcl = tf.nn.relu(fcl)

The output h_fcl: <tf.Tensor 'Relu_2:0' shape=(?, 1024) dtype=float32>

12. Dropout layer

keep_prob = tf.placeholder(tf.float32)
layer_drop = tf.nn.dropout(h_fcl, keep_prob)

The output layer_drop: <tf.Tensor 'dropout/mul:0' shape=(?, 1024) dtype=float32>
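
Note that tf.nn.dropout also scales the units it keeps by 1/keep_prob, so the expected sum of activations is unchanged between training and testing. A minimal sketch (which positions are zeroed is random):

drop_test = sess.run(tf.nn.dropout(tf.ones([1, 10]), 0.5))
print(drop_test)  # e.g. [[2. 0. 2. 2. 0. 2. 0. 0. 2. 2.]] -- kept units are scaled to 2.0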

13. Softmax

Weights and bias

The input is [1024x1] and the output is [10x1], so the weight matrix between the two layers has size [1024x10].

W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev = 0.1)) # 1024 neurons
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10])) # 10 possibilities for digits [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Matrix multiplication

fc = tf.matmul(layer_drop, W_fc2) + b_fc2

Apply the softmax activation

y_CNN = tf.nn.softmax(fc)

The output y_CNN: <tf.Tensor 'Softmax:0' shape=(?, 10) dtype=float32>
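
Softmax turns the 10 logits into a probability distribution over the digit classes. A tiny sketch with made-up logits:

print(sess.run(tf.nn.softmax(tf.constant([[2.0, 1.0, 0.1]]))))
# [[0.659 0.242 0.099]] -- each row sums to 1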

14. Define the loss function and train the model

Define the loss function

We use cross-entropy to measure how badly the model performs.
As an example, here is the cross-entropy between two predicted outputs and their true labels:

import numpy as np
layer4_test = [[0.9, 0.1, 0.1], [0.9, 0.1,0.1]]
y_test = [[1.0 ,0.0, 0.0], [1.0, 0.0, 0.0]]
np.mean(-np.sum(y_test * np.log(layer4_test), 1))
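
Each row contributes -ln(0.9) ≈ 0.105 (the other terms vanish because the corresponding true labels are 0), so the expression above evaluates to roughly 0.105.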

tf.reduce_sum computes the sum of the elements of y_ * tf.log(y_CNN) along the class axis, and tf.reduce_mean then takes the mean over the batch.

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_CNN), reduction_indices=[1]))

Define the optimizer

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

Define the prediction function

correct_prediction = tf.equal(tf.argmax(y_CNN, 1), tf.argmax(y_, 1))

Define the accuracy

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
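
A toy example of how this accuracy is computed (the values here are made up):

pred_test = tf.constant([[0.1, 0.9], [0.8, 0.2]])   # argmax -> [1, 0]
label_test = tf.constant([[0.0, 1.0], [0.0, 1.0]])  # argmax -> [1, 1]
match = tf.equal(tf.argmax(pred_test, 1), tf.argmax(label_test, 1))  # [True, False]
print(sess.run(tf.reduce_mean(tf.cast(match, tf.float32))))  # 0.5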

15. Run the session and train

sess.run(tf.global_variables_initializer())
for i in range(1100):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

Output:
step 0, training accuracy 0.16
step 100, training accuracy 0.86
step 200, training accuracy 0.88
step 300, training accuracy 0.92
step 400, training accuracy 0.94
step 500, training accuracy 0.94
step 600, training accuracy 0.98
step 700, training accuracy 0.96
step 800, training accuracy 0.9
step 900, training accuracy 0.96
step 1000, training accuracy 1

16. Evaluate the model

print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
Accuracy:
test accuracy 0.9656

Visualization

View all the filters

kernels = sess.run(tf.reshape(tf.transpose(W_conv1, perm=[2, 3, 0 ,1]), [32, -1]))
### get tools from the remote server
import urllib.request
response = urllib.request.urlopen('http://deeplearning.net/tutorial/code/utils.py')
content = response.read().decode('utf-8')
target = open('utils1.py', 'w')
target.write(content)
target.close()

from utils1 import tile_raster_images
import matplotlib.pyplot as plt 
from PIL import Image
# %matplotlib inline
image = Image.fromarray(tile_raster_images(kernels, img_shape=(5, 5) ,tile_shape=(4, 8), tile_spacing=(1, 1)))
### Plot image
plt.rcParams['figure.figsize'] = (18.0, 18.0)
imgplot = plt.imshow(image)
imgplot.set_cmap('gray')  


Output of the first convolutional layer

import numpy as np
plt.rcParams['figure.figsize'] = (5.0, 5.0)
sampleimage = mnist.test.images[1]
plt.imshow(np.reshape(sampleimage, [28, 28]), cmap='gray')


ActivatedUnits = sess.run(convolve1, feed_dict={x:np.reshape(sampleimage, [1, 784], order='F'), keep_prob:1.0})
filters = ActivatedUnits.shape[3]
plt.figure(1, figsize=(20, 20))
n_columns = 6
n_rows = np.math.ceil(filters/n_columns) + 1
for i in range(filters):
    plt.subplot(n_rows, n_columns, i+1)
    plt.title('Filter ' + str(i))
    plt.imshow(ActivatedUnits[0, :, :, i], interpolation = 'nearest', cmap='gray')


Output of the second convolutional layer

ActivatedUnits = sess.run(convolve2, feed_dict={x: np.reshape(sampleimage, [1, 784], order='F'), keep_prob: 1.0})
filters = ActivatedUnits.shape[3]
plt.figure(1, figsize=(20,20))
n_columns = 8
n_rows = np.math.ceil(filters / n_columns) + 1
for i in range(filters):
    plt.subplot(n_rows, n_columns, i+1)
    plt.title('Filter ' + str(i))
    plt.imshow(ActivatedUnits[0, :, :, i], interpolation="nearest", cmap="gray")


Close the session

sess.close() # finish the session




This article is translated from Deep Learning with TensorFlow, IBM Cognitive Class ML0120EN,
notebook ML0120EN-2.2-Review-CNN-MNIST-Dataset.