
TensorFlow in Action: CNNs in TensorFlow

This walkthrough follows the official API documentation.

Convolution

The convolution operations available under the different ops are summarized below:

  • conv2d: arbitrary filters that can mix channels together
  • depthwise_conv2d: filters that operate on each channel independently
  • separable_conv2d: a depthwise spatial filter followed by a pointwise filter

The strides parameter

In TensorFlow, both the convolution op tf.nn.conv2d and the pooling op tf.nn.max_pool (covered later) take a strides parameter. Both operations slide a filter across the input image, so we must specify how far the filter moves at each step along every dimension.

The TensorFlow documentation describes strides as:
A list of ints. A 1-D tensor of length 4 that specifies the stride of the sliding filter for each dimension of the input. It is generally required that strides[0] = strides[3] = 1.
Why this requirement?
Because with the default "NHWC" data format the input Tensor holds [batch, height, width, channels], and the four entries of strides are the step sizes along exactly these four dimensions ("NCHW" reorders them to [batch, channels, height, width]).

  • strides[0] = 1 means we walk through the batch sample by sample (a value of 2 would skip every other sample; if that were the goal, it would be simpler to shrink the dataset)
  • strides[3] = 1 means we slide over the channels one at a time (which is what we almost always want)

The padding parameter

Depending on whether padding is "SAME" or "VALID", the output size and the padding pixels are computed as follows.

  1. With "SAME", the output size is:
out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

The padding along the height and width is then computed as:
pad_along_height = max((out_height - 1) * strides[1] +
                       filter_height - in_height, 0)
pad_along_width = max((out_width - 1) * strides[2] +
                      filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left

Note that the division by 2 means the two sides (top vs. bottom, left vs. right) may differ by one pixel of padding; in that case the bottom and the right side always get the extra pixel. For example, when pad_along_height is 5, we pad 2 pixels at the top and 3 pixels at the bottom. This differs from libraries such as cuDNN and Caffe, which specify the number of padding pixels explicitly and always pad the same amount on both sides.

  2. With "VALID", the output size is:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

No padding pixels are added, and the output is computed as:
output[b, i, j, :] =
    sum_{di, dj} input[b, strides[1] * i + di - pad_top,
                       strides[2] * j + dj - pad_left, ...] *
                 filter[di, dj, ...]
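
To make these formulas concrete, here is a small standalone sketch (plain Python; the helper name conv_output_size is my own and not part of the TensorFlow API) that reproduces the "SAME" and "VALID" arithmetic above:

import math

def conv_output_size(in_height, in_width, filter_height, filter_width, strides, padding):
    # Reproduces the 'SAME'/'VALID' formulas above (illustrative helper only).
    if padding == "SAME":
        out_height = int(math.ceil(float(in_height) / float(strides[1])))
        out_width = int(math.ceil(float(in_width) / float(strides[2])))
        pad_along_height = max((out_height - 1) * strides[1] + filter_height - in_height, 0)
        pad_along_width = max((out_width - 1) * strides[2] + filter_width - in_width, 0)
        pad_top = pad_along_height // 2   # bottom/right get the extra pixel
        pad_left = pad_along_width // 2
        pads = (pad_top, pad_along_height - pad_top, pad_left, pad_along_width - pad_left)
    else:  # "VALID": no padding at all
        out_height = int(math.ceil(float(in_height - filter_height + 1) / float(strides[1])))
        out_width = int(math.ceil(float(in_width - filter_width + 1) / float(strides[2])))
        pads = (0, 0, 0, 0)
    return out_height, out_width, pads

# A 28x28 input with a 5x5 filter at stride 1:
print(conv_output_size(28, 28, 5, 5, [1, 1, 1, 1], "SAME"))   # (28, 28, (2, 2, 2, 2))
print(conv_output_size(28, 28, 5, 5, [1, 1, 1, 1], "VALID"))  # (24, 24, (0, 0, 0, 0))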

The commonly used convolution methods are listed below (tf.nn.conv2d and tf.nn.conv3d are covered in detail afterwards):

  • tf.nn.convolution: Computes sums of N-D convolutions.
  • tf.nn.conv2d: Computes a 2-D convolution given 4-D input and filter tensors.
  • tf.nn.depthwise_conv2d: Depthwise 2-D convolution.
  • tf.nn.depthwise_conv2d_native: Computes a 2-D depthwise convolution given 4-D input and filter tensors.
  • tf.nn.separable_conv2d: 2-D convolution with separable filters (a depthwise convolution followed by a pointwise convolution).
  • tf.nn.atrous_conv2d: Atrous convolution (a.k.a. convolution with holes, i.e. dilated convolution).
  • tf.nn.atrous_conv2d_transpose: The transpose of atrous_conv2d.
  • tf.nn.conv2d_transpose: The transpose of conv2d (the reverse of the convolution).
  • tf.nn.conv1d: Computes a 1-D convolution given 3-D input and filter tensors.
  • tf.nn.conv3d: Computes a 3-D convolution given 5-D input and filter tensors.
  • tf.nn.conv3d_transpose: The transpose of conv3d.
  • tf.nn.conv2d_backprop_filter: Computes the gradients of convolution with respect to the filter.
  • tf.nn.conv2d_backprop_input: Computes the gradients of convolution with respect to the input.
  • tf.nn.conv3d_backprop_filter_v2: Computes the gradients of 3-D convolution with respect to the filter.
  • tf.nn.depthwise_conv2d_native_backprop_filter: Computes the gradients of depthwise convolution with respect to the filter.
  • tf.nn.depthwise_conv2d_native_backprop_input: Computes the gradients of depthwise convolution with respect to the input.

tf.nn.conv2d

conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)

Given an input Tensor of shape [batch, in_height, in_width, in_channels] and a filter Tensor of shape [filter_height, filter_width, in_channels, out_channels], the computation proceeds as follows:

  1. The filter is flattened to a 2-D matrix of shape [filter_height * filter_width * in_channels, output_channels].
  2. Image patches are extracted from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
  3. For each patch, the patch vector is right-multiplied by the filter matrix.

With the default NHWC format, the output is:

output[b, i, j, k] =
sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                filter[di, dj, q, k]

strides[0] = strides[3] = 1 must hold. In most cases the horizontal and vertical strides are the same, i.e. strides = [1, stride, stride, 1].

  • input: A Tensor; must be one of the following types: half, float32. A 4-D tensor whose layout depends on data_format.
  • filter: A Tensor of the same type as input; a 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels].
  • strides: A list of ints; a 1-D tensor of length 4 specifying the stride of the sliding filter for each dimension of input.
  • padding: The padding scheme; either "SAME" or "VALID".
  • use_cudnn_on_gpu: An optional bool. Defaults to True.
  • data_format: Optional; "NHWC" (default) or "NCHW". With "NHWC" the data is stored as [batch, height, width, channels]; with "NCHW" as [batch, channels, height, width].
  • name: An optional name for the op.
  • Returns: A Tensor of the same type as input; a 4-D tensor whose layout depends on data_format.
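
As a minimal usage sketch (TensorFlow 1.x style; the shapes here are example values of my own choosing):

import tensorflow as tf

# a batch of 8 RGB images of size 28x28, in NHWC layout
x = tf.placeholder(tf.float32, [8, 28, 28, 3])
# 16 filters of size 5x5 covering all 3 input channels
w = tf.Variable(tf.truncated_normal([5, 5, 3, 16], stddev=0.1))
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")
print(y.shape)  # (8, 28, 28, 16): "SAME" with stride 1 preserves the spatial size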

tf.nn.conv3d

conv3d(
    input,
    filter,
    strides,
    padding,
    data_format=None,
    name=None
)

In signal processing, cross-correlation is a measure of the similarity of two waveforms as a function of a time lag applied to one of them. It is also known as the sliding dot product or sliding inner product.

Conv3D works in this cross-correlation style.


  • input: A Tensor; must be one of the following types: float32, float64. Shape [batch, in_depth, in_height, in_width, in_channels].
  • filter: A Tensor of the same type as input; shape [filter_depth, filter_height, filter_width, in_channels, out_channels]. in_channels must match between input and filter.
  • strides: A list of ints; a 1-D tensor of length >= 5, with strides[0] = strides[4] = 1.
  • padding: The padding scheme; either "SAME" or "VALID".
  • data_format: Optional; "NDHWC" (default) or "NCDHW". With "NDHWC" the data is stored as [batch, in_depth, in_height, in_width, in_channels]; with "NCDHW" as [batch, in_channels, in_depth, in_height, in_width].
  • name: An optional name for the op.
  • Returns: A Tensor of the same type as input.
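
A minimal usage sketch along the same lines (the shapes are example values of my own choosing):

import tensorflow as tf

# a batch of 4 clips: 16 frames of 32x32 RGB, in NDHWC layout
clips = tf.placeholder(tf.float32, [4, 16, 32, 32, 3])
# 8 filters spanning 3 frames and a 3x3 spatial window
w = tf.Variable(tf.truncated_normal([3, 3, 3, 3, 8], stddev=0.1))
features = tf.nn.conv3d(clips, w, strides=[1, 1, 1, 1, 1], padding="SAME")
print(features.shape)  # (4, 16, 32, 32, 8)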

Pooling

Each pooling op uses rectangular windows of size ksize separated by offset strides. For example, if strides is all ones every window is used, if strides is all twos every other window is used in each dimension.

In detail, the output is

output[i] = reduce(value[strides * i:strides * i + ksize])

The commonly used pooling methods are (tf.nn.max_pool is covered in detail below):

  • tf.nn.avg_pool: Performs average pooling on the input.
  • tf.nn.max_pool: Performs max pooling on the input.
  • tf.nn.max_pool_with_argmax: Performs max pooling on the input and outputs both the max values and their indices.
  • tf.nn.avg_pool3d: Performs 3-D average pooling on the input.
  • tf.nn.max_pool3d: Performs 3-D max pooling on the input.
  • tf.nn.fractional_avg_pool: Performs fractional average pooling on the input.
  • tf.nn.fractional_max_pool: Performs fractional max pooling on the input.
  • tf.nn.pool: Performs an N-D pooling operation.

tf.nn.max_pool

max_pool(
    value,
    ksize,
    strides,
    padding,
    data_format='NHWC',
    name=None
)

  • value: A 4-D Tensor of type tf.float32 with shape [batch, height, width, channels].
  • ksize: A list of ints with length >= 4; the size of the window for each dimension of the input tensor.
  • strides: A list of ints with length >= 4; the stride of the sliding window for each dimension of the input tensor.
  • padding: 'VALID' or 'SAME'.
  • data_format: 'NHWC' or 'NCHW'.
  • name: An optional name for the op.
  • Returns: A Tensor with type tf.float32.
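
A minimal sketch of the call (example shapes of my own choosing); 2x2 windows moved 2 pixels at a time halve each spatial dimension:

import tensorflow as tf

feature_map = tf.placeholder(tf.float32, [8, 28, 28, 16])
pooled = tf.nn.max_pool(feature_map, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding="SAME")
print(pooled.shape)  # (8, 14, 14, 16)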

Morphological filters

Analogous to the morphological operations in image processing (erosion, dilation, and so on).

The commonly used morphological methods are:

  • tf.nn.dilation2d (dilation): Computes the grayscale dilation of 4-D input and 3-D filter tensors.
  • tf.nn.erosion2d (erosion): Computes the grayscale erosion of 4-D value and 3-D kernel tensors.
  • tf.nn.with_space_to_batch: Performs op on the space-to-batch representation of input.

Activation functions

  • tf.nn.relu: Computes rectified linear: max(features, 0).
  • tf.nn.relu6: Computes rectified linear 6: min(max(features, 0), 6).
  • tf.nn.crelu: Computes concatenated ReLU.
  • tf.nn.elu: Computes exponential linear: exp(features) - 1 if features < 0, features otherwise.
  • tf.nn.softplus: Computes softplus: log(exp(features) + 1).
  • tf.nn.softsign: Computes softsign: features / (abs(features) + 1).
  • tf.nn.dropout: Computes dropout.
  • tf.nn.bias_add: Adds bias to value.
  • tf.sigmoid: Computes the sigmoid of x element-wise.
  • tf.tanh: Computes the hyperbolic tangent of x element-wise.

tf.nn.relu

relu(
    features,
    name=None
)

Computes the nonlinear mapping max(features, 0).

  • features: A Tensor; must be one of the following types: float32, float64, int32, int64, uint8, int16, int8, uint16, half.
  • name: An optional name for the op.
  • Returns: A Tensor with the same type as features.

tf.nn.dropout

dropout(
    x,
    keep_prob,
    noise_shape=None,
    seed=None,
    name=None
)

With probability keep_prob, dropout outputs the input element scaled up by 1 / keep_prob, and otherwise outputs 0. The scaling keeps the expected sum unchanged.

By default, each element is kept or dropped independently. If noise_shape is specified, it must be broadcastable to the shape of x, and only dimensions with noise_shape[i] == shape(x)[i] will make independent decisions. For example, if shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently and each row and column will be kept or not kept together.

  • x: A tensor.
  • keep_prob: A scalar Tensor with the same type as x; the probability that each element is kept.
  • noise_shape: A 1-D Tensor of type int32, representing the shape of the randomly generated keep/drop flags.
  • seed: A Python integer, used to create random seeds.
  • name: An optional name for the op.
  • Returns: A Tensor of the same shape as x.
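
A small sketch of the scaling behavior (TensorFlow 1.x style; the exact zero pattern is random):

import tensorflow as tf

x = tf.ones([2, 3])
dropped = tf.nn.dropout(x, keep_prob=0.5)  # surviving entries are scaled to 1/0.5 = 2.0

with tf.Session() as sess:
    print(sess.run(dropped))
    # e.g. [[2. 0. 2.]
    #       [0. 2. 0.]] -- about half the entries are kept, scaled by 2,
    # so the expected sum matches the input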

tf.nn.bias_add

bias_add(
    value,
    bias,
    data_format=None,
    name=None
)

  • value: A Tensor with type float, double, int64, int32, uint8, int16, int8, complex64, or complex128.
  • bias: A 1-D Tensor with size matching the last dimension of value. Must be the same type as value unless value is a quantized type, in which case a different quantized type may be used.
  • data_format: 'NHWC' or 'NCHW'.
  • name: An optional name for the op.
  • Returns: A Tensor with the same type as value.
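
A one-line sketch: bias_add broadcasts a 1-D bias over the last dimension of value (with the default 'NHWC' format, that is the channel dimension). The shapes are example values of my own choosing:

import tensorflow as tf

value = tf.zeros([2, 3, 3, 4])        # NHWC: 4 channels
bias = tf.constant([1.0, 2.0, 3.0, 4.0])
out = tf.nn.bias_add(value, bias)     # adds bias[c] to every position of channel c

with tf.Session() as sess:
    print(sess.run(out)[0, 0, 0])     # [1. 2. 3. 4.]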

Normalization

  • tf.nn.l2_normalize: Normalizes along dimension dim using an L2 norm.
  • tf.nn.local_response_normalization: Local response normalization.
  • tf.nn.sufficient_statistics: Calculates the sufficient statistics for the mean and variance of x.
  • tf.nn.normalize_moments: Calculates the mean and variance based on the sufficient statistics.
  • tf.nn.moments: Calculates the mean and variance of x.
  • tf.nn.weighted_moments: Returns the frequency-weighted mean and variance of x.
  • tf.nn.fused_batch_norm: Batch normalization.
  • tf.nn.batch_normalization: Batch normalization.
  • tf.nn.batch_norm_with_global_normalization: Batch normalization.

Loss functions

  • tf.nn.l2_loss: Computes half the L2 norm of a tensor without the sqrt: output = sum(t ** 2) / 2.
  • tf.nn.log_poisson_loss: Computes log Poisson loss given log_input.
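
A quick sanity check of the l2_loss formula:

import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.l2_loss(t)))  # (1 + 4 + 9) / 2 = 7.0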

Classifiers

  • tf.nn.sigmoid_cross_entropy_with_logits: Computes sigmoid cross entropy given logits.
  • tf.nn.softmax: Computes softmax activations.
  • tf.nn.log_softmax: Computes log softmax activations: logsoftmax = logits - log(reduce_sum(exp(logits), dim)).
  • tf.nn.softmax_cross_entropy_with_logits: Computes softmax cross entropy between logits and labels.
  • tf.nn.sparse_softmax_cross_entropy_with_logits: Computes sparse softmax cross entropy between logits and labels.
  • tf.nn.weighted_cross_entropy_with_logits: Computes a weighted cross entropy.
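
A small sketch showing that softmax_cross_entropy_with_logits equals the negative log of the softmax probability assigned to the true (one-hot) class:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])  # one-hot
xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)

with tf.Session() as sess:
    print(sess.run(xent))  # ~[0.417] == -log(softmax(logits)[0])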

With the API covered, it's time to roll up our sleeves and write some code.




Implementing a simple neural network model with TensorFlow

A basic CNN on the MNIST dataset

This section again uses the MNIST dataset; the model is expected to reach about 99.1% accuracy.
The network structure is convolution --> pooling --> convolution --> pooling --> fully connected layer --> softmax layer.
The code follows:

# coding:utf8

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf


def weight_variable(shape):
    '''
    A CNN needs many weights and biases, so we define initializer functions for reuse.
    We add some random noise to the weights to avoid complete symmetry, using a
    truncated normal distribution with standard deviation 0.1.
    :param shape: shape of the weight tensor to create
    :return: the weight Variable
    '''
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    '''
    Bias initializer. Because the activation function is ReLU, we give the biases
    a small positive value (0.1) to avoid dead neurons.
    :param shape: shape of the bias tensor to create
    :return: the bias Variable
    '''
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    '''
    The convolution layer is reused several times below; tf.nn.conv2d is
    TensorFlow's 2-D convolution function.
    :param x: the input, e.g. a [batch, 28, 28, 1] image tensor
    :param W: the filter weights, e.g. [5, 5, 1, 32] means a 5x5 kernel,
        1 input channel and 32 different filters
        strides: the step of the sliding kernel; all ones means the kernel
            passes over every point of the image without skipping any
        padding: the border handling; SAME keeps the output the same size as the input
    :return: the convolved feature map
    '''
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")


def max_pool_2x2(x):
    '''
    tf.nn.max_pool is TensorFlow's max-pooling function. We use 2x2 max pooling
    with a stride of 2 in both directions, because we want to shrink the overall
    image size: each pooling halves the width and height.
    :param x: the input feature map
    :return: the pooled feature map
    '''
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")


def train(mnist):
    # placeholders
    x = tf.placeholder(tf.float32, [None, 784])     # x: the features
    y_ = tf.placeholder(tf.float32, [None, 10])     # y_: the labels

    # reshape the 1x784 vectors to 28x28x1 for the convolution;
    # -1 keeps the sample count unchanged, the trailing 1 is the channel count
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # first convolutional layer: [5, 5, 1, 32] means a 5x5 kernel, 1 channel, 32 filters
    # create the filter weights --> add the bias --> convolve --> pool
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # 28x28x1 with 32 5x5x1 filters --> 28x28x32
    h_pool1 = max_pool_2x2(h_conv1)  # 28x28x32 --> 14x14x32

    # second convolutional layer: still a 5x5 kernel, now 32 channels and 64 filters
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # 14x14x32 with 64 5x5x32 filters --> 14x14x64
    h_pool2 = max_pool_2x2(h_conv2)  # 14x14x64 --> 7x7x64

    # h_pool2 is 7x7x64; flatten it to 1-D and apply a fully connected layer
    W_fc1 = weight_variable([7*7*64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])    # 7x7x64 --> 1x3136
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)  # FC layer: 3136 --> 1024

    # a Dropout layer to reduce overfitting; the keep_prob ratio is fed in via a placeholder:
    # during training we randomly drop part of the activations, while at prediction
    # time we keep all of them for best performance
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # connect the Dropout output to a Softmax layer to get the final probabilities
    W_fc2 = weight_variable([1024, 10])  # MNIST has only 10 possible classes
    b_fc2 = bias_variable([10])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    # the loss is again cross entropy; the optimizer is Adam with learning rate 1e-4
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    # accuracy metric
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # start training
    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()  # initialize all variables
        sess.run(init_op)

        STEPS = 20000
        for i in range(STEPS):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = sess.run(accuracy, feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
                print('step %d,training accuracy %g' % (i, train_accuracy))
            sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
        print("test accuracy %g" % acc)


if __name__ == "__main__":
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # load the dataset
    train(mnist)

I ran the GPU build; training took about 3-4 minutes. From around step 4000 the training-set accuracy is already close to 1, and the final test-set accuracy is about 99.3%.
Output:

    Extracting MNIST_data/train-images-idx3-ubyte.gz
    Extracting MNIST_data/train-labels-idx1-ubyte.gz
    Extracting MNIST_data/t10k-images-idx3-ubyte.gz
    Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
    name: GeForce GTX 1080
    major: 6 minor: 1 memoryClockRate (GHz) 1.873
    pciBusID 0000:01:00.0
    Total memory: 7.92GiB
    Free memory: 6.96GiB
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
    step 0,training accuracy 0.06
    step 100,training accuracy 0.84
    step 200,training accuracy 0.92
    step 300,training accuracy 0.88
    step 400,training accuracy 0.98
    step 500,training accuracy 0.96
    step 600,training accuracy 1
    step 700,training accuracy 0.98
    step 800,training accuracy 0.88
    step 900,training accuracy 1
    step 1000,training accuracy 0.94
    step 1100,training accuracy 0.9
    step 1200,training accuracy 1
    step 1300,training accuracy 0.98
    step 1400,training accuracy 0.98
    ....
    step 4200,training accuracy 1
    step 4300,training accuracy 1
    step 4400,training accuracy 1
    step 4500,training accuracy 1
    step 4600,training accuracy 1
    step 4700,training accuracy 0.98
    step 4800,training accuracy 0.96
    step 6700,training accuracy 1
    step 6800,training accuracy 1
    step 6900,training accuracy 1
    ....
    step 19100,training accuracy 1
    step 19200,training accuracy 1
    step 19300,training accuracy 1
    step 19400,training accuracy 1
    step 19500,training accuracy 1
    step 19600,training accuracy 1
    step 19700,training accuracy 1
    step 19800,training accuracy 1
    step 19900,training accuracy 1
    test accuracy 0.993

The CNN's accuracy is far higher than that of the plain deep neural network, and the gain comes mainly from the network design, i.e. the CNN's ability to extract and abstract image features. Thanks to the weight sharing of the convolution kernels, the parameter count does not explode, which keeps training fast while also reducing overfitting, so the overall performance of the model improves substantially.

Next we use a more complex CNN model and switch to the CIFAR-10 dataset for training and testing.


An advanced CNN on the CIFAR-10 dataset

About the CIFAR-10 dataset

CIFAR-10 consists of 60,000 32x32 color images in 10 classes: 50,000 training images and 10,000 test images. (Sample images omitted.)

Model refinements

  • L2 regularization is applied to the weights
  • the input images are augmented (flips, random crops, etc.) to manufacture more samples
  • an LRN (local response normalization) layer is used after each convolution-pooling block to improve the model's generalization

The network structure of this model is as follows:

  • conv1 (convolution + ReLU): input 24x24x3; 64 'SAME' 5x5x3 filters; output 24x24x64
  • pool1 (max pooling): input 24x24x64; 3x3 pooling window with stride 2 in both directions; output 12x12x64
  • norm1 (LRN): input 12x12x64; output 12x12x64
  • conv2 (convolution + ReLU): input 12x12x64; 64 'SAME' 5x5x64 filters; output 12x12x64
  • norm2 (LRN): input 12x12x64; output 12x12x64
  • pool2 (max pooling): input 12x12x64; 3x3 pooling window with stride 2 in both directions; output 6x6x64
  • local3 (FC + ReLU): input 6x6x64, reshaped to 2304 and fully connected; output 384
  • local4 (FC + ReLU): input 384, fully connected; output 192
  • logits (output layer): input 192, fully connected; output 10

Preparation

We use the cifar10 dataset, and as usual we rely on the helper modules that ship with TensorFlow's models repository.

# Download the TensorFlow models repository from GitHub.
# We use the cifar10 tutorial; create the project inside that directory
# (both cifar10 and cifar10_input can be imported from there).
git clone https://github.com/tensorflow/models
cd models/tutorials/image/cifar10

In the code we call the following function to download the dataset:

cifar10.maybe_download_and_extract()  # download the dataset and extract it

If your network connection is slow, you can download the archive ahead of time from the CIFAR-10 website.


After downloading, extract it to the /tmp/cifar10_data directory so the code can find the data later:

data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'  # the data path referenced in the code

With the preparation done, it's time to write the code.

Implementation

It's the familiar routine by now, so let's go straight to the code:

    # coding:utf8

    from __future__ import division

    from models.tutorials.image.cifar10 import cifar10, cifar10_input

    import math
    import time

    import numpy as np
    import tensorflow as tf


    def variable_with_weight_loss(shape, stddev, wl):
        '''
        Initialize the weights from a truncated normal distribution (tf.truncated_normal)
        and attach an L2 loss to them. wl controls the magnitude of the L2 loss:
        tf.nn.l2_loss computes the weight's L2 loss, tf.multiply scales it by wl,
        and the resulting weight loss is added to a 'losses' collection so it can
        be folded into the total loss later.
        :param shape: shape of the weight tensor
        :param stddev: standard deviation of the truncated normal initializer
        :param wl: weight of the L2 loss term
        :return: the weight Variable
        '''
        var = tf.Variable(tf.truncated_normal(shape, stddev=stddev))
        if wl is not None:
            weight_loss = tf.multiply(tf.nn.l2_loss(var), wl, name='weight_loss')
            tf.add_to_collection('losses', weight_loss)
        return var


    def loss(logits, labels):
        '''
        tf.nn.sparse_softmax_cross_entropy_with_logits combines the softmax and the
        cross-entropy loss in a single op. We take the mean cross entropy, add it to
        the 'losses' collection, and return the sum of everything in that collection
        (the cross entropy plus all the weight losses).
        :param logits: the unnormalized model outputs
        :param labels: the integer class labels
        :return: the total loss
        '''
        labels = tf.cast(labels, tf.int64)
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                            labels=labels, name='cross_entropy_per_example')
        cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
        tf.add_to_collection('losses', cross_entropy_mean)

        return tf.add_n(tf.get_collection('losses'), name='total_loss')


    def train():
        max_steps = 30000
        batch_size = 128
        data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'

        '''
        cifar10_input.distorted_inputs produces the training data as ready-made
        Tensors, batch_size examples at a time, with data augmentation already
        applied (random horizontal flips, crops, random contrast, and so on).
        Because the image processing needs a lot of compute, the function uses
        16 independent threads to speed up the work; it creates a thread pool
        internally and schedules it through a TensorFlow queue.
        '''
        images_train, labels_train = cifar10_input.distorted_inputs(data_dir=data_dir, batch_size=batch_size)
        images_test, labels_test = cifar10_input.inputs(eval_data=True, data_dir=data_dir, batch_size=batch_size)

        image_holder = tf.placeholder(tf.float32, [batch_size, 24, 24, 3])
        label_holder = tf.placeholder(tf.int32, [batch_size])

        # layer 1: convolution --> pooling --> LRN
        # 64 filters of size 5x5x3 without an L2 term (wl=0).
        # LRN makes the larger responses among neighboring kernels relatively
        # larger still and suppresses the smaller ones, improving generalization.
        weight1 = variable_with_weight_loss(shape=[5, 5, 3, 64], stddev=5e-2, wl=0.0)
        kernel1 = tf.nn.conv2d(image_holder, weight1, strides=[1, 1, 1, 1], padding='SAME')
        bias1 = tf.Variable(tf.constant(0.0, shape=[64]))  # the bias is simply initialized to 0
        conv1 = tf.nn.relu(tf.nn.bias_add(kernel1, bias1))
        pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
        norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001/9.0, beta=0.75)

        # layer 2: convolution --> LRN --> pooling
        weight2 = variable_with_weight_loss(shape=[5, 5, 64, 64], stddev=5e-2, wl=0.0)
        kernel2 = tf.nn.conv2d(norm1, weight2, strides=[1, 1, 1, 1], padding='SAME')
        bias2 = tf.Variable(tf.constant(0.1, shape=[64]))
        conv2 = tf.nn.relu(tf.nn.bias_add(kernel2, bias2))
        norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9.0, beta=0.75)
        pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')


        # layer 3: fully connected; reshape, read off the flattened length,
        # and create the FC1 weights (with L2 regularization)
        reshape = tf.reshape(pool2, [batch_size, -1])
        dim = reshape.get_shape()[1].value
        weight3 = variable_with_weight_loss(shape=[dim, 384], stddev=0.04, wl=0.004)
        bias3 = tf.Variable(tf.constant(0.1, shape=[384]))
        local3 = tf.nn.relu(tf.matmul(reshape, weight3) + bias3)

        # layer 4: FC2 with half the nodes, again L2-regularized
        weight4 = variable_with_weight_loss(shape=[384, 192], stddev=0.04, wl=0.004)
        bias4 = tf.Variable(tf.constant(0.1, shape=[192]))
        local4 = tf.nn.relu(tf.matmul(local3, weight4) + bias4)

        # final layer: the weights use a normal distribution whose standard deviation
        # is the reciprocal of the previous FC layer's node count.
        # The softmax is not computed here; it is folded into the loss below.
        weight5 = variable_with_weight_loss(shape=[192, 10], stddev=1/192.0, wl=0.0)
        bias5 = tf.Variable(tf.constant(0.0, shape=[10]))
        logits = tf.add(tf.matmul(local4, weight5), bias5)

        # the loss covers the two L2-regularized FC layers and the output layer;
        # the optimizer is again AdamOptimizer with learning rate 1e-3
        losses = loss(logits, label_holder)
        train_op = tf.train.AdamOptimizer(1e-3).minimize(losses)

        # in_top_k reports the top-k accuracy of the outputs; here we use top 1
        top_k_op = tf.nn.in_top_k(logits, label_holder, 1)

        # create a default session and initialize the variables
        sess = tf.InteractiveSession()
        tf.global_variables_initializer().run()

        # start the image-augmentation thread queue
        tf.train.start_queue_runners()

        # training
        for step in range(max_steps):
            start_time = time.time()
            image_batch, label_batch = sess.run([images_train, labels_train])
            _, loss_value = sess.run([train_op, losses], feed_dict={image_holder: image_batch,
                                                                label_holder: label_batch})
            duration = time.time() - start_time

            if step % 10 == 0:
                examples_per_sec = batch_size / duration
                sec_per_batch = float(duration)

                format_str = ('step %d,loss=%.2f (%.1f examples/sec; %.3f sec/batch)')
                print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

        # evaluate the model's accuracy; the test set has 10000 samples.
        # First work out roughly how many batches are needed to cover them all.
        num_examples = 10000
        num_iter = int(math.ceil(num_examples / batch_size))
        true_count = 0
        total_sample_count = num_iter * batch_size  # drop the leftover partial batch
        step = 0
        while step < num_iter:
            image_batch, label_batch = sess.run([images_test, labels_test])
            predictions = sess.run([top_k_op], feed_dict={image_holder: image_batch,
                                                          label_holder: label_batch})
            true_count += np.sum(predictions)  # count the correct predictions via top_k_op
            step += 1

        precision = true_count / total_sample_count
        # without the __future__ division import this would be integer division and print 0.00
        print('precision @ 1=%.3f' % precision)


    if __name__ == '__main__':
        cifar10.maybe_download_and_extract()
        train()

Output:
After training for 30,000 steps, the final accuracy reaches 79.4%:

    step 0,loss=4.67 (6.6 examples/sec; 19.420 sec/batch)
    step 10,loss=3.64 (2547.1 examples/sec; 0.050 sec/batch)
    step 20,loss=3.27 (2463.0 examples/sec; 0.052 sec/batch)
    step 30,loss=2.75 (2624.5 examples/sec; 0.049 sec/batch)
    step 40,loss=2.43 (2582.4 examples/sec; 0.050 sec/batch)
    step 50,loss=2.31 (2464.1 examples/sec; 0.052 sec/batch)
    step 60,loss=2.20 (2585.5 examples/sec; 0.050 sec/batch)
    step 70,loss=1.98 (2622.6 examples/sec; 0.049 sec/batch)
    step 80,loss=2.05 (2560.3 examples/sec; 0.050 sec/batch)
    step 90,loss=2.06 (2482.5 examples/sec; 0.052 sec/batch)
    step 100,loss=1.96 (2544.2 examples/sec; 0.050 sec/batch)
    step 110,loss=1.83 (2432.1 examples/sec; 0.053 sec/batch)
    ....
    step 29920,loss=0.69 (2439.3 examples/sec; 0.052 sec/batch)
    step 29930,loss=0.71 (2563.3 examples/sec; 0.050 sec/batch)
    step 29940,loss=0.80 (2488.2 examples/sec; 0.051 sec/batch)
    step 29950,loss=0.83 (2564.5 examples/sec; 0.050 sec/batch)
    step 29960,loss=0.80 (2643.3 examples/sec; 0.048 sec/batch)
    step 29970,loss=0.75 (2488.7 examples/sec; 0.051 sec/batch)
    step 29980,loss=0.66 (2516.7 examples/sec; 0.051 sec/batch)
    step 29990,loss=0.77 (2464.1 examples/sec; 0.052 sec/batch)
    precision @ 1=0.794

Increasing max_steps improves the model's accuracy further; if max_steps is very large, it is worth switching to a learning rate that decays over time.

Summary

Before training we called cifar10_input.distorted_inputs for data augmentation. It creates multiple variants of each image, raising image utilization and preventing the model from overfitting any single picture. This exploits the redundancy in image data: we can inject different kinds of noise while the image remains perfectly recognizable, which necessarily improves the model's generalization.

As the examples in this section show, a convolution layer is usually paired with a pooling layer; this pairing has become a standard component in image processing. In the implementation we also added tricks such as LRN, L2 regularization and careful weight initialization. Working out how to build a good model takes a great deal of practice and study.