深度有趣 (Deep Fun) | 09 Inception-v3 Image Classification

TensorFlow Python · Published 2018-09-19 22:12:19

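Inception-v3 is a neural network proposed by Google for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Inception-v3 stacks Inception blocks repeatedly, which involves a large number of convolution and pooling operations. ImageNet contains more than 14 million images in well over 1,000 categories (the pre-trained model used here predicts the 1,000 ILSVRC classes).

Training Inception-v3 on ImageNet yourself therefore takes a great deal of compute and time.

So here we load a pre-trained Inception-v3 model and use it for some image classification tasks.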

Preparation

The pre-trained model consists of three files

classify_image_graph_def.pb
imagenet_2012_challenge_label_map_proto.pbtxt
imagenet_synset_to_human_label_map.txt

For example, node ID 169 maps to the synset UID n02510455, which in turn maps to the label giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca

Image classification

Import the libraries

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np

Parse the two mapping files to build a mapping from class ID to class name

# Synset UID -> human-readable name
uid_to_human = {}
for line in tf.gfile.GFile('imagenet_synset_to_human_label_map.txt').readlines():
    items = line.strip().split('\t')
    uid_to_human[items[0]] = items[1]

# Node ID (index into the softmax output) -> synset UID
node_id_to_uid = {}
for line in tf.gfile.GFile('imagenet_2012_challenge_label_map_proto.pbtxt').readlines():
    line = line.strip()  # entries are indented in the .pbtxt file
    if line.startswith('target_class:'):
        target_class = int(line.split(': ')[1])
    if line.startswith('target_class_string:'):
        target_class_string = line.split(': ')[1].strip('"')
        node_id_to_uid[target_class] = target_class_string

# Node ID -> human-readable name
node_id_to_name = {}
for key, value in node_id_to_uid.items():
    node_id_to_name[key] = uid_to_human[value]
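The two-step lookup (node ID to synset UID to readable name) can be tried without the real files. The sketch below parses two small in-memory excerpts built from the giant panda example. The excerpt layout (a tab-separated synset file, and indented `target_class` / `target_class_string` lines inside `entry { ... }` blocks) is an assumption, and `build_node_id_to_name` is a hypothetical helper name, not part of TensorFlow.

```python
# Hypothetical excerpts; the real files contain many more entries.
SYNSET_TEXT = (
    "n02510455\tgiant panda, panda, panda bear, coon bear, "
    "Ailuropoda melanoleuca\n"
)
LABEL_MAP_TEXT = '''entry {
  target_class: 169
  target_class_string: "n02510455"
}
'''


def build_node_id_to_name(label_map_text, synset_text):
    """Join node ID -> synset UID -> human-readable name."""
    uid_to_human = {}
    for line in synset_text.splitlines():
        if not line.strip():
            continue
        uid, name = line.strip().split('\t', 1)
        uid_to_human[uid] = name

    node_id_to_uid = {}
    target_class = None
    for line in label_map_text.splitlines():
        line = line.strip()  # drop the indentation inside each entry
        if line.startswith('target_class:'):
            target_class = int(line.split(': ')[1])
        elif line.startswith('target_class_string:'):
            node_id_to_uid[target_class] = line.split(': ')[1].strip('"')

    return {node_id: uid_to_human[uid]
            for node_id, uid in node_id_to_uid.items()}


panda_map = build_node_id_to_name(LABEL_MAP_TEXT, SYNSET_TEXT)
print(panda_map[169])
```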

Load the model

def create_graph():
    with tf.gfile.FastGFile('classify_image_graph_def.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')

Define a function that classifies an image

def classify_image(image, top_k=1):
    with tf.gfile.FastGFile(image, 'rb') as f:
        image_data = f.read()

    create_graph()

    with tf.Session() as sess:
        # 'softmax:0': A tensor containing the normalized prediction across 1000 labels
        # 'pool_3:0': A tensor containing the next-to-last layer containing 2048 float description of the image
        # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG encoding of the image
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        predictions = sess.run(softmax_tensor, feed_dict={'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)

        # argsort is ascending, so reverse the slice to list the most likely class first
        top_k = predictions.argsort()[-top_k:][::-1]
        for node_id in top_k:
            human_string = node_id_to_name[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
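A note on the top-k step: `argsort()` sorts in ascending order, so the last `top_k` indices are the best classes ordered from least to most likely, and reversing that slice puts the most likely class first. A small NumPy sketch with made-up scores (not real model output) shows the difference:

```python
import numpy as np

# Made-up scores for five classes, for illustration only.
predictions = np.array([0.05, 0.60, 0.10, 0.20, 0.05])
top_k = 3

ascending = predictions.argsort()[-top_k:]  # best three, least likely first
descending = ascending[::-1]                # best three, most likely first

for node_id in descending:
    print('class %d (score = %.5f)' % (node_id, predictions[node_id]))
```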

Call the function to classify an image. Passing top_k returns that many of the most likely classes.

classify_image('test1.png')

The classification results are as follows

[Figures: classification results for test1, test2 and test3]

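Custom classification tasks

Inception-v3 was designed for the ImageNet classification task, so the number of neurons in its final fully connected layer equals the number of class labels.

For a custom classification task, you only need your own labelled data and a replacement for that final fully connected layer.

The new final layer has as many neurons as the custom task has labels. Only the parameters of this layer are trained, and all other parameters stay fixed.

This keeps Inception-v3's ability to understand and abstract images while fitting the custom task, and it is a typical application of transfer learning.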
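To make "train only the last layer" concrete, here is a minimal NumPy sketch, not the TensorFlow implementation. It treats the frozen network's output as fixed feature vectors and fits a single fully connected softmax layer on top with plain gradient descent. The features are synthetic random data, and the sizes (8 dimensions, 3 classes), learning rate and step count are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the frozen network's output: 3 classes, 8-dim vectors.
num_classes, dim, per_class = 3, 8, 50
centers = rng.normal(scale=3.0, size=(num_classes, dim))
features = np.concatenate(
    [center + rng.normal(size=(per_class, dim)) for center in centers])
labels = np.repeat(np.arange(num_classes), per_class)

# The only trainable parameters: one fully connected layer followed by softmax.
weights = np.zeros((dim, num_classes))
bias = np.zeros(num_classes)


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


one_hot = np.eye(num_classes)[labels]
learning_rate = 0.01
losses = []
for step in range(500):
    probs = softmax(features @ weights + bias)
    true_class_probs = probs[np.arange(len(labels)), labels]
    losses.append(-np.mean(np.log(true_class_probs + 1e-12)))
    grad_logits = (probs - one_hot) / len(labels)  # gradient of mean cross-entropy
    weights -= learning_rate * (features.T @ grad_logits)
    bias -= learning_rate * grad_logits.sum(axis=0)

predicted = softmax(features @ weights + bias).argmax(axis=1)
accuracy = np.mean(predicted == labels)
print('loss %.3f -> %.3f, train accuracy %.2f' % (losses[0], losses[-1], accuracy))
```

Nothing upstream of `features` is ever updated, which is the sense in which the other parameters stay fixed.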

TensorFlow provides an official tutorial on transfer learning with Inception-v3

www.tensorflow.org/tutorials/i…

The data used consists of photos of five kinds of flowers

daisy
dandelion
roses
sunflowers
tulips

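With the final fully connected layer removed, the representation the model outputs for an input image is called the bottleneck.

Computing the bottlenecks for all images in advance and caching them saves a lot of training time, because afterwards only the layer between the bottleneck and the output labels has to be computed and trained.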
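The caching idea can be sketched with the standard library alone. This is an illustration of the pattern, not the code from the official script: `get_or_create_bottleneck` and `fake_compute` are hypothetical names, JSON is an arbitrary choice of cache format, and the three-element vector stands in for a real bottleneck.

```python
import json
import os
import tempfile


def get_or_create_bottleneck(image_path, cache_dir, compute_fn):
    """Return the cached bottleneck for image_path, computing it at most once."""
    # Simplification: keyed by file name only; a real cache would also include the label.
    cache_path = os.path.join(cache_dir, os.path.basename(image_path) + '.json')
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    values = compute_fn(image_path)
    with open(cache_path, 'w') as f:
        json.dump(values, f)
    return values


calls = []


def fake_compute(image_path):
    # Hypothetical stand-in for the expensive forward pass through the frozen layers.
    calls.append(image_path)
    return [0.1, 0.2, 0.3]


cache_dir = tempfile.mkdtemp()
first = get_or_create_bottleneck('daisy/001.jpg', cache_dir, fake_compute)
second = get_or_create_bottleneck('daisy/001.jpg', cache_dir, fake_compute)
print('forward passes:', len(calls))
```

The second call is served from disk, so the expensive function runs once per image no matter how many training steps later reuse it.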

TensorFlow also provides official retraining code

github.com/tensorflow/…

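It is used from the command line. Some of the optional arguments are

--image_dir: directory of training images
--output_graph: path where the trained model is saved
--output_labels: path where the model's labels are saved
--summaries_dir: directory for the training logs
--how_many_training_steps: number of training iterations, default 4000
--learning_rate: learning rate, default 0.01
--testing_percentage: share of images used as the test set, default 10%
--validation_percentage: share of images used as the validation set, default 10%
--eval_step_interval: how often the model is evaluated, default every 10 iterations
--train_batch_size: training batch size, default 100
--print_misclassified_test_images: whether to print all misclassified test images, default False
--model_dir: path to the Inception-v3 model
--bottleneck_dir: bottleneck cache directory
--final_tensor_name: name of the newly added final fully connected layer, default final_result
--flip_left_right: whether to randomly flip half of the images horizontally, default False
--random_crop: random cropping percentage, default 0 (no cropping)
--random_scale: random upscaling percentage, default 0 (no upscaling)
--random_brightness: random brightening percentage, default 0 (no brightening)
--architecture: model to transfer from, default inception_v3, which is the most accurate but slower to train. You can also choose 'mobilenet_<parameter size>_<input_size>[_quantized]', for example mobilenet_1.0_224 or mobilenet_0.25_128_quantized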
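The `--testing_percentage` and `--validation_percentage` flags reserve fixed shares of the images. One common way to make such a split reproducible is to derive it from a hash of the file name, so an image always lands in the same set across runs. The sketch below illustrates that idea under stated assumptions and is not copied from retrain.py; `assign_split` and the file names are hypothetical.

```python
import hashlib


def assign_split(file_name, testing_percentage=10, validation_percentage=10):
    """Deterministically map a file name to 'training', 'testing' or 'validation'."""
    digest = hashlib.sha1(file_name.encode('utf-8')).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in [0, 100)
    if bucket < validation_percentage:
        return 'validation'
    if bucket < validation_percentage + testing_percentage:
        return 'testing'
    return 'training'


names = ['img_%04d.jpg' % i for i in range(1000)]  # hypothetical file names
counts = {'training': 0, 'testing': 0, 'validation': 0}
for name in names:
    counts[assign_split(name)] += 1
print(counts)
```

Because the assignment depends only on the file name, adding new images later does not move existing ones between sets.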

Run the code

python retrain.py --image_dir flower_photos --output_graph output_graph.pb --output_labels output_labels.txt --summaries_dir summaries_dir --model_dir .. --bottleneck_dir bottleneck_dir

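Errata for the video

Change output_graph after --output_graph to output_graph.pb
Change output_labels after --output_labels to output_labels.txt

Classification accuracy on the validation and test sets is 91% and 91.2% respectively.

On my laptop the whole run took 55 minutes, 44 of which were spent caching bottlenecks. Without the cache, however, the bottlenecks would have to be recomputed in every training iteration.

The training logs under summaries_dir can be visualized with TensorBoard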

tensorboard --logdir summaries_dir

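Then open http://localhost:6006 in a browser to see the visualizations, which include four tabs: SCALARS, GRAPHS, DISTRIBUTIONS and HISTOGRAMS.

For other image classification tasks, just collect the labelled images and use each label name as the name of a sub-folder.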
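The folder-per-label layout can be checked before training with a few lines of standard-library Python. This is a rough sketch: `list_images_by_label` is a hypothetical helper, the accepted extensions are an assumption, and the tiny dataset is built from empty placeholder files.

```python
import os
import tempfile


def list_images_by_label(image_dir, extensions=('.jpg', '.jpeg', '.png')):
    """Map each sub-folder name (the label) to the sorted image files inside it."""
    result = {}
    for label in sorted(os.listdir(image_dir)):
        label_dir = os.path.join(image_dir, label)
        if not os.path.isdir(label_dir):
            continue  # ignore stray files next to the label folders
        files = [name for name in sorted(os.listdir(label_dir))
                 if name.lower().endswith(extensions)]
        if files:
            result[label] = files
    return result


# Build a tiny hypothetical dataset with empty placeholder files.
root = tempfile.mkdtemp()
layout = {'daisy': ['a.jpg', 'b.jpg'], 'roses': ['c.jpg']}
for label, file_names in layout.items():
    os.makedirs(os.path.join(root, label))
    for file_name in file_names:
        open(os.path.join(root, label, file_name), 'w').close()

images_by_label = list_images_by_label(root)
print(images_by_label)
```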

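To use the trained model, refer to the code below

output_labels.txt: path to the file of class labels
output_graph.pb: path to the trained model
read_image(): function that reads an image
input_operation: the operation that takes the image input
output_operation: the operation that produces the classification output
test.jpg: path to the image to classify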

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np

# One label per line, in the order of the model's outputs
labels = []
for line in tf.gfile.GFile('output_labels.txt').readlines():
    labels.append(line.strip())

def create_graph():
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open('output_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        # No name is given, so imported nodes get the default 'import/' prefix
        tf.import_graph_def(graph_def)
    return graph

def read_image(path, height=299, width=299, mean=128, std=128):
    file_reader = tf.read_file(path, 'file_reader')
    if path.endswith('.png'):
        image_reader = tf.image.decode_png(file_reader, channels=3, name='png_reader')
    elif path.endswith('.gif'):
        image_reader = tf.squeeze(tf.image.decode_gif(file_reader, name='gif_reader'))
    elif path.endswith('.bmp'):
        image_reader = tf.image.decode_bmp(file_reader, name='bmp_reader')
    else:
        image_reader = tf.image.decode_jpeg(file_reader, channels=3, name='jpeg_reader')
    image_np = tf.cast(image_reader, tf.float32)
    image_np = tf.expand_dims(image_np, 0)
    image_np = tf.image.resize_bilinear(image_np, [height, width])
    image_np = tf.divide(tf.subtract(image_np, [mean]), [std])
    # Close the session once the image has been evaluated
    with tf.Session() as sess:
        image_data = sess.run(image_np)
    return image_data

def classify_image(image, top_k=1):
    image_data = read_image(image)

    graph = create_graph()

    with tf.Session(graph=graph) as sess:
        input_operation = sess.graph.get_operation_by_name('import/Mul')
        output_operation = sess.graph.get_operation_by_name('import/final_result')
        predictions = sess.run(output_operation.outputs[0], feed_dict={input_operation.outputs[0]: image_data})
        predictions = np.squeeze(predictions)

        # argsort is ascending, so reverse the slice to list the most likely class first
        top_k = predictions.argsort()[-top_k:][::-1]
        for i in top_k:
            print('%s (score = %.5f)' % (labels[i], predictions[i]))

classify_image('test.jpg')
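With the defaults `mean=128` and `std=128`, `read_image()` rescales 8-bit pixel values from [0, 255] to roughly [-1, 1). A short NumPy sketch of just that arithmetic, using a hypothetical `normalize` helper and three hand-picked pixel values:

```python
import numpy as np


def normalize(pixels, mean=128, std=128):
    """Apply the same (x - mean) / std scaling that read_image() uses."""
    return (pixels.astype(np.float32) - mean) / std


pixels = np.array([0, 128, 255], dtype=np.uint8)  # darkest, middle, brightest
scaled = normalize(pixels)
print(scaled)
```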
