TensorFlow Study Notes (11): [Ubuntu] Running, Visualizing, Exporting, and Using the inception_v4 Model under the slim Framework

阿新 • Published: 2019-01-06

Model: Inception_v4 under the slim framework
Inception_v4 checkpoint: http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
Dataset: Google's flower dataset, http://download.tensorflow.org/example_images/flower_photos.tgz, containing 5 classes of flowers

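Data preparation

After downloading the dataset, place it under /home/lwp/data/flower/my_flower_5. Each flower class has its own folder, and each folder contains the images for that class.

The model directory source/models/slim contains a script, convert_tfrecord.sh, with the following contents: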

source env_set.sh
python download_and_convert_data.py \
  --dataset_name=$DATASET_NAME \
  --dataset_dir=$DATASET_DIR

The script takes its variables from env_set.sh, whose contents are as follows:

export DATASET_NAME=my_flower_5
export DATASET_DIR=/home/lwp/data/flower
export CHECKPOINT_PATH=/home/lwp/pre_trained/inception_v4.ckpt
export TRAIN_DIR=/tmp/my_train_20170725

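The file defines:

DATASET_NAME: name of the dataset
DATASET_DIR: path to the dataset
CHECKPOINT_PATH: path to the pre-trained inception_v4 model
TRAIN_DIR: path where the checkpoints generated during training are saved

Once the environment variables are configured, change into the model directory: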

$ cd source/models/slim

Run the script:

$ ./convert_tfrecord.sh

When it finishes, the data is ready.
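
The conversion script itself is not shown in this post. As a rough sketch of the bookkeeping such a script has to do before writing TFRecords, the following lists the class folders, assigns each an integer id, and splits the shuffled file list into training and validation sets. The function name, the validation size, and the seed are assumptions for illustration, not values taken from the author's download_and_convert_data.py.

```python
import os
import random


def list_and_split(dataset_root, num_validation, seed=0):
    """Return (class_to_id, train_samples, validation_samples) for a folder-per-class dataset."""
    # One sub-folder per class; sort so the name -> id mapping is stable.
    class_names = sorted(
        d for d in os.listdir(dataset_root)
        if os.path.isdir(os.path.join(dataset_root, d))
    )
    class_to_id = {name: i for i, name in enumerate(class_names)}

    # Collect (path, class id) pairs for every file in every class folder.
    samples = []
    for name in class_names:
        class_dir = os.path.join(dataset_root, name)
        for filename in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, filename), class_to_id[name]))

    # Shuffle deterministically, then hold out the first num_validation samples.
    random.Random(seed).shuffle(samples)
    return class_to_id, samples[num_validation:], samples[:num_validation]
```

Calling list_and_split("/home/lwp/data/flower/my_flower_5", num_validation=350) would return the class mapping plus two disjoint lists; 350 is only an example value.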

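Preparing the pre-trained model

Extract the downloaded Inception_v4 checkpoint into /home/lwp/pre_trained, so that inception_v4.ckpt sits at the CHECKPOINT_PATH set in env_set.sh.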

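Running the training script

(This assumes the model-related parameters have already been set in the training script run_train.sh, the evaluation script run_eval.sh, the environment variable file env_set.sh, and so on.)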

$ ./run_train.sh

The contents of run_train.sh are as follows:

source env_set.sh

nohup python -u train_image_classifier.py \
  --dataset_name=$DATASET_NAME \
  --dataset_dir=$DATASET_DIR \
  --checkpoint_path=$CHECKPOINT_PATH \
  --model_name=inception_v4 \
  --checkpoint_exclude_scopes=InceptionV4/Logits,InceptionV4/AuxLogits/Aux_logits \
  --trainable_scopes=InceptionV4/Logits,InceptionV4/AuxLogits/Aux_logits \
  --train_dir=$TRAIN_DIR \
  --learning_rate=0.001 \
  --learning_rate_decay_factor=0.76 \
  --num_epochs_per_decay=50 \
  --moving_average_decay=0.9999 \
  --optimizer=adam \
  --ignore_missing_vars=True \
  --batch_size=32 > output.log 2>&1 &
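
The two scope flags work by name prefix: variables under a scope listed in --checkpoint_exclude_scopes are not restored from the pre-trained checkpoint, and only variables under --trainable_scopes are updated during training. A minimal pure-Python sketch of that prefix matching follows. The variable names are made up for illustration and are not the real InceptionV4 variable list, and the helper functions are not part of slim.

```python
def split_scopes(flag_value):
    """Turn a comma-separated flag value into a list of scope prefixes."""
    return [scope.strip() for scope in flag_value.split(",") if scope.strip()]


def partition_by_scope(variable_names, scopes):
    """Return (inside, outside): names that do / do not start with any scope prefix."""
    inside = [n for n in variable_names if any(n.startswith(s) for s in scopes)]
    outside = [n for n in variable_names if n not in inside]
    return inside, outside


# Hypothetical variable names, for illustration only.
names = [
    "InceptionV4/Conv2d_1a_3x3/weights",
    "InceptionV4/Logits/Logits/weights",
    "InceptionV4/AuxLogits/Aux_logits/biases",
]
scopes = split_scopes("InceptionV4/Logits,InceptionV4/AuxLogits/Aux_logits")
new_layers, restored = partition_by_scope(names, scopes)
print("not restored, trained on the new data:", new_layers)
print("restored from checkpoint, kept fixed:", restored)
```

Because run_train.sh passes the same value to both flags, the layers skipped when restoring are exactly the layers that get trained: only the classifier heads are fitted to the 5 flower classes, while the rest of the network keeps its pre-trained weights.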

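Training runs in the background and writes to output.log. To inspect the log:

$ tail -f output.log # follow the log as it is written
# or
$ cat output.log # print the whole log file at once

The output looks like this: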

INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
INFO:tensorflow:Fine-tuning from /home/lwp/pre_trained/inception_v4.ckpt
2017-07-27 08:32:08.547822: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 08:32:08.547847: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 08:32:08.547868: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 08:32:08.547887: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 08:32:08.547892: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 08:32:08.861766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-27 08:32:08.862322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.58GiB
2017-07-27 08:32:08.862342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-07-27 08:32:08.862350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-07-27 08:32:08.862359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
INFO:tensorflow:Restoring parameters from /home/lwp/pre_trained/inception_v4.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /tmp/my_train_20170725/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 1.
INFO:tensorflow:global step 10: loss = 2.9544 (0.277 sec/step)
INFO:tensorflow:global step 20: loss = 2.7159 (0.267 sec/step)
INFO:tensorflow:global step 30: loss = 3.0572 (0.261 sec/step)
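
To track the loss outside TensorBoard, the "global step" lines can be pulled out of output.log with a regular expression. This is a small sketch based only on the line format visible in the log above; the function name is illustrative.

```python
import re

STEP_RE = re.compile(
    r"global step (\d+): loss = ([0-9.]+) \(([0-9.]+) sec/step\)"
)


def parse_training_log(lines):
    """Yield (step, loss, seconds_per_step) for each 'global step' log line."""
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            yield int(match.group(1)), float(match.group(2)), float(match.group(3))


sample = [
    "INFO:tensorflow:Recording summary at step 1.",
    "INFO:tensorflow:global step 10: loss = 2.9544 (0.277 sec/step)",
    "INFO:tensorflow:global step 20: loss = 2.7159 (0.267 sec/step)",
]
records = list(parse_training_log(sample))
print(records)
```

For the real log, pass an open file instead of the sample list, for example parse_training_log(open("output.log")).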

The checkpoint files generated by training (meta, data, and index) appear under /tmp/my_train_20170725. This path is the TRAIN_DIR defined in the environment setup script env_set.sh.
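
For reference, the learning-rate flags in run_train.sh describe a stepwise exponential decay. Assuming the training script's default exponential decay type with staircase behaviour, the rate is multiplied by learning_rate_decay_factor once every num_epochs_per_decay epochs. The sketch below reproduces that arithmetic in plain Python. The training-set size of 3320 images is an assumed figure for illustration, since the post does not state how many images are in the training split.

```python
def decayed_learning_rate(step, base_lr=0.001, decay_factor=0.76,
                          num_epochs_per_decay=50, batch_size=32,
                          num_train_samples=3320):  # assumed training-set size
    """Staircase exponential decay: lower the rate once every decay_steps steps."""
    decay_steps = int(num_train_samples / batch_size * num_epochs_per_decay)
    return base_lr * decay_factor ** (step // decay_steps)


for step in (0, 5000, 11028):
    print(step, decayed_learning_rate(step))
```

Under these assumptions the rate only changes every few thousand steps, so for a short fine-tuning run it stays close to the initial 0.001.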

Running the evaluation script

$ ./run_eval.sh

The contents of run_eval.sh are as follows:

source env_set.sh
python -u eval_image_classifier.py \
  --dataset_name=$DATASET_NAME \
  --dataset_dir=$DATASET_DIR \
  --dataset_split_name=validation \
  --model_name=inception_v4 \
  --checkpoint_path=$TRAIN_DIR \
  --eval_dir=/tmp/eval/validation \
  --eval_interval_secs=60 \
  --batch_size=32

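Here eval_interval_secs=60 sets the minimum interval between two evaluation runs to 60 s; the flag is defined in eval_image_classifier.py.

Training and evaluation run as separate programs here. The model inevitably performs poorly at the start of training, so there is no point in evaluating it then, and training runs for a long time; if evaluation were built into the training loop, those useless evaluation runs would take up a lot of time.
Keeping the two separate also means another machine can access the checkpoints (at /tmp/my_train_20170725) and run the evaluation, which spreads the load across resources.

The output after running is as follows: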

.
.
.
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 2.24GiB
2017-07-27 09:27:33.151287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-07-27 09:27:33.151292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-07-27 09:27:33.151299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
INFO:tensorflow:Restoring parameters from /tmp/my_train_20170725/model.ckpt-11028
INFO:tensorflow:Starting evaluation at 2017-07-27-01:27:47
2017-07-27 09:27:49.207742: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.51GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
INFO:tensorflow:Evaluation [1/12]
INFO:tensorflow:Evaluation [2/12]
INFO:tensorflow:Evaluation [3/12]
INFO:tensorflow:Evaluation [4/12]
INFO:tensorflow:Evaluation [5/12]
INFO:tensorflow:Evaluation [6/12]
INFO:tensorflow:Evaluation [7/12]
INFO:tensorflow:Evaluation [8/12]
INFO:tensorflow:Evaluation [9/12]
INFO:tensorflow:Evaluation [10/12]
INFO:tensorflow:Evaluation [11/12]
INFO:tensorflow:Evaluation [12/12]
INFO:tensorflow:Finished evaluation at 2017-07-27-01:27:56
2017-07-27 09:27:57.363998: I tensorflow/core/kernels/logging_ops.cc:79] eval/Recall_5[1]
2017-07-27 09:27:57.364187: I tensorflow/core/kernels/logging_ops.cc:79] eval/Accuracy[0.87760419]
INFO:tensorflow:Waiting for new checkpoint at /tmp/my_train_20170725

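Evaluation loop
The evaluation results are printed. Note the last line, Waiting for new checkpoint at /tmp/my_train_20170725: eval_image_classifier.py implements a custom loop that watches /tmp/my_train_20170725 and runs an evaluation whenever a new checkpoint is generated.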
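
The author's modified eval_image_classifier.py is not shown, so the following is only a sketch of how such a watch loop could be written. It polls the training directory for model.ckpt-<step>.index files and calls an evaluation callback once per new step. The function names, the evaluate callback, and the max_polls parameter are all hypothetical.

```python
import os
import re
import time

CKPT_RE = re.compile(r"model\.ckpt-(\d+)\.index$")


def latest_step(train_dir):
    """Return the highest checkpoint step found in train_dir, or None."""
    steps = [
        int(m.group(1))
        for m in (CKPT_RE.search(name) for name in os.listdir(train_dir))
        if m
    ]
    return max(steps) if steps else None


def watch_and_evaluate(train_dir, evaluate, interval_secs=60, max_polls=None):
    """Call evaluate(step) once for each new checkpoint that appears in train_dir."""
    last_seen = None
    polls = 0
    while max_polls is None or polls < max_polls:
        step = latest_step(train_dir)
        if step is not None and step != last_seen:
            evaluate(step)  # hypothetical callback that runs one evaluation
            last_seen = step
        else:
            print("Waiting for new checkpoint at", train_dir)
            time.sleep(interval_secs)
        polls += 1
```

With max_polls left at None the loop runs until interrupted, which matches the behaviour seen in the log; the parameter exists only so the sketch can be exercised in a test.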

Visualizing training: TensorBoard

Run:

$ tensorboard --logdir /tmp/my_train_20170725

Output:

Starting TensorBoard 55 at http://lw:6006
(Press CTRL+C to quit)

Look up the local machine's IP address:

$ ifconfig -a

Enter the address in a browser:

http://192.168.0.102:6006

If TensorBoard loads but shows no content, try a different browser. In my case Firefox showed nothing, and switching to Chrome fixed it.

Stopping training

List the Python processes by running:

$ ps -ef |grep python

Output:

lwp       2780  2025 99 08:31 pts/0    03:38:22 python -u train_image_classifier.py --dataset_name=my_flower_5 --dataset_dir=/home/lwp/data/flower --checkpoint_path=/home/lwp/pre_trained/inception_v4.ckpt --model_name=inception_v4 --checkpoint_exclude_scopes=InceptionV4/Logits,InceptionV4/AuxLogits/Aux_logits --trainable_scopes=InceptionV4/Logits,InceptionV4/AuxLogits/Aux_logits --train_dir=/tmp/my_train_20170725 --learning_rate=0.001 --learning_rate_decay_factor=0.76 --num_epochs_per_decay=50 --moving_average_decay=0.9999 --optimizer=adam --ignore_missing_vars=True --batch_size=32
lwp      18830  3674  1 09:40 pts/2    00:00:15 /usr/bin/python /usr/local/bin/tensorboard --logdir /tmp/my_train_20170725
lwp      24837  2763  0 09:53 pts/0    00:00:00 grep --color=auto python

The training process has PID 2780.

Kill the process to stop training:

$ kill 2780
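
The same lookup can be scripted. The sketch below parses `ps -ef`-style lines to find the PID of the training process and skips the grep line itself. The helper names are illustrative, and the sample lines are shortened from the listing above. A plain `kill` sends SIGTERM, which corresponds to os.kill(pid, signal.SIGTERM) in Python.

```python
import os
import signal


def find_pids(ps_lines, needle):
    """Return PIDs (second column of `ps -ef`) of lines whose command contains needle."""
    pids = []
    for line in ps_lines:
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and needle in fields[7] and not fields[7].startswith("grep"):
            pids.append(int(fields[1]))
    return pids


def stop_process(pid):
    """Ask the process to terminate, like the shell's `kill <pid>`."""
    os.kill(pid, signal.SIGTERM)


ps_output = [
    "lwp       2780  2025 99 08:31 pts/0    03:38:22 python -u train_image_classifier.py --model_name=inception_v4",
    "lwp      24837  2763  0 09:53 pts/0    00:00:00 grep --color=auto python",
]
print(find_pids(ps_output, "train_image_classifier.py"))
```

stop_process is deliberately not called in the example; pass it a PID returned by find_pids only after checking that it is the process you intend to stop.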

Exporting and using the model

Exporting the model
Run the script:

$ ./export_freeze.sh

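This produces 3 files, which hold the model's labels, weights, and structure: my_inception_v4_freeze.label (labels), my_inception_v4_freeze.pb (the frozen graph with the trained weights), and my_inception_v4.pb (the graph structure).

The contents of export_freeze.sh are as follows: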

source env_set.sh
python -u export_inference_graph.py \
  --model_name=inception_v4 \
  --output_file=./my_inception_v4.pb \
  --dataset_name=$DATASET_NAME \
  --dataset_dir=$DATASET_DIR


NEWEST_CHECKPOINT=$(ls -t1 $TRAIN_DIR/model.ckpt*| head -n1)
NEWEST_CHECKPOINT=${NEWEST_CHECKPOINT%.*}
python -u ~/tensorflow/tensorflow/python/tools/freeze_graph.py \
  --input_graph=my_inception_v4.pb \
  --input_checkpoint=$NEWEST_CHECKPOINT \
  --output_graph=./my_inception_v4_freeze.pb \
  --input_binary=True \
  --output_node_names=InceptionV4/Logits/Predictions

cp $DATASET_DIR/labels.txt ./my_inception_v4_freeze.label
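
The two NEWEST_CHECKPOINT lines in the script pick the most recently modified model.ckpt* file and strip its last extension to get the checkpoint prefix. The same idea in Python, as a sketch (the function name is illustrative):

```python
import glob
import os


def newest_checkpoint_prefix(train_dir):
    """Return the prefix (e.g. .../model.ckpt-11028) of the most recently written checkpoint file."""
    files = glob.glob(os.path.join(train_dir, "model.ckpt*"))
    if not files:
        raise FileNotFoundError("no model.ckpt* files in " + train_dir)
    newest = max(files, key=os.path.getmtime)  # same idea as `ls -t1 | head -n1`
    prefix, _extension = os.path.splitext(newest)  # same idea as ${VAR%.*}
    return prefix
```

Stripping only the last extension works for .meta, .index, and .data-00000-of-00001 files alike, because the step number is joined to model.ckpt with a hyphen rather than a dot.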

Using the model
A Python-based web server
Run the script:

$ ./server.sh

Output:

listening on port 5001
2017-07-27 10:04:54.279779: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 10:04:54.279800: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 10:04:54.279806: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 10:04:54.279810: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 10:04:54.279814: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-27 10:04:54.411389: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-27 10:04:54.411804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.50GiB
2017-07-27 10:04:54.411818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-07-27 10:04:54.411822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-07-27 10:04:54.411828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
 * Running on http://0.0.0.0:5001/ (Press CTRL+C to quit)

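Enter the address in a browser:

http://<local-machine-IP>:5001

Choose an image and upload it, and the recognition result is displayed.
(Note: uploaded images are stored in /tmp/upload, which is set in server.sh.)

The contents of server.sh are as follows: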

python -u server.py \
  --model_name=my_inception_v4_freeze.pb \
  --label_file=my_inception_v4_freeze.label \
  --upload_folder=/tmp/upload

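The flags themselves are defined in server.py.

The result lists a score for each of the 5 classes; the test image is recognized as sunflowers with a score of 0.79741.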
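
server.py is not reproduced in the post, but one step any such server needs is mapping the output vector back to class names. The sketch below assumes the label file uses the index:name line format written by slim's dataset tools, assumes the five class names of the flower_photos folders, and uses made-up scores; the helper names are illustrative.

```python
def parse_labels(lines):
    """Parse 'index:name' lines into a dict {index: name}."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        index, name = line.split(":", 1)
        labels[int(index)] = name
    return labels


def rank_predictions(scores, labels):
    """Return (name, score) pairs sorted from most to least likely."""
    pairs = [(labels[i], score) for i, score in enumerate(scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Assumed label-file contents and made-up scores, for illustration only.
label_lines = ["0:daisy", "1:dandelion", "2:roses", "3:sunflowers", "4:tulips"]
scores = [0.05, 0.05, 0.05, 0.80, 0.05]
for name, score in rank_predictions(scores, parse_labels(label_lines)):
    print("%-12s %.5f" % (name, score))
```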

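Some thoughts: what we just built is a 5-class classifier over a few kinds of flowers. Suppose we now have a picture of a cat. The training data has no label for it, so what happens when the model predicts on an object from a class it was never given?

Trying it with a cat picture, the model still returns classification scores, but this cat is of course not a dandelion. This is a common problem with current image recognition models: the model does not know what it does not know. A person asked to sort images into these 5 flower classes would, on seeing this cat, say that it is not a flower, and on seeing an unfamiliar flower outside the 5 classes would say they do not recognize it or that it belongs to none of them. The model cannot do that at present; it always ends up assigning the cat to one of the flower classes.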
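
The reason the cat still gets a flower label is visible in the softmax itself: whatever the input, the scores are non-negative and sum to 1 over the 5 known classes, so there is no way to express "none of these". The sketch below shows this with made-up logits and adds a simple confidence threshold as one common, imperfect workaround. The threshold value and helper names are assumptions and are not part of the original server.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_reject(logits, class_names, threshold=0.9):
    """Return the top class, or 'unknown' when its probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return class_names[best], probs[best]


names = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
cat_logits = [0.2, 1.1, 0.3, 0.4, 0.1]  # made-up logits for an out-of-class image
print(sum(softmax(cat_logits)))  # close to 1.0 regardless of the input
print(classify_with_reject(cat_logits, names))
```

A threshold only reduces the problem: an out-of-class image can still receive a high score for one class, in which case it passes the check and is mislabeled.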
