
Caffe Usage Summary


Generating LMDB data from image data

Loss = NaN problem:
(1) The learning rate is too high.
(2) The LMDB was generated incorrectly: shuffle was not set to true, which leads to NaN. Lowering the learning rate can also work around this, but training becomes much harder, because without shuffling each batch is a poor estimate of the whole dataset. See create_lmdb.sh below, where -shuffle=true is passed to convert_imageset.
  • create_list.sh
#!/usr/bin/env sh
# Build train.txt and test.txt: one "<image path> <class label>" line per image,
# with paths relative to $DATA (the same root later passed to convert_imageset).

DATA="data/mnist.28x28"

cd $DATA

rm -f train.txt
rm -f test.txt

for label in 0 1 2 3 4 5 6 7 8 9; do
  find train/$label -name "*" | grep -i -E ".bmp|.jpg|.png" | sed "s/$/ $label/" >> train.txt
  find test/$label  -name "*" | grep -i -E ".bmp|.jpg|.png" | sed "s/$/ $label/" >> test.txt
done
  • create_lmdb.sh
#!/usr/bin/env sh
# This script converts the mnist data into lmdb/leveldb format,
# depending on the value assigned to $BACKEND.
set -e

EXAMPLE=examples/mnist.28x28
DATA=data/mnist.28x28
BUILD=build/tools

BACKEND="lmdb"

echo "Creating ${BACKEND}..."

rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}

$BUILD/convert_imageset -backend=$BACKEND -gray=true -shuffle=true $DATA/ $DATA/train.txt  $EXAMPLE/mnist_train_${BACKEND}
$BUILD/convert_imageset -backend=$BACKEND -gray=true -shuffle=true $DATA/ $DATA/test.txt  $EXAMPLE/mnist_test_${BACKEND}
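# note: -shuffle=true randomizes the example order when building the DB; without it,
# batches are drawn from consecutive same-class images and the loss can go to NaN (see above)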

echo "Done."

solver.prototxt (tuning the optimization parameters is important)

If the base learning rate or momentum is too large, the loss blows up and can even become NaN.
net: lenet_train_test.prototxt
test_iter: number of iterations per test pass; set it to test_images_num / batch_size so the whole test set is covered.
The test-phase parameters test_iter and batch_size depend on the number of images in the test database.
test_interval: the solver runs a test pass (accuracy and loss on the test data) every test_interval training iterations.

base_lr: base learning rate
momentum: momentum
momentum2: second momentum parameter of the optimizer (Adam's second moment decay)

lr(t) = base_lr * decay_factor(t)
V(t+1) = momentum * V(t) - lr(t) * gradient
W(t+1) = W(t) + V(t+1)

lr_policy: learning-rate decay policy
The explanation below comes from caffe.proto:

The learning rate decay policy. The currently implemented learning rate policies are as follows:
  - fixed: always return base_lr
  - step: return base_lr * gamma ^ (floor(iter / stepsize))
  - exp: return base_lr * gamma ^ iter
  - inv: return base_lr * (1 + gamma * iter) ^ (-power)
  - multistep: similar to step, but allows non-uniform steps defined by stepvalue
  - poly: the effective learning rate follows a polynomial decay, reaching zero at max_iter: return base_lr * (1 - iter/max_iter) ^ power
  - sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
Here base_lr, max_iter, gamma, stepsize, stepvalue and power are defined in the solver parameter protocol buffer, and iter is the current iteration.
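As a quick worked example (the numbers are illustrative, not from the original post): with lr_policy: "step", base_lr: 0.01, gamma: 0.1 and stepsize: 10000,

iter 0–9999:       lr = 0.01 * 0.1^0 = 0.01
iter 10000–19999:  lr = 0.01 * 0.1^1 = 0.001
iter 20000–29999:  lr = 0.01 * 0.1^2 = 0.0001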

weight_decay: 0.0005
Supported regularization types: L1 and L2

The weight_decay parameter governs the regularization term of the neural net.

During training a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.

As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.

Caffe also allows you to choose between L2 regularization (the default) and L1 regularization, by setting

regularization_type: "L1"

While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.
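In solver.prototxt the two relevant fields are simply (0.0005 is the example value from above; since L2 is the default, regularization_type can be omitted when L2 is wanted):

weight_decay: 0.0005
regularization_type: "L1"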


(The weight_decay notes above are quoted from: susandebug, CSDN, https://blog.csdn.net/u010025211/article/details/50055815)

display: print training results (loss, learning rate) every display iterations
max_iter: maximum number of training iterations
max_iter and the training batch_size together determine how many epochs the net is trained for.

Iterations per epoch: iter_num_per_epoch = training_images_num / batch_size

epochs_num = max_iter / iter_num_per_epoch

snapshot: number of iterations between snapshots
snapshot_prefix: filename prefix for snapshot files
type: which optimization algorithm to use (e.g. SGD, Adam)
solver_mode: CPU or GPU
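Putting the parameters above together, a minimal solver.prototxt might look like the following. This is a sketch only: the net path matches the MNIST example used earlier, and all numeric values are illustrative rather than tuned.

# minimal solver sketch -- illustrative values
net: "examples/mnist.28x28/lenet_train_test.prototxt"
test_iter: 100            # test_images_num / test batch_size
test_interval: 500        # run a test pass every 500 training iterations
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100              # print training loss every 100 iterations
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist.28x28/lenet"
type: "SGD"               # optimization algorithm
solver_mode: GPU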

train_test.prototxt

Layers can be restricted to the training or test phase via include { phase: TRAIN } / include { phase: TEST }.
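For example, a data layer that is only active during training might look like this (the source path and batch size follow the LMDB created earlier; layer and blob names are illustrative):

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "examples/mnist.28x28/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
# a second Data layer with include { phase: TEST } points at mnist_test_lmdb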

deploy.prototxt

  • The Data layer is replaced by an Input layer
  • No backward/loss part: SoftmaxWithLoss --> Softmax
  • Set use_global_stats: true in BatchNorm layers
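A sketch of the corresponding changes on the deploy side (the input shape assumes 28x28 grayscale images; the bottom name "ip2" is borrowed from the LeNet example and must match your last InnerProduct layer):

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
}
# ... the convolution / pooling / InnerProduct layers stay the same ...
layer {
  name: "prob"
  type: "Softmax"         # replaces SoftmaxWithLoss; no label bottom needed
  bottom: "ip2"
  top: "prob"
}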

Using Caffe from the command line

Training a model

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt

Resuming interrupted training

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --snapshot=examples/mnist/lenet_iter_1000.solverstate

Fine-tuning / transfer learning from a pre-trained model

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --weights=examples/mnist/lenet_iter_100000.caffemodel

Testing a model

./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
or
caffe test --model=examples/mnist/lenet_train_test.prototxt --weights=examples/mnist/lenet_iter_10000.caffemodel

Visualization

1. Visualize with Netscope:
http://ethereon.github.io/netscope/quickstart.html

2. Visualize with Caffe's bundled tool draw_net.py
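Assuming pydot and graphviz are installed, usage is roughly as follows (the output format is taken from the output filename extension):

python python/draw_net.py examples/mnist/lenet_train_test.prototxt lenet.png --rankdir=LR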

Custom network layers

A Python layer can be used.
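A minimal sketch of a Python layer (Caffe must be built with WITH_PYTHON_LAYER := 1; the module, class and blob names below are made up for illustration):

# my_layers.py -- must be importable, i.e. on PYTHONPATH
import caffe

class ScaleBy2Layer(caffe.Layer):
    """Toy layer that multiplies its single input blob by 2."""

    def setup(self, bottom, top):
        if len(bottom) != 1:
            raise Exception("expected exactly one bottom blob")

    def reshape(self, bottom, top):
        # output has the same shape as the input
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        top[0].data[...] = 2.0 * bottom[0].data

    def backward(self, top, propagate_down, bottom):
        if propagate_down[0]:
            bottom[0].diff[...] = 2.0 * top[0].diff

# Declared in the prototxt as:
# layer {
#   name: "scale2"
#   type: "Python"
#   bottom: "data"
#   top: "scaled"
#   python_param { module: "my_layers" layer: "ScaleBy2Layer" }
# }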

Plotting loss and accuracy curves

Use Caffe's bundled tool plot_training_log.py.
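The script ships under tools/extra (as plot_training_log.py.example, alongside parse_log.sh, which it calls). A typical workflow, assuming the training output was saved to a log file; the chart-type numbers come from the script's usage message (0 = Test accuracy vs. Iters, 6 = Train loss vs. Iters):

# save the solver output to a log file while training
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt 2>&1 | tee lenet_train.log

# plot test accuracy and training loss against iterations
python tools/extra/plot_training_log.py.example 0 accuracy_vs_iters.png lenet_train.log
python tools/extra/plot_training_log.py.example 6 loss_vs_iters.png lenet_train.log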