Caffe usage notes
Generating LMDB data from image data
loss = NaN problems: (1) the learning rate is too high; (2) the lmdb was generated incorrectly, with shuffle not set to true. An unshuffled lmdb causes NaN because each batch is then a poor estimate of the whole dataset; lowering the learning rate can work around it, but training becomes much harder.
- create_list.sh
```sh
#!/usr/bin/env sh
DATA="data/mnist.28x28"

cd $DATA
rm -f train.txt
rm -f test.txt

# Write one "relative/path/to/image label" line per image, for labels 0..9
for i in 0 1 2 3 4 5 6 7 8 9; do
    find train/$i -name "*" | grep -i -E ".bmp|.jpg|.png" | sed "s/$/ $i/" >> train.txt
    find test/$i  -name "*" | grep -i -E ".bmp|.jpg|.png" | sed "s/$/ $i/" >> test.txt
done
```
- create_lmdb.sh
```sh
#!/usr/bin/env sh
# This script converts the mnist data into lmdb/leveldb format,
# depending on the value assigned to $BACKEND.
set -e

EXAMPLE=examples/mnist.28x28
DATA=data/mnist.28x28
BUILD=build/tools
BACKEND="lmdb"

echo "Creating ${BACKEND}..."

rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}

$BUILD/convert_imageset -backend=$BACKEND -gray=true -shuffle=true \
    $DATA/ $DATA/train.txt $EXAMPLE/mnist_train_${BACKEND}
$BUILD/convert_imageset -backend=$BACKEND -gray=true -shuffle=true \
    $DATA/ $DATA/test.txt $EXAMPLE/mnist_test_${BACKEND}

echo "Done."
```
solver.prototxt (tuning the optimization-algorithm parameters matters a lot)
An overly large base learning rate or momentum can make the loss blow up, even to NaN.
net: lenet_train_test.prototxt (the network definition used for training and testing)
test_iter: covered_test_images_num / batch_size (number of test batches per test pass)
The parameters test_iter and the test-phase batch_size together should cover the number of images in the test database.
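For example, with 10,000 test images and a test-phase batch_size of 100, test_iter should be 100 so that one test pass covers the whole test set exactly once.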
test_interval: run a full test pass (accuracy and loss on the test data) every test_interval training iterations
base_lr: base learning rate
momentum: momentum
momentum2: the optimizer's second momentum parameter (Adam's second moment decay rate)
lr = base_lr * decay_factor (the decay factor is determined by lr_policy below)
V(t+1) = momentum * V(t) - lr * grad,  W(t+1) = W(t) + V(t+1)
lr_policy: learning rate decay policy
The explanation below is from caffe.proto:
The learning rate decay policy. The currently implemented learning rate policies are as follows:
- fixed: always return base_lr
- step: return base_lr * gamma ^ (floor(iter / stepsize))
- exp: return base_lr * gamma ^ iter
- inv: return base_lr * (1 + gamma * iter) ^ (-power)
- multistep: similar to step, but allows non-uniform steps defined by stepvalue
- poly: the effective learning rate follows a polynomial decay, reaching zero at max_iter: return base_lr * (1 - iter/max_iter) ^ power
- sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
Here base_lr, max_iter, gamma, stepsize, stepvalue and power are defined in the solver parameter protocol buffer, and iter is the current iteration.
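For example, with lr_policy: "step", base_lr: 0.01, gamma: 0.1 and stepsize: 10000, the learning rate is 0.01 for iterations 0–9999, 0.001 for iterations 10000–19999, 0.0001 for iterations 20000–29999, and so on.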
weight_decay: 0.0005
Regularization types supported: L1 and L2.
The weight_decay parameter governs the regularization term of the neural net. During training a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.
As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., deeper net, larger filters, larger InnerProduct layers etc.), the higher this term should be.
Caffe also lets you choose between L2 regularization (the default) and L1 regularization by setting
regularization_type: "L1"
While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.
(Source of the weight_decay notes above: susandebug on CSDN, https://blog.csdn.net/u010025211/article/details/50055815)
display: print the training status every display iterations
max_iter: maximum number of training iterations
The parameters max_iter and the training batch_size depend on how many epochs you want to train the net for.
Iterations in one epoch: iter_num_per_epoch = training_images_num / batch_size
epochs_num = max_iter / iter_num_per_epoch
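For example, with 60,000 training images and a training batch_size of 64, one epoch is 60000 / 64 ≈ 938 iterations, so max_iter: 10000 corresponds to roughly 10.7 epochs.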
snapshot: interval (in iterations) at which model and solver-state snapshots are saved
snapshot_prefix: path prefix for the snapshot files
type: the optimization algorithm (solver type) to use
solver_mode: CPU or GPU
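Putting the parameters above together, a minimal solver.prototxt for the LeNet example might look like the following sketch (the values are illustrative, not tuned):

```
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100              # 10000 test images / test batch_size 100
test_interval: 500          # run a test pass every 500 training iterations
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"
gamma: 0.1
stepsize: 10000
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
type: "SGD"
solver_mode: GPU
```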
train_test.prototxt
Layers are restricted to a phase with include { phase: TRAIN } / include { phase: TEST } (e.g. separate Data layers for training and testing).
deploy.prototxt
- Data layer → Input layer (see the sketch after this list)
- no backward/loss part, so SoftmaxWithLoss → Softmax
- set use_global_stats to true in BatchNorm layers
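For instance, the lmdb Data layer at the top of train_test.prototxt is replaced in deploy.prototxt by an Input layer that only declares the input blob shape (the 1x1x28x28 shape below assumes the grayscale 28x28 MNIST images used above):

```
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
}
```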
Using caffe from the command line
Train a model
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
Resume interrupted training
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --snapshot=examples/mnist/lenet_iter_1000.solverstate
Fine-tuning / transfer learning from a pretrained model
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --weights=examples/mnist/lenet_iter_100000.caffemodel
Test a model
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
or
caffe test --model=examples/mnist/lenet_train_test.prototxt --weights=examples/mnist/lenet_iter_10000.caffemodel
Visualization
1. With Netscope
http://ethereon.github.io/netscope/quickstart.html
2. With Caffe's bundled draw_net.py script
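A typical invocation (the prototxt path follows the MNIST example above, and the output file name is arbitrary; the script needs the pydot package and Graphviz installed):
python python/draw_net.py examples/mnist/lenet_train_test.prototxt lenet.png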
Custom network layers
Custom layers can be implemented as Python layers; a minimal sketch is given below.
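The sketch is modeled on Caffe's own Euclidean-loss Python layer example and assumes Caffe was built with WITH_PYTHON_LAYER := 1 in Makefile.config:

```python
import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    """Euclidean (L2) loss implemented as a Python layer."""

    def setup(self, bottom, top):
        # expects two bottoms: predictions and targets
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # difference buffer has the same shape as the inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # the loss output is a scalar
        top[0].reshape(1)

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
```

The layer is then referenced from the prototxt with type: "Python" and a python_param block giving module (the .py file name, which must be on PYTHONPATH) and layer (the class name).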
Plotting loss and accuracy curves
Use Caffe's bundled tool plot_training_log.py.
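A typical workflow (log and output file names are illustrative; the chart-type codes are printed when the script is run without arguments, 0 being test accuracy vs. iterations): first capture the training log, then plot it.
caffe train --solver=examples/mnist/lenet_solver.prototxt 2>&1 | tee lenet_train.log
python tools/extra/plot_training_log.py.example 0 accuracy_vs_iters.png lenet_train.log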