Learning and Applying the Deep Learning Framework TensorFlow, Part 6 (Comparing the SGD, ADAM, Adadelta, Momentum, and RMSProp Optimizers)

There is a picture, that well-known meme everyone has seen, which ranks the optimizers roughly like this:

Adadelta > NAG > Momentum > RMSProp > Adagrad > SGD

But I think it depends on the situation. For example, http://blog.51cto.com/12568470/1898367 ("Common optimization algorithms and their TensorFlow parameters") concludes that in real work ADAM is the one that performs best in practice. But who can say for sure? Every engineer's scenario is different, and so is the practical experience they end up with.

So some people suggest: while debugging, train with a fast optimizer; when it is time to publish, try every optimizer once and keep whichever gives the best result (the sweep sketch after the RMSProp results below shows one way to do this).

 

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the data set
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Size of each batch
batch_size = 100
# Compute how many batches there are in total
n_batch = mnist.train.num_examples // batch_size

# Define two placeholders
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Build a simple neural network
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
prediction = tf.nn.softmax(tf.matmul(x, W) + b)

# Quadratic cost function (kept for comparison, not used)
# loss = tf.reduce_mean(tf.square(y-prediction))
# Cross-entropy cost function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))
# Use gradient descent (swap this line for one of the other optimizers below to reproduce the comparison)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
# Initialize the variables
init = tf.global_variables_initializer()

# Store the results in a list of booleans
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))  # argmax returns the index of the largest value in a 1-D tensor
# Compute the accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})

        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Iter " + str(epoch) + ",Testing Accuracy " + str(acc))
  • SGD

optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)

Iter 0,Testing Accuracy 0.825
Iter 1,Testing Accuracy 0.8798
Iter 2,Testing Accuracy 0.8994
Iter 3,Testing Accuracy 0.9047
Iter 4,Testing Accuracy 0.9076
Iter 5,Testing Accuracy 0.9104
Iter 6,Testing Accuracy 0.9121
Iter 7,Testing Accuracy 0.9127
Iter 8,Testing Accuracy 0.9147
Iter 9,Testing Accuracy 0.9166
Iter 10,Testing Accuracy 0.9174
Iter 11,Testing Accuracy 0.9167
Iter 12,Testing Accuracy 0.9183
Iter 13,Testing Accuracy 0.9183
Iter 14,Testing Accuracy 0.9202
Iter 15,Testing Accuracy 0.9197
Iter 16,Testing Accuracy 0.9204
Iter 17,Testing Accuracy 0.9213
Iter 18,Testing Accuracy 0.921
Iter 19,Testing Accuracy 0.9213
Iter 20,Testing Accuracy 0.9217
  • ADAM

   optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=1e-08)

train_step_AdamOptimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)

Iter 0,Testing Accuracy 0.8991
Iter 1,Testing Accuracy 0.9109
Iter 2,Testing Accuracy 0.916
Iter 3,Testing Accuracy 0.9199
Iter 4,Testing Accuracy 0.9229
Iter 5,Testing Accuracy 0.9247
Iter 6,Testing Accuracy 0.926
Iter 7,Testing Accuracy 0.9276
Iter 8,Testing Accuracy 0.928
Iter 9,Testing Accuracy 0.928
Iter 10,Testing Accuracy 0.9289
Iter 11,Testing Accuracy 0.9297
Iter 12,Testing Accuracy 0.9308
Iter 13,Testing Accuracy 0.93
Iter 14,Testing Accuracy 0.9297
Iter 15,Testing Accuracy 0.9304
Iter 16,Testing Accuracy 0.9301
Iter 17,Testing Accuracy 0.9318
Iter 18,Testing Accuracy 0.9306
Iter 19,Testing Accuracy 0.9315
Iter 20,Testing Accuracy 0.9317

  • Adadelta

train_step_AdadeltaOptimizer = tf.train.AdadeltaOptimizer(1e-3).minimize(loss)

Iter 0,Testing Accuracy 0.6779
Iter 1,Testing Accuracy 0.6761
Iter 2,Testing Accuracy 0.6757
Iter 3,Testing Accuracy 0.6762
Iter 4,Testing Accuracy 0.6775
Iter 5,Testing Accuracy 0.6785
Iter 6,Testing Accuracy 0.6791
Iter 7,Testing Accuracy 0.682
Iter 8,Testing Accuracy 0.6854
Iter 9,Testing Accuracy 0.6878
Iter 10,Testing Accuracy 0.6896
Iter 11,Testing Accuracy 0.6919
Iter 12,Testing Accuracy 0.6941
Iter 13,Testing Accuracy 0.6961
Iter 14,Testing Accuracy 0.6973
Iter 15,Testing Accuracy 0.6983
Iter 16,Testing Accuracy 0.6996
Iter 17,Testing Accuracy 0.7001
Iter 18,Testing Accuracy 0.7013
Iter 19,Testing Accuracy 0.7017
Iter 20,Testing Accuracy 0.7023

train_step_AdadeltaOptimizer = tf.train.AdadeltaOptimizer(1).minimize(loss)

Iter 0,Testing Accuracy 0.874
Iter 1,Testing Accuracy 0.8949
Iter 2,Testing Accuracy 0.905
Iter 3,Testing Accuracy 0.9075
Iter 4,Testing Accuracy 0.9102
Iter 5,Testing Accuracy 0.9125
Iter 6,Testing Accuracy 0.9141
Iter 7,Testing Accuracy 0.9154
Iter 8,Testing Accuracy 0.9173
Iter 9,Testing Accuracy 0.9182
Iter 10,Testing Accuracy 0.919
Iter 11,Testing Accuracy 0.9204
Iter 12,Testing Accuracy 0.9213
Iter 13,Testing Accuracy 0.9213
Iter 14,Testing Accuracy 0.9222
Iter 15,Testing Accuracy 0.9228
Iter 16,Testing Accuracy 0.9226
Iter 17,Testing Accuracy 0.9226
Iter 18,Testing Accuracy 0.9232
Iter 19,Testing Accuracy 0.9239
Iter 20,Testing Accuracy 0.9237

The AdadeltaOptimizer results above make the point: you still have to find a suitable learning rate, otherwise the optimizer does very little for you. Theory is one thing, practice is another.

AdadeltaOptimizer is an optimizer in the tf.train package that adjusts the learning rate automatically. When I first came across it I thought it was very impressive, but after using it the loss simply would not go down. I assumed at first that I was using it incorrectly; after a lot of back and forth I finally found that the problem was the learning-rate setting.

 # set optimizer (this snippet assumes tf.contrib.slim has been imported as `slim`)
optimizer = tf.train.AdadeltaOptimizer()
# set train_op
train_op = slim.learning.create_train_op(loss, optimizer)

The default learning_rate of AdadeltaOptimizer is 0.001, which is very small, so gradient descent proceeds extremely slowly. The fix in the end was simply to raise the learning rate:

 # set optimizer
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1)
# set train_op
train_op = slim.learning.create_train_op(loss, optimizer)

Reference: "Pitfalls I hit while programming in TensorFlow (1)", http://data-science.vip/2017/12/18/TensorFlow%E7%BC%96%E7%A8%8B%E4%B8%AD%E8%B8%A9%E8%BF%87%E7%9A%84%E5%9D%91(1).html

  • Momentum

train_step_MomentumOptimizer = tf.train.MomentumOptimizer(1e-3, 0.9).minimize(loss)

Iter 0,Testing Accuracy 0.548
Iter 1,Testing Accuracy 0.6094
Iter 2,Testing Accuracy 0.7094
Iter 3,Testing Accuracy 0.7726
Iter 4,Testing Accuracy 0.791
Iter 5,Testing Accuracy 0.7964
Iter 6,Testing Accuracy 0.8015
Iter 7,Testing Accuracy 0.8056
Iter 8,Testing Accuracy 0.8081
Iter 9,Testing Accuracy 0.811
Iter 10,Testing Accuracy 0.8137
Iter 11,Testing Accuracy 0.8162
Iter 12,Testing Accuracy 0.8184
Iter 13,Testing Accuracy 0.8193
Iter 14,Testing Accuracy 0.8198
Iter 15,Testing Accuracy 0.821
Iter 16,Testing Accuracy 0.8226
Iter 17,Testing Accuracy 0.8236
Iter 18,Testing Accuracy 0.8243
Iter 19,Testing Accuracy 0.825
Iter 20,Testing Accuracy 0.8259

  • RMSProp

train_step_RMSPropOptimizer = tf.train.RMSPropOptimizer(0.003, 0.9).minimize(loss)

Iter 0,Testing Accuracy 0.9158
Iter 1,Testing Accuracy 0.9216
Iter 2,Testing Accuracy 0.9263
Iter 3,Testing Accuracy 0.9274
Iter 4,Testing Accuracy 0.9275
Iter 5,Testing Accuracy 0.9319
Iter 6,Testing Accuracy 0.9309
Iter 7,Testing Accuracy 0.9286
Iter 8,Testing Accuracy 0.9303
Iter 9,Testing Accuracy 0.9305
Iter 10,Testing Accuracy 0.9316
Iter 11,Testing Accuracy 0.9318
Iter 12,Testing Accuracy 0.933
Iter 13,Testing Accuracy 0.9316
Iter 14,Testing Accuracy 0.9327
Iter 15,Testing Accuracy 0.9315
Iter 16,Testing Accuracy 0.9307
Iter 17,Testing Accuracy 0.9327
Iter 18,Testing Accuracy 0.9332
Iter 19,Testing Accuracy 0.9324
Iter 20,Testing Accuracy 0.9316
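
Following the earlier suggestion of trying every optimizer and keeping the best result, here is a minimal sketch of such a sweep. It reuses the model, loss, accuracy, and data pipeline defined at the top of this post, with the learning rates used in the sections above (the exact values are assumptions to tune, not recommendations):

# Sketch: train the same model once per optimizer and keep the best test accuracy
optimizer_candidates = {
    "SGD":      tf.train.GradientDescentOptimizer(0.2),
    "Adam":     tf.train.AdamOptimizer(1e-3),
    "Adadelta": tf.train.AdadeltaOptimizer(1.0),
    "Momentum": tf.train.MomentumOptimizer(1e-3, 0.9),
    "RMSProp":  tf.train.RMSPropOptimizer(0.003, 0.9),
}
train_steps = {name: opt.minimize(loss) for name, opt in optimizer_candidates.items()}

best_name, best_acc = None, 0.0
for name, train_step in train_steps.items():
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # fresh weights (and optimizer slots) for every run
        for epoch in range(21):
            for batch in range(n_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size)
                sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print(name + ",Testing Accuracy " + str(acc))
    if acc > best_acc:
        best_name, best_acc = name, acc
print("Best optimizer: " + str(best_name) + ", Testing Accuracy " + str(best_acc))
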

  • Practical experience

ADAM usually gives fairly good results and converges much faster than SGD. L-BFGS is suited to cases where the optimization is done over the full batch. Sometimes several methods are combined, for example warming up with SGD and then switching to ADAM. For more unusual requirements, such as deepbit where the convergence of two losses has to be controlled, the slower SGD is the better fit.
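
For the "warm up with SGD, then switch to ADAM" idea, a minimal sketch on top of the same MNIST model as above; the number of warm-up epochs here is just an assumed value:

# Sketch: run plain SGD for a few epochs, then hand training over to Adam
train_step_sgd = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
train_step_adam = tf.train.AdamOptimizer(1e-3).minimize(loss)
warmup_epochs = 3  # assumed value, tune for your own task

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(21):
        train_step = train_step_sgd if epoch < warmup_epochs else train_step_adam
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Iter " + str(epoch) + ",Testing Accuracy " + str(acc))
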

 

TensorFlow parameters for the different optimization algorithms

SGD

optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)

Momentum

optimizer = tf.train.MomentumOptimizer(lr, 0.9)

AdaGrad

optimizer = tf.train.AdagradOptimizer(learning_rate=self.learning_rate)

RMSProp

optimizer = tf.train.RMSPropOptimizer(0.001, 0.9)

ADAM

optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=1e-08)

Some of the more specific parameters need to be looked up in the official TensorFlow documentation.