
Neural Network Optimization (II) - Learning Rate


1 Basic Definition of the Learning Rate

Learning rate (learning_rate): the magnitude of each parameter update.

The parameter update rule is: w_new = w - learning_rate * (∂loss/∂w)

A simple example:

Suppose the loss function is loss = (w + 1)^2. Then the gradient is

∂loss/∂w = 2(w + 1) = 2w + 2

With the parameter w initialized to 5 and a learning rate of 0.2, the updates are:

Step | w (before update) | Calculation
1 | 5 | 5 - 0.2 * (2*5 + 2) = 2.6
2 | 2.6 | 2.6 - 0.2 * (2*2.6 + 2) = 1.16
3 | 1.16 | 1.16 - 0.2 * (2*1.16 + 2) = 0.296
4 | 0.296 | 0.296 - 0.2 * (2*0.296 + 2) = -0.2224
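
The same sequence of updates can be reproduced with a few lines of plain Python (a minimal sketch of the hand calculation above; the names w, learning_rate and grad are used only for illustration):

# Minimal sketch reproducing the table above, assuming loss = (w + 1)^2
w = 5.0
learning_rate = 0.2
for step in range(1, 5):
    grad = 2 * w + 2                        # d(loss)/dw = 2(w + 1)
    w = w - learning_rate * grad            # gradient descent update
    print("step %d: w = %.4f" % (step, w))
# Prints: 2.6000, 1.1600, 0.2960, -0.2224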

2 Initial Application of the Learning Rate

2.1 When the learning rate is 0.2

# Known: loss function loss = (w+1)^2, and the parameter W to be optimized starts at 5
# Goal: find the value of W that minimizes loss
# Step 1: import libraries
import tensorflow as tf
# Step 2: define forward propagation
# Define the parameter w to be optimized, initialized to 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# Step 3: define the loss function and backpropagation
# Define the loss function loss
loss = tf.square(w + 1)
# Define the backpropagation method with learning rate 0.2
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
# Step 4: create a session and train for 40 steps
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: w is %f, loss is %f." % (i, w_val, loss_val))

Output:

After 0 steps: w is 2.600000,   loss is 12.959999.
After 1 steps: w is 1.160000,   loss is 4.665599.
After 2 steps: w is 0.296000,   loss is 1.679616.
After 3 steps: w is -0.222400,   loss is 0.604662.
After 4 steps: w is -0.533440,   loss is 0.217678.
After 5 steps: w is -0.720064,   loss is 0.078364.
After 6 steps: w is -0.832038,   loss is 0.028211.
After 7 steps: w is -0.899223,   loss is 0.010156.
After 8 steps: w is -0.939534,   loss is 0.003656.
After 9 steps: w is -0.963720,   loss is 0.001316.
After 10 steps: w is -0.978232,   loss is 0.000474.
After 11 steps: w is -0.986939,   loss is 0.000171.
After 12 steps: w is -0.992164,   loss is 0.000061.
After 13 steps: w is -0.995298,   loss is 0.000022.
After 14 steps: w is -0.997179,   loss is 0.000008.
After 15 steps: w is -0.998307,   loss is 0.000003.
After 16 steps: w is -0.998984,   loss is 0.000001.
After 17 steps: w is -0.999391,   loss is 0.000000.
After 18 steps: w is -0.999634,   loss is 0.000000.
After 19 steps: w is -0.999781,   loss is 0.000000.
After 20 steps: w is -0.999868,   loss is 0.000000.
After 21 steps: w is -0.999921,   loss is 0.000000.
After 22 steps: w is -0.999953,   loss is 0.000000.
After 23 steps: w is -0.999972,   loss is 0.000000.
After 24 steps: w is -0.999983,   loss is 0.000000.
After 25 steps: w is -0.999990,   loss is 0.000000.
After 26 steps: w is -0.999994,   loss is 0.000000.
After 27 steps: w is -0.999996,   loss is 0.000000.
After 28 steps: w is -0.999998,   loss is 0.000000.
After 29 steps: w is -0.999999,   loss is 0.000000.
After 30 steps: w is -0.999999,   loss is 0.000000.
After 31 steps: w is -1.000000,   loss is 0.000000.
After 32 steps: w is -1.000000,   loss is 0.000000.
After 33 steps: w is -1.000000,   loss is 0.000000.
After 34 steps: w is -1.000000,   loss is 0.000000.
After 35 steps: w is -1.000000,   loss is 0.000000.
After 36 steps: w is -1.000000,   loss is 0.000000.
After 37 steps: w is -1.000000,   loss is 0.000000.
After 38 steps: w is -1.000000,   loss is 0.000000.
After 39 steps: w is -1.000000,   loss is 0.000000.

From the run we can see that w, initialized to 5, is optimized to -1 (to six decimal places) by step 31 and stays at -1 through step 39.

2.2 When the learning rate is 1

# Define the backpropagation method with learning rate 1
train_step = tf.train.GradientDescentOptimizer(1).minimize(loss)

Output:

After 0 steps: w is -7.000000,   loss is 36.000000.
After 1 steps: w is 5.000000,   loss is 36.000000.
After 2 steps: w is -7.000000,   loss is 36.000000.
After 3 steps: w is 5.000000,   loss is 36.000000.
After 4 steps: w is -7.000000,   loss is 36.000000.
After 5 steps: w is 5.000000,   loss is 36.000000.
After 6 steps: w is -7.000000,   loss is 36.000000.
After 7 steps: w is 5.000000,   loss is 36.000000.
After 8 steps: w is -7.000000,   loss is 36.000000.
After 9 steps: w is 5.000000,   loss is 36.000000.
After 10 steps: w is -7.000000,   loss is 36.000000.
After 11 steps: w is 5.000000,   loss is 36.000000.
After 12 steps: w is -7.000000,   loss is 36.000000.
After 13 steps: w is 5.000000,   loss is 36.000000.
After 14 steps: w is -7.000000,   loss is 36.000000.
After 15 steps: w is 5.000000,   loss is 36.000000.
After 16 steps: w is -7.000000,   loss is 36.000000.
After 17 steps: w is 5.000000,   loss is 36.000000.
After 18 steps: w is -7.000000,   loss is 36.000000.
After 19 steps: w is 5.000000,   loss is 36.000000.
After 20 steps: w is -7.000000,   loss is 36.000000.
After 21 steps: w is 5.000000,   loss is 36.000000.
After 22 steps: w is -7.000000,   loss is 36.000000.
After 23 steps: w is 5.000000,   loss is 36.000000.
After 24 steps: w is -7.000000,   loss is 36.000000.
After 25 steps: w is 5.000000,   loss is 36.000000.
After 26 steps: w is -7.000000,   loss is 36.000000.
After 27 steps: w is 5.000000,   loss is 36.000000.
After 28 steps: w is -7.000000,   loss is 36.000000.
After 29 steps: w is 5.000000,   loss is 36.000000.
After 30 steps: w is -7.000000,   loss is 36.000000.
After 31 steps: w is 5.000000,   loss is 36.000000.
After 32 steps: w is -7.000000,   loss is 36.000000.
After 33 steps: w is 5.000000,   loss is 36.000000.
After 34 steps: w is -7.000000,   loss is 36.000000.
After 35 steps: w is 5.000000,   loss is 36.000000.
After 36 steps: w is -7.000000,   loss is 36.000000.
After 37 steps: w is 5.000000,   loss is 36.000000.
After 38 steps: w is -7.000000,   loss is 36.000000.
After 39 steps: w is 5.000000,   loss is 36.000000.

When the learning rate is too large, the result does not converge; w simply oscillates back and forth between 5 and -7. With an even larger learning rate, the updates can in fact grow larger and larger and diverge.

Conclusion: a learning rate that is too large causes oscillation and no convergence.

2.3 When the learning rate is 0.001

# Define the backpropagation method with learning rate 0.001
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

Output:

After 0 steps: w is 4.988000,   loss is 35.856144.
After 1 steps: w is 4.976024,   loss is 35.712864.
After 2 steps: w is 4.964072,   loss is 35.570156.
After 3 steps: w is 4.952144,   loss is 35.428020.
After 4 steps: w is 4.940240,   loss is 35.286449.
After 5 steps: w is 4.928360,   loss is 35.145447.
After 6 steps: w is 4.916503,   loss is 35.005009.
After 7 steps: w is 4.904670,   loss is 34.865124.
After 8 steps: w is 4.892860,   loss is 34.725803.
After 9 steps: w is 4.881075,   loss is 34.587044.
After 10 steps: w is 4.869313,   loss is 34.448833.
After 11 steps: w is 4.857574,   loss is 34.311172.
After 12 steps: w is 4.845859,   loss is 34.174068.
After 13 steps: w is 4.834167,   loss is 34.037510.
After 14 steps: w is 4.822499,   loss is 33.901497.
After 15 steps: w is 4.810854,   loss is 33.766029.
After 16 steps: w is 4.799233,   loss is 33.631104.
After 17 steps: w is 4.787634,   loss is 33.496712.
After 18 steps: w is 4.776059,   loss is 33.362858.
After 19 steps: w is 4.764507,   loss is 33.229538.
After 20 steps: w is 4.752978,   loss is 33.096756.
After 21 steps: w is 4.741472,   loss is 32.964497.
After 22 steps: w is 4.729989,   loss is 32.832775.
After 23 steps: w is 4.718529,   loss is 32.701576.
After 24 steps: w is 4.707092,   loss is 32.570904.
After 25 steps: w is 4.695678,   loss is 32.440750.
After 26 steps: w is 4.684287,   loss is 32.311119.
After 27 steps: w is 4.672918,   loss is 32.182003.
After 28 steps: w is 4.661572,   loss is 32.053402.
After 29 steps: w is 4.650249,   loss is 31.925320.
After 30 steps: w is 4.638949,   loss is 31.797745.
After 31 steps: w is 4.627671,   loss is 31.670683.
After 32 steps: w is 4.616416,   loss is 31.544128.
After 33 steps: w is 4.605183,   loss is 31.418077.
After 34 steps: w is 4.593973,   loss is 31.292530.
After 35 steps: w is 4.582785,   loss is 31.167484.
After 36 steps: w is 4.571619,   loss is 31.042938.
After 37 steps: w is 4.560476,   loss is 30.918892.
After 38 steps: w is 4.549355,   loss is 30.795341.
After 39 steps: w is 4.538256,   loss is 30.672281.

After lowering the learning rate to 0.001, w still converges, but the change is far too slow: after 40 steps w has only moved from 5 to about 4.54, whereas the first example shows the optimum is -1. Clearly, when the learning rate is too small, training is very inefficient.

Conclusion: a learning rate that is too small makes convergence too slow.

3 Exponentially Decaying Learning Rate

3.1 Mathematical description of the exponentially decaying learning rate

An exponentially decaying learning rate adjusts the learning rate dynamically according to how many training steps have been run.

Number of steps between learning-rate updates: RATE_STEP = total number of samples / BATCH_SIZE

The full sample set is split into N subsets of BATCH_SIZE examples that are fed into the network one at a time. Feeding in one batch of BATCH_SIZE examples counts as one step; once every subset of the dataset has been fed in (one full pass over the data), the learning rate is updated.

The total number of steps the network has been trained for is tracked by a counter named global_step.

Number of learning-rate updates = global_step / RATE_STEP

learning_rate = RATE_BASE * RATE_DECAY ^ (global_step / RATE_STEP)

Notes:

learning_rate - the learning rate after the update

RATE_BASE - the base learning rate (initial value)

RATE_DECAY - the learning-rate decay rate, usually in the range (0, 1)

global_step - the total number of steps run

RATE_STEP - the number of steps between learning-rate updates
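
Plugging hypothetical numbers into the formula makes the schedule concrete. A minimal sketch in plain Python, assuming 1000 total samples, BATCH_SIZE = 100, RATE_BASE = 0.1 and RATE_DECAY = 0.99 (all values chosen only for illustration):

# Hypothetical example values, for illustration only
RATE_BASE = 0.1            # initial learning rate
RATE_DECAY = 0.99          # decay rate, in (0, 1)
RATE_STEP = 1000 // 100    # total samples / BATCH_SIZE = 10

# The learning rate shrinks smoothly as global_step grows
for global_step in (0, 10, 100, 1000):
    learning_rate = RATE_BASE * RATE_DECAY ** (global_step / RATE_STEP)
    print("global_step %4d -> learning_rate %.6f" % (global_step, learning_rate))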

3.2 TensorFlow code for the exponentially decaying learning rate

global_step = tf.Variable(0, trainable=False)

Since this variable is used only as a counter, trainable=False marks it as not trainable.
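
A minimal sketch of how the pieces above might be wired together with tf.train.exponential_decay, which implements the formula from section 3.1. The values of RATE_BASE, RATE_DECAY and RATE_STEP here are hypothetical, and staircase=True makes global_step / RATE_STEP an integer division so the learning rate only changes every RATE_STEP steps:

import tensorflow as tf

RATE_BASE = 0.1    # base learning rate (hypothetical value)
RATE_DECAY = 0.99  # decay rate (hypothetical value)
RATE_STEP = 10     # steps between learning-rate updates (hypothetical value)

# Counter for the number of training steps run; not trainable
global_step = tf.Variable(0, trainable=False)

# learning_rate = RATE_BASE * RATE_DECAY ^ (global_step / RATE_STEP)
learning_rate = tf.train.exponential_decay(
    RATE_BASE, global_step, RATE_STEP, RATE_DECAY, staircase=True)

# Passing global_step to minimize() makes the optimizer increment it,
# so the learning rate decays automatically as training proceeds
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)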
