Neural Network Optimization (2) - Learning Rate
阿新 • Published: 2018-10-30
1 Basic Definition of the Learning Rate
Learning rate (learning_rate): the magnitude of each parameter update.
A simple example:
Suppose the loss function is loss = (w + 1)^2; its gradient is dloss/dw = 2(w + 1) = 2w + 2.
With w initialized to 5 and a learning rate of 0.2, the updates are:
Run | w before | Update |
1 | 5 | 5 - 0.2*(2*5+2) = 2.6 |
2 | 2.6 | 2.6 - 0.2*(2*2.6+2) = 1.16 |
3 | 1.16 | 1.16 - 0.2*(2*1.16+2) = 0.296 |
4 | 0.296 | 0.296 - 0.2*(2*0.296+2) = -0.2224 |
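The hand calculation above can be checked with a few lines of plain Python (no TensorFlow needed); the update rule is w ← w − lr·dloss/dw with dloss/dw = 2w + 2:

```python
# Gradient descent on loss = (w + 1)^2; the gradient is 2*(w + 1) = 2w + 2.
def gradient_descent(w, lr, steps):
    history = []
    for _ in range(steps):
        w = w - lr * (2 * w + 2)  # w <- w - lr * dloss/dw
        history.append(round(w, 4))
    return history

print(gradient_descent(5.0, 0.2, 4))  # -> [2.6, 1.16, 0.296, -0.2224], matching the table
```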
2 First Experiments with the Learning Rate
2.1 Learning rate 0.2
# Known loss function: loss = (w+1)^2; the parameter w is initialized to 5
# Find the value of w that minimizes loss

# Step 1: import the library
import tensorflow as tf

# Step 2: define forward propagation
# Define the parameter w to be optimized, initialized to 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Step 3: define the loss function and backpropagation
# Define the loss function loss
loss = tf.square(w + 1)
# Define the backpropagation method, with learning rate 0.2
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# Step 4: create a session and train for 40 rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: w is %f, loss is %f." % (i, w_val, loss_val))
Output:
After 0 steps: w is 2.600000, loss is 12.959999.
After 1 steps: w is 1.160000, loss is 4.665599.
After 2 steps: w is 0.296000, loss is 1.679616.
After 3 steps: w is -0.222400, loss is 0.604662.
After 4 steps: w is -0.533440, loss is 0.217678.
After 5 steps: w is -0.720064, loss is 0.078364.
After 6 steps: w is -0.832038, loss is 0.028211.
After 7 steps: w is -0.899223, loss is 0.010156.
After 8 steps: w is -0.939534, loss is 0.003656.
After 9 steps: w is -0.963720, loss is 0.001316.
After 10 steps: w is -0.978232, loss is 0.000474.
...
After 31 steps: w is -1.000000, loss is 0.000000.
...
After 39 steps: w is -1.000000, loss is 0.000000.
From the run we can see that w, initialized to 5, reaches the optimum -1 by step 31 and stays at -1 through step 39.
2.2 Learning rate 1
# Define the backpropagation method, with learning rate 1
train_step = tf.train.GradientDescentOptimizer(1).minimize(loss)
Output:
After 0 steps: w is -7.000000, loss is 36.000000.
After 1 steps: w is 5.000000, loss is 36.000000.
After 2 steps: w is -7.000000, loss is 36.000000.
After 3 steps: w is 5.000000, loss is 36.000000.
...
After 38 steps: w is -7.000000, loss is 36.000000.
After 39 steps: w is 5.000000, loss is 36.000000.
When the learning rate is too large the result does not converge: w simply oscillates back and forth between 5 and -7. With an even larger rate the iterates would in fact grow without bound.
Too large a learning rate: oscillation, no convergence.
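The oscillation is easy to see algebraically: with lr = 1 the update becomes w ← w − (2w + 2) = −w − 2, which maps 5 → −7 → 5 → … forever. A minimal sketch:

```python
# With lr = 1, one gradient step on loss = (w + 1)^2 is w <- w - (2w + 2) = -w - 2,
# so the iterate bounces between 5 and -7 and never converges.
def step(w, lr=1.0):
    return w - lr * (2 * w + 2)

w = 5.0
seen = [w]
for _ in range(4):
    w = step(w)
    seen.append(w)
print(seen)  # -> [5.0, -7.0, 5.0, -7.0, 5.0]
```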
2.3 Learning rate 0.001
# Define the backpropagation method, with learning rate 0.001
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
Output:
After 0 steps: w is 4.988000, loss is 35.856144.
After 1 steps: w is 4.976024, loss is 35.712864.
After 2 steps: w is 4.964072, loss is 35.570156.
After 3 steps: w is 4.952144, loss is 35.428020.
After 4 steps: w is 4.940240, loss is 35.286449.
...
After 38 steps: w is 4.549355, loss is 30.795341.
After 39 steps: w is 4.538256, loss is 30.672281.
With the learning rate reduced to 0.001, w still converges, but far too slowly: after 39 steps it has only moved from 5 to about 4.54, while the first example shows the optimum is -1. Clearly, an overly low learning rate makes training very inefficient.
Too small a learning rate: convergence is too slow.
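The three experiments can be reproduced side by side in plain Python (same loss, same starting point, 40 steps each), which makes the trade-off concrete:

```python
# Compare the three learning rates from the text on loss = (w + 1)^2, w0 = 5, 40 steps.
def run(lr, w=5.0, steps=40):
    for _ in range(steps):
        w = w - lr * (2 * w + 2)  # gradient step; dloss/dw = 2w + 2
    return w

for lr in (0.2, 1.0, 0.001):
    print("lr=%g -> w=%f" % (lr, run(lr)))
# lr=0.2   converges to about -1
# lr=1.0   oscillates (after an even number of steps w is back at 5)
# lr=0.001 crawls: w is still about 4.54
```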
3 Exponentially Decaying Learning Rate
3.1 Mathematical Definition
An exponentially decaying learning rate adjusts the learning rate dynamically according to the number of rounds already run.
How many rounds pass between learning-rate updates: RATE_STEP = total number of samples / BATCH_SIZE.
The training set is split into batches of BATCH_SIZE samples that are fed to the network in turn. One round consumes one batch; once every subset of the data has been fed in, the learning rate is updated.
The total number of rounds a network trains for is fixed in advance and counted by global_step.
Number of learning-rate updates = global_step / RATE_STEP.
The decayed rate itself is:
learning_rate = RATE_BASE * RATE_DECAY ^ (global_step / RATE_STEP)
Notes:
learning_rate - the updated learning-rate value
RATE_BASE - learning-rate base, i.e. the initial learning rate
RATE_DECAY - learning-rate decay factor, usually in the range (0, 1)
global_step - total number of rounds run so far
RATE_STEP - how many rounds between learning-rate updates
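Putting the quantities above together, the schedule can be sketched as a direct transcription of the formula in plain Python (the `staircase` flag, which mirrors TensorFlow's option of updating the rate only every RATE_STEP rounds, is an addition for illustration):

```python
# learning_rate = RATE_BASE * RATE_DECAY ** (global_step / RATE_STEP)
def decayed_lr(rate_base, rate_decay, global_step, rate_step, staircase=False):
    exponent = global_step / rate_step
    if staircase:
        # Integer division: the rate changes only once every rate_step rounds.
        exponent = global_step // rate_step
    return rate_base * rate_decay ** exponent

print(decayed_lr(0.1, 0.99, 5, 1))  # 0.1 * 0.99**5, about 0.095099
```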
3.2 TensorFlow Code for an Exponentially Decaying Learning Rate
global_step = tf.Variable(0, trainable=False)
Since this variable is only a step counter, trainable=False excludes it from training.
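The article breaks off here; in TensorFlow 1.x the schedule would typically be built with tf.train.exponential_decay(RATE_BASE, global_step, RATE_STEP, RATE_DECAY, staircase=True) and passed to the optimizer. As a framework-free sketch of the same idea, here is the running example trained with a decaying rate (RATE_BASE, RATE_DECAY, RATE_STEP values are illustrative, not from the article):

```python
# Gradient descent on loss = (w + 1)^2 with an exponentially decaying learning rate.
# The hyperparameter values below are illustrative.
RATE_BASE, RATE_DECAY, RATE_STEP = 0.1, 0.99, 1

w = 5.0
for global_step in range(40):
    lr = RATE_BASE * RATE_DECAY ** (global_step / RATE_STEP)  # decayed rate this round
    w = w - lr * (2 * w + 2)                                  # gradient step

print("final w: %f" % w)  # close to the optimum -1
```

The rate starts large enough to make fast progress and shrinks as training proceeds, combining the speed of example 2.1 with the stability that example 2.2 lacked.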