
Automatic training with TensorFlow optimizers, and how to choose which Variables get trained

This article walks through the details of TensorFlow's automatic training and ties the automatic process back to the underlying formulas. Corrections welcome.

Why I wrote this: some tutorials are fairly vague and never show the intent behind these APIs or how they actually behave.

Intended audience: people who know a little code but, thanks to those vague tutorials, only half understand what is going on.

Source code

Training is really just the process of modifying the tf.Variable objects in the computation graph; you can think of all of these variables as weights. The example below introduces no placeholder and no x, so there is no distinction between x and w. But the variable prediction_to_train = 3 is effectively equivalent to w = 3 with x = 1: whatever w becomes, prediction = w holds, so implicitly x = 1. The loss is defined as the squared error and the label is 1, so training amounts to fitting the data point x = 1, y = 1: starting from the initialization w = 3, the optimizer drives w to 1.

import tensorflow as tf

#define variable and error
label = tf.constant(1,dtype = tf.float32)
prediction_to_train = tf.Variable(3,dtype=tf.float32)

#define losses and train
manual_compute_loss = tf.square(prediction_to_train - label)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_step = optimizer.minimize(manual_compute_loss)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(100):
        print('variable is ', sess.run(prediction_to_train), ' and the loss is ',sess.run(manual_compute_loss))
        sess.run(train_step)

Output:

variable is  3.0  and the loss is  4.0
variable is  2.96  and the loss is  3.8416002
variable is  2.9208  and the loss is  3.6894724
variable is  2.882384  and the loss is  3.5433698
variable is  2.8447363  and the loss is  3.403052
variable is  2.8078415  and the loss is  3.268291

...
variable is  2.0062745  and the loss is  1.0125883
variable is  1.986149  and the loss is  0.9724898
variable is  1.966426  and the loss is  0.9339792

...

variable is  1.0000029  and the loss is  8.185452e-12
variable is  1.0000029  and the loss is  8.185452e-12
variable is  1.0000029  and the loss is  8.185452e-12
variable is  1.0000029  and the loss is  8.185452e-12
variable is  1.0000029  and the loss is  8.185452e-12

How to restrict which Variables get trained:

Given the fact that training modifies the tf.Variable objects in the graph (by default every tf.Variable in the graph; a subset can be specified via var_list), you can use tf.constant or a plain Python value for anything you want to shield from training. This is also one of the tricks transfer learning relies on.
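As a quick check of that default, here is a minimal sketch (throwaway variable names) showing that tf.trainable_variables(), the collection minimize() falls back on when no var_list is given, contains every tf.Variable but never a tf.constant:

import tensorflow as tf

w = tf.Variable(3, dtype=tf.float32)    # lands in the TRAINABLE_VARIABLES collection
c = tf.constant(3, dtype=tf.float32)    # a constant never enters any variable collection

# exactly the set of variables minimize() trains when var_list is omitted
print(tf.trainable_variables())         # [<tf.Variable 'Variable:0' ...>], only w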

Below is a proper training example, this time with a placeholder x and several weights:

#demo2
#define variable and error
label = tf.constant(1,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
w1 = tf.Variable(4,dtype=tf.float32)
w2 = tf.Variable(4,dtype=tf.float32)
w3 = tf.constant(4,dtype=tf.float32)

y_predict = w1*x+w2*x+w3*x

#define losses and train
make_up_loss = tf.square(y_predict - label)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_step = optimizer.minimize(make_up_loss)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(100):
        w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
        print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
        sess.run(train_step,{x:1})

Because w3 is a tf.constant it successfully escapes training; only w1 and w2 are updated:

variable is w1: -1.3765227  w2: -1.3765227  w3: 4.0  and the loss is  0.06098667
variable is w1: -1.3814617  w2: -1.3814617  w3: 4.0  and the loss is  0.056205332
variable is w1: -1.3862033  w2: -1.3862033  w3: 4.0  and the loss is  0.051798765
variable is w1: -1.3907552  w2: -1.3907552  w3: 4.0  and the loss is  0.047737725
variable is w1: -1.3951249  w2: -1.3951249  w3: 4.0  and the loss is  0.043995135
variable is w1: -1.3993199  w2: -1.3993199  w3: 4.0  and the loss is  0.04054594
variable is w1: -1.4033471  w2: -1.4033471  w3: 4.0  and the loss is  0.03736715

Below, var_list is used to restrict training to w2 alone. Since w1 and w3 stay fixed at their initial value of 4 and x = 1, w2 approaching -7 is the correct answer (4 + w2 + 4 = 1):

#define variable and error
label = tf.constant(1,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
w1 = tf.Variable(4,dtype=tf.float32)
w2 = tf.Variable(4,dtype=tf.float32)
w3 = tf.constant(4,dtype=tf.float32)

y_predict = w1*x+w2*x+w3*x

#define losses and train
make_up_loss = tf.square(y_predict - label)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_step = optimizer.minimize(make_up_loss,var_list = [w2])

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(500):
        w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
        print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
        sess.run(train_step,{x:1})
Output (late iterations):

variable is w1: 4.0  w2: -6.99948  w3: 4.0  and the loss is  2.7063857e-07
variable is w1: 4.0  w2: -6.9994903  w3: 4.0  and the loss is  2.5983377e-07
variable is w1: 4.0  w2: -6.9995003  w3: 4.0  and the loss is  2.4972542e-07
variable is w1: 4.0  w2: -6.9995103  w3: 4.0  and the loss is  2.398176e-07
variable is w1: 4.0  w2: -6.9995203  w3: 4.0  and the loss is  2.3011035e-07
variable is w1: 4.0  w2: -6.99953  w3: 4.0  and the loss is  2.2105178e-07
variable is w1: 4.0  w2: -6.9995394  w3: 4.0  and the loss is  2.1217511e-07

What if w1, w2 and w3 are all tf.constant? Then, unsurprisingly, there is nothing to train:

ValueError: No variables to optimize.
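A minimal sketch reproducing that error (made-up names): with only tf.constant in the loss there is no tf.Variable for minimize() to collect, so it raises at graph-construction time:

import tensorflow as tf

label = tf.constant(1, dtype=tf.float32)
w = tf.constant(3, dtype=tf.float32)     # constant instead of a Variable
loss = tf.square(w - label)

optimizer = tf.train.GradientDescentOptimizer(0.01)
train_step = optimizer.minimize(loss)    # ValueError: No variables to optimize.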

Another way to obtain a var_list: tf.get_collection

(This works because a tf.Variable created inside a tf.name_scope carries the scope prefix in its name, so tf.get_collection's scope filter can pick it out of the TRAINABLE_VARIABLES collection.)

#demo2.2  another way to collect var_list

label = tf.constant(1,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
w1 = tf.Variable(4,dtype=tf.float32)
with tf.name_scope(name='selected_variable_to_train'):
    w2 = tf.Variable(4,dtype=tf.float32)
w3 = tf.constant(4,dtype=tf.float32)

y_predict = w1*x+w2*x+w3*x

#define losses and train
make_up_loss = (y_predict - label)**3   # note: a cubic error this time, not the square
optimizer = tf.train.GradientDescentOptimizer(0.01)

output_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='selected_variable_to_train')
train_step = optimizer.minimize(make_up_loss,var_list = output_vars)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(3000):
        w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
        print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
        sess.run(train_step,{x:1})
Output (late iterations):

variable is w1: 4.0  w2: -6.988893  w3: 4.0  and the loss is  1.3702081e-06
variable is w1: 4.0  w2: -6.988897  w3: 4.0  and the loss is  1.3687968e-06
variable is w1: 4.0  w2: -6.9889007  w3: 4.0  and the loss is  1.3673865e-06
variable is w1: 4.0  w2: -6.9889045  w3: 4.0  and the loss is  1.3659771e-06
variable is w1: 4.0  w2: -6.9889083  w3: 4.0  and the loss is  1.3645688e-06
variable is w1: 4.0  w2: -6.988912  w3: 4.0  and the loss is  1.3631613e-06
variable is w1: 4.0  w2: -6.988916  w3: 4.0  and the loss is  1.3617548e-06
variable is w1: 4.0  w2: -6.9889197  w3: 4.0  and the loss is  1.3603493e-06

trainable=False

Another way to keep a variable from being trained, similar in principle to the method above: both revolve around tf.GraphKeys.TRAINABLE_VARIABLES. The difference is that the previous approach picks a chosen scope out of the collection after the fact, while this one decides at definition time that the variable never gets inserted into the collection in the first place.

#demo2.3  another way to keep a variable from being trained

label = tf.constant(1,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
w1 = tf.Variable(4,dtype=tf.float32,trainable=False)
w2 = tf.Variable(4,dtype=tf.float32)
w3 = tf.constant(4,dtype=tf.float32)

y_predict = w1*x+w2*x+w3*x

#define losses and train
make_up_loss = (y_predict - label)**3
optimizer = tf.train.GradientDescentOptimizer(0.01)

output_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
train_step = optimizer.minimize(make_up_loss,var_list = output_vars)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(3000):
        w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
        print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
        sess.run(train_step,{x:1})

Which amounts to the same thing as not specifying var_list at all and training directly:

#demo2.4  the same effect, without an explicit var_list

label = tf.constant(1,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
#w1 = tf.Variable(4,dtype=tf.float32)
w1 = tf.Variable(4,dtype=tf.float32,trainable=False)
with tf.name_scope(name='selected_variable_to_train'):
    w2 = tf.Variable(4,dtype=tf.float32)
w3 = tf.constant(4,dtype=tf.float32)

y_predict = w1*x+w2*x+w3*x

#define losses and train
make_up_loss = (y_predict - label)**3
optimizer = tf.train.GradientDescentOptimizer(0.01)

train_step = optimizer.minimize(make_up_loss)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(3000):
        w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
        print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
        sess.run(train_step,{x:1})

Decomposing the minimize() operation

In fact, minimize() is just the combination of a compute_gradients() call and an apply_gradients() call.

compute_gradients() computes the gradients and apply_gradients() applies them to update the parameters. With several optimizers you can set up several learning processes with different learning rates, computing gradients and updating parameters separately for different var_lists; this is handy for transfer learning, or for coping with badly mismatched gradient scales across the layers of a deep network (see the sketch below). No further detail here.
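A minimal sketch of the multiple-optimizer idea, on a made-up two-weight model: a slow optimizer for w_old (say, pretrained layers) and a fast one for w_new (say, a freshly added head), each with its own var_list, grouped into one train step:

import tensorflow as tf

label = tf.constant(1, dtype=tf.float32)
x = tf.placeholder(dtype=tf.float32)
w_old = tf.Variable(4, dtype=tf.float32)   # update slowly
w_new = tf.Variable(4, dtype=tf.float32)   # update quickly

loss = tf.square(w_old*x + w_new*x - label)

# one optimizer per learning rate, each restricted to its own var_list
slow_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss, var_list=[w_old])
fast_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss, var_list=[w_new])
train_step = tf.group(slow_step, fast_step)   # run both updates together

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_step, feed_dict={x: 1})
    print(sess.run([w_old, w_new]))   # w_new has moved much further than w_old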

#demo2.5  combining compute_gradients() and apply_gradients()

label = tf.constant(1,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
w1 = tf.Variable(4,dtype=tf.float32,trainable=False)
w2 = tf.Variable(4,dtype=tf.float32)
w3 = tf.Variable(4,dtype=tf.float32)

y_predict = w1*x+w2*x+w3*x

#define losses and train
make_up_loss = (y_predict - label)**3
optimizer = tf.train.GradientDescentOptimizer(0.01)

w2_gradient = optimizer.compute_gradients(loss = make_up_loss, var_list = [w2])
train_step = optimizer.apply_gradients(grads_and_vars = w2_gradient)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for _ in range(300):
        w1_,w2_,w3_,loss_,w2_gradient_ = sess.run([w1,w2,w3,make_up_loss,w2_gradient],feed_dict={x:1})
        print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
        print('gradient:',w2_gradient_)
        sess.run(train_step,{x:1})
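The payoff of splitting the two steps is that the (gradient, variable) pairs can be edited in between. A common pattern, sketched here on the same model (clipping is my example, not something the original demo does), is clipping each gradient before applying it:

# continuing the graph above: make_up_loss, optimizer and w2 are already defined
grads_and_vars = optimizer.compute_gradients(make_up_loss, var_list=[w2])

# edit the pairs between the two steps: clip every gradient into [-1, 1]
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]

train_step = optimizer.apply_gradients(clipped)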

The learning rate, the steps, the exact update formula, and a manual gradient-descent implementation:

At prediction time, x is the variable that y depends on; at training time, w is the variable that L depends on, and x cannot change at all. So now you know why the weights are the ones called Variable (an admittedly forced interpretation).

Below, gradient descent implemented by hand in TensorFlow.

To make the formulas easier to write, the code below renames things to initials: l for loss, p for prediction, g for gradient, w for weight, plus y and x. η is the learning rate, and w0, w1, w2, ... denote the value of w at successive iterations, not separate variables.

l = (y - p)^2 = (y - w*x)^2 = y^2 - 2*y*w*x + w^2*x^2

dl/dw = 2*w*x^2 - 2*y*x

Substituting into the gradient-descent update rule w_{k+1} = w_k - η * dl/dw |_{w=w_k}:

w1 = w0 - η * dl/dw |_{w=w0}
w2 = w1 - η * dl/dw |_{w=w1}
w3 = w2 - η * dl/dw |_{w=w2}

Initial values: y = 3, x = 1, w = 2, so l = 1 and dl/dw = -2; take η = 1.

Update: w = 4

Update: w = 2

Update: w = 4

In this example x = 1 and y = 3, so dl/dw conveniently equals 2w - 2y, twice the gap between prediction and label. A learning rate of 1 makes w hop back and forth around the correct value and never converge; it is chosen here only to keep the demo arithmetic simple. Shrink the learning rate and increase the iteration count and it converges.
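The same arithmetic in plain Python, as a throwaway sanity check of the oscillation:

w, x, y, lr = 2.0, 1.0, 3.0, 1.0
for _ in range(4):
    g = 2*w*x*x - 2*y*x    # dl/dw from the formula above
    w = w - lr*g
    print(w)               # 4.0, 2.0, 4.0, 2.0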

#demo4: manual gradient descent in tensorflow
#y label
y = tf.constant(3,dtype = tf.float32)
x = tf.placeholder(dtype = tf.float32)
w = tf.Variable(2,dtype=tf.float32)
#prediction
p = w*x

#define losses
l = tf.square(p - y)
g = tf.gradients(l, w)  # tf.gradients returns a list; g[0] is dl/dw
learning_rate = tf.constant(1,dtype=tf.float32)
#learning_rate = tf.constant(0.11,dtype=tf.float32)
init = tf.global_variables_initializer()

#update
update = tf.assign(w, w - learning_rate * g[0])

with tf.Session() as sess:
    sess.run(init)
    print(sess.run([g,p,w], {x: 1}))
    for _ in range(5):
        w_,g_,l_ = sess.run([w,g,l],feed_dict={x:1})
        print('variable is w:',w_, ' g is ',g_,'  and the loss is ',l_)

        _ = sess.run(update,feed_dict={x:1})

Results:

With learning rate = 1:

[[-2.0], 2.0, 2.0]
variable is w: 2.0  g is  [-2.0]   and the loss is  1.0
variable is w: 4.0  g is  [2.0]   and the loss is  1.0
variable is w: 2.0  g is  [-2.0]   and the loss is  1.0
variable is w: 4.0  g is  [2.0]   and the loss is  1.0
variable is w: 2.0  g is  [-2.0]   and the loss is  1.0

With the smaller learning rate (the commented-out 0.11) and more iterations:

variable is w: 2.9964619  g is  [-0.007575512]   and the loss is  1.4347095e-05
variable is w: 2.996695  g is  [-0.0070762634]   and the loss is  1.2518376e-05
variable is w: 2.996913  g is  [-0.0066099167]   and the loss is  1.0922749e-05
variable is w: 2.9971166  g is  [-0.0061740875]   and the loss is  9.529839e-06
variable is w: 2.9973066  g is  [-0.0057668686]   and the loss is  8.314193e-06
variable is w: 2.9974842  g is  [-0.0053868294]   and the loss is  7.2544826e-06
variable is w: 2.9976501  g is  [-0.0050315857]   and the loss is  6.3292136e-06
variable is w: 2.997805  g is  [-0.004699707]   and the loss is  5.5218115e-06
variable is w: 2.9979498  g is  [-0.004389763]   and the loss is  4.8175043e-06
variable is w: 2.998085  g is  [-0.0041003227]   and the loss is  4.2031616e-06
variable is w: 2.9982114  g is  [-0.003829956]   and the loss is  3.6671408e-06
variable is w: 2.9983294  g is  [-0.0035772324]   and the loss is  3.1991478e-06

Extension: automatic and manual implementations of Momentum and Adagrad. That ran too long for this post, so it is split out separately.

Source code