
Everyday pitfalls: TF model loading fails with "Key Variable_xxx not found in checkpoint"

Saving the model works fine, but loading it throws a "Key Variable_xxx not found in checkpoint" error. Start by analyzing the cause: in most cases the model.ckpt files are all there, and the problem only shows up at load time, so the first step is to print the variables stored in the ckpt file and look at them. One prerequisite: give each variable a name argument when you define it, otherwise everything prints as "Variable_xxx:0" and you cannot tell the variables apart.

import os
from tensorflow.python import pywrap_tensorflow

current_path = os.getcwd()
model_dir = os.path.join(current_path, 'model')
checkpoint_path = os.path.join(model_dir, 'embedding.ckpt-0')  # name of the saved ckpt file; yours may differ
# Read data from checkpoint file
reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()
# Print tensor name and values
for key in var_to_shape_map:
    print("tensor_name: ", key)
    # print(reader.get_tensor(key))  # prints the variable's values; not helpful for this problem and clutters the output
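TF 1.x also ships a small helper that does roughly the same job; a minimal sketch, assuming a TF 1.x install and the same checkpoint_path as above:

from tensorflow.python.tools import inspect_checkpoint as chkp

# With an empty tensor_name and all_tensors=False this prints the checkpoint's
# debug string: every tensor name together with its dtype and shape, no values.
chkp.print_tensors_in_checkpoint_file(checkpoint_path, tensor_name='', all_tensors=False)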

My output:

tensor_name:  w_1_1/Adam_1
tensor_name:  w_2/Adam_1
tensor_name:  b_2
tensor_name:  w_1_1
tensor_name:  w_out/Adam_1
tensor_name:  b_1_1/Adam_1
tensor_name:  w_out
tensor_name:  w_1
tensor_name:  b_out
tensor_name:  b_2/Adam
tensor_name:  b_1
tensor_name:  b_out/Adam_1
tensor_name:  b_1_1/Adam
tensor_name:  w_1_1/Adam
tensor_name:  b_1_1
tensor_name:  w_2/Adam
tensor_name:  w_2
tensor_name:  w_out/Adam
tensor_name:  beta1_power
tensor_name:  b_out/Adam
tensor_name:  b_2/Adam_1
tensor_name:  beta2_power

That makes the cause obvious. My network only contains variables like "b_1, b_2, w_1, w_2", but because tf.train.AdamOptimizer() is used to apply the gradient updates, a tf.train.Saver() created without arguments saves the whole graph at checkpoint time, including the optimizer's slot variables with names like "w_out/Adam" (plus beta1_power / beta2_power). When the graph you restore into no longer lines up with what the checkpoint contains, you get the "variable not found" error. The fix: pass the variables you actually want saved as an argument when creating saver = tf.train.Saver().
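If you do not want to build the dictionary by hand, one common alternative (just a sketch; it assumes the graph is already built and that every variable you care about is trainable) is to pass the list of trainable variables, which excludes Adam's slot variables and the beta power accumulators:

# Saves/restores only the trainable variables (w_1, b_1, ...),
# not the Adam slot variables or beta1_power/beta2_power.
saver = tf.train.Saver(var_list=tf.trainable_variables())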

def ann_net(w_alpha=0.01, b_alpha=0.1):
    # Hidden layer 1
    w_1 = tf.Variable(w_alpha * tf.random_normal(shape=(input_size, hidden1_size)), name='w_1')
    b_1 = tf.Variable(b_alpha * tf.random_normal(shape=[hidden1_size]), name='b_1')
    hidden1_output = tf.nn.tanh(tf.add(tf.matmul(X, w_1), b_1))
    hidden1_output = tf.nn.dropout(hidden1_output, keep_prob)

    # Hidden layer 2
    shp1 = hidden1_output.get_shape()
    w_2 = tf.Variable(w_alpha * tf.random_normal(shape=(shp1[1].value, hidden2_size)), name='w_2')
    b_2 = tf.Variable(b_alpha * tf.random_normal(shape=[hidden2_size]), name='b_2')
    hidden2_output = tf.nn.tanh(tf.add(tf.matmul(hidden1_output, w_2), b_2))
    hidden2_output = tf.nn.dropout(hidden2_output, keep_prob)

    # Output layer
    shp2 = hidden2_output.get_shape()
    w_output = tf.Variable(w_alpha * tf.random_normal(shape=(shp2[1].value, embeding_size)), name='w_out')
    b_output = tf.Variable(b_alpha * tf.random_normal(shape=[embeding_size]), name='b_out')
    output = tf.add(tf.matmul(hidden2_output, w_output), b_output)

    variables_dict = {'b_2': b_2, 'w_out': w_output, 'w_1': w_1, 'b_out': b_output, 'b_1': b_1, 'w_2': w_2}
    return output, variables_dict

In the train() function, initialize the saver with variables_dict:
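(The snippet below assumes train() gets this dictionary from ann_net() and binds it to the name var_dict used in the following code, roughly like this:)

# Hypothetical wiring inside train(): ann_net() returns the prediction op
# and the dict of variables we actually want in the checkpoint.
output, var_dict = ann_net()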

with tf.device('/cpu:0'):
    saver = tf.train.Saver(var_dict)
    with tf.Session(config=tf.ConfigProto(device_count={'cpu': 0})) as sess:
        sess.run(tf.global_variables_initializer())
        step = 0

        ckpt = tf.train.get_checkpoint_state('model/')
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            step = int(ckpt.model_checkpoint_path.rsplit('-',1)[1])
            print("Model restored.")
        # training code
        # ... ...
        saver.save(sess, 'model/embedding.model', global_step=step)

If you downloaded a model from the web, such as vgg-16, and only want to load its first few layers into variables you defined yourself, the method is the same: build a variable list or dictionary and pass it to tf.train.Saver(), as sketched below.
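Roughly like this (the key names are made up for illustration; the real keys are whatever names appear in the downloaded checkpoint, and my_conv1_w / my_conv1_b stand for variables defined in your own graph):

# Keys: variable names as stored in the downloaded checkpoint (hypothetical).
# Values: the corresponding variables in your own graph.
restore_map = {
    'vgg_16/conv1/conv1_1/weights': my_conv1_w,
    'vgg_16/conv1/conv1_1/biases': my_conv1_b,
}
saver = tf.train.Saver(restore_map)
saver.restore(sess, 'vgg_16.ckpt')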
The same idea applies to LSTMs, except that TF follows its own naming rules when saving: the bidirectional RNN wrapper (tf.nn.bidirectional_dynamic_rnn) uses a default variable_scope called "bidirectional_rnn", and unless you did something about it that prefix is added to the variable names automatically, so the names stored in the model look something like this:

tensor_name:  train/train_1/fc_b/Adam
tensor_name:  train_1/fc_b
tensor_name:  train/fc_b
tensor_name:  train/train/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam
tensor_name:  train/bidirectional_rnn/fw/basic_lstm_cell/kernel
tensor_name:  train/train/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam
tensor_name:  train/beta2_power
tensor_name:  train/train/fc_w/Adam
tensor_name:  train_1/beta1_power
tensor_name:  train/train/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1
tensor_name:  train/train/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1
tensor_name:  train/beta1_power
tensor_name:  train/train_1/fc_w/Adam_1
tensor_name:  train/train/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1
tensor_name:  train_1/beta2_power
tensor_name:  train/train/fc_w/Adam_1
tensor_name:  train/bidirectional_rnn/bw/basic_lstm_cell/kernel
tensor_name:  train/train/fc_b/Adam
tensor_name:  train/bidirectional_rnn/bw/basic_lstm_cell/bias
tensor_name:  train/fc_w
tensor_name:  train_1/fc_w
tensor_name:  train/bidirectional_rnn/fw/basic_lstm_cell/bias
tensor_name:  train/train/fc_b/Adam_1
tensor_name:  train/train/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1
tensor_name:  train/train/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam
tensor_name:  train/train/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam
tensor_name:  train/train_1/fc_b/Adam_1
tensor_name:  train/train_1/fc_w/Adam

The leading "train" is the variable_scope I added myself, so the restore can look like this:

include = ['train/fc_b', 'train/fc_w',
           'train/bidirectional_rnn/bw/basic_lstm_cell/bias',
           'train/bidirectional_rnn/bw/basic_lstm_cell/kernel',
           'train/bidirectional_rnn/fw/basic_lstm_cell/bias',
           'train/bidirectional_rnn/fw/basic_lstm_cell/kernel']
variables_to_restore = tf.contrib.slim.get_variables_to_restore(include=include)
saver = tf.train.Saver(variables_to_restore)
with tf.Session(config=tf.ConfigProto(device_count={'cpu': 0})) as sess:
    sess.run(tf.global_variables_initializer())
    # ... ...
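For completeness, the actual restore with this saver would look roughly like the sketch below (it assumes the same 'model/' checkpoint directory as before); run it after the initializer so the restored values overwrite the freshly initialized ones, while any variable excluded from variables_to_restore keeps its initialized value:

    ckpt = tf.train.get_checkpoint_state('model/')
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)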