
7 Recursive AutoEncoder (structurally recursive autoencoder) in TensorFlow: the network cannot use the GPU for computation (a network-structure problem, not a machine-configuration problem)

1. Source Code Download

The code originally comes from GitHub: https://github.com/vijayvee/Recursive-neural-networks-TensorFlow. The repository describes itself as follows: "This repository contains the implementation of a single hidden layer Recursive Neural Network. Implemented in python using TensorFlow. Used the trained models for the task of Positive/Negative sentiment analysis. This code is the solution for the third programming assignment from "CS224d: Deep learning for Natural Language Processing", Stanford University."

Since it was written for Python 2, I modified it to run under Python 3 and added visualization of the relevant trees. My modified, runnable code (together with the movie-review data it processes) can be downloaded from:

https://pan.baidu.com/s/1bJTulQPs_h25sdLlCcTqDA

The runtime environment is: Windows 10, a TensorFlow 1.8 environment created with Anaconda, and Python 3.6.
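Before looking at the network itself, it is worth confirming that this environment actually exposes the GPU to TensorFlow. A quick check using standard TF 1.x utilities (not part of the original code) is:

import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; a working setup should show
# at least one '/device:GPU:0' entry alongside the CPU.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)

# Convenience check available in TF 1.8:
print('GPU available:', tf.test.is_gpu_available())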

2. Problem Description

With log_device_placement=True enabled in the program, the placement log shows that the operations are assigned to the GPU; only some save/restore operations are placed on the CPU.

During actual execution, however, the GPU load stays at 0.

The GPU on my machine is set up correctly: when I run other neural network programs, I can see the GPU load change.
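For reference, the kind of placement check described above can be reproduced with a minimal TF 1.x snippet (the constants and op names here are made up for illustration). log_device_placement prints where each op actually runs, which is how the GPU/CPU assignment above was observed:

import tensorflow as tf

# Minimal placement check (TF 1.x). log_device_placement=True makes the runtime
# print the device chosen for every op when the graph is executed.
a = tf.constant([[1.0, 2.0]], name='a')
b = tf.constant([[3.0], [4.0]], name='b')
c = tf.matmul(a, b, name='matmul_c')

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))
    # op.device shows a device only if one was pinned explicitly in the graph;
    # the runtime placement is what the log above reports.
    for op in tf.get_default_graph().get_operations():
        print(op.name, '->', op.device or '(placed at runtime)')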

3. Solution Attempt

I submitted the problem to the paid Q&A platform Angtk (https://www.angtk.com/):

https://www.angtk.com/question/354

No solution was received.

4. Partial Findings

First, the structure of the RvNN changes with the length of each sentence (training sample) in the corpus.

Second, because of this property, a reset_after parameter has to be introduced: after every reset_after sentences, the model is saved, a new Graph is defined, the weight matrices from the saved model are restored into the new Graph, and training continues.

In my modified code I added an operation that saves the computation graph so it can be viewed with TensorBoard. The observation: for every reset_after sentences trained, reset_after loss layers are created (one loss layer per sentence), so the computation graph keeps growing.

This is why the Graph has to be rebuilt before training continues.

(The core of a structurally recursive neural network (RvNN) is one forward layer and one reconstruction layer. These two layers are applied repeatedly to pairs of child nodes to produce the parent node, so it is the parameters of these two layers that are trained over and over.)
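The sketch below illustrates this reset_after pattern. It assumes a placeholder build_model_fn that constructs the sentence-dependent graph for one chunk, and a made-up checkpoint path; it is not the code from the repository, only the shape of the loop (section 6.1 shows the same pattern in actual code):

import os
import tensorflow as tf

RESET_AFTER = 50
CKPT = './weights/rvnn.temp'   # placeholder checkpoint path

def train_in_chunks(sentences, build_model_fn):
    # build_model_fn(chunk) is a hypothetical helper that builds the
    # sentence-dependent graph for one chunk and returns its train op.
    first = True
    for start in range(0, len(sentences), RESET_AFTER):
        chunk = sentences[start:start + RESET_AFTER]
        with tf.Graph().as_default(), tf.Session() as sess:
            train_op = build_model_fn(chunk)
            saver = tf.train.Saver()
            if first:
                sess.run(tf.global_variables_initializer())
                first = False
            else:
                saver.restore(sess, CKPT)   # reload weights into the new graph
            sess.run(train_op)
            if not os.path.exists('./weights'):
                os.makedirs('./weights')
            saver.save(sess, CKPT)          # persist weights for the next graph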

5. Solution

5.1 Collecting other RvNN implementations

Most existing implementations are Richard Socher's Matlab programs and their Python ports, which compute both the loss and the gradients by hand. What I need is an implementation on top of TensorFlow.

Reference: https://stats.stackexchange.com/questions/243221/recursive-neural-network-implementation-in-tensorflow, which lists several ways to implement this.

5.1.1  TensorFlow Fold

https://github.com/tensorflow/fold

TensorFlow Fold is a library for creating TensorFlow models that consume structured data, where the structure of the computation graph depends on the structure of the input data. For example, this model implements TreeLSTMs for sentiment analysis on parse trees of arbitrary shape/size/depth.

Fold implements dynamic batching. Batches of arbitrarily shaped computation graphs are transformed to produce a static computation graph. This graph has the same structure regardless of what input it receives, and can be executed efficiently by TensorFlow.

This animation shows a recursive neural network run with dynamic batching. Operations of the same type appearing at the same depth in the computation graph (indicated by color in the animation) are batched together regardless of whether or not they appear in the same parse tree. The Embed operation converts words to vector representations. The fully connected (FC) operation combines word vectors to form vector representations of phrases. The output of the network is a vector representation of an entire sentence. Although only a single parse tree of a sentence is shown, the same network can run, and batch together operations, over multiple parse trees of arbitrary shapes and sizes. The TensorFlow concat, while_loop, and gather ops are created once, prior to variable initialization, by Loom, the low-level API for TensorFlow Fold.

(Note the three ops it mentions: concat, while_loop, and gather.)
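To make the dynamic-batching idea concrete, here is a small illustration (my own sketch, not Fold's actual implementation): all binary nodes at one tree depth are evaluated with a single batched matmul, with tf.gather selecting each node's children from the vectors computed at the previous level.

import tensorflow as tf
import numpy as np

# Illustration only: nodes of the same type at the same depth share one batched
# matmul. Child vectors are selected with tf.gather and concatenated, instead of
# building a separate matmul for every tree node.
N_hidden = 10
W = tf.Variable(tf.random_normal([2 * N_hidden, N_hidden]))
b = tf.Variable(tf.random_normal([N_hidden]))

# Vectors for all nodes computed so far (e.g. leaf embeddings), one per row.
prev_level = tf.placeholder(tf.float32, [None, N_hidden])
# For each parent at this depth: the row indices of its left and right child.
left_idx  = tf.placeholder(tf.int32, [None])
right_idx = tf.placeholder(tf.int32, [None])

left  = tf.gather(prev_level, left_idx)             # (num_parents, N_hidden)
right = tf.gather(prev_level, right_idx)
parents = tf.nn.relu(tf.matmul(tf.concat([left, right], axis=1), W) + b)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(parents, feed_dict={
        prev_level: np.random.randn(5, N_hidden).astype(np.float32),
        left_idx:  [0, 2],
        right_idx: [1, 3],
    })
    print(out.shape)   # (2, 10): two parent nodes computed in one batched op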

5.1.2 Tensorflow implementation of Recursive Neural Networks using LSTM units

Download: https://github.com/sapruash/RecursiveNN

Tensorflow implementation of Recursive Neural Networks using LSTM units as described in "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks" by Kai Sheng Tai, Richard Socher, and Christopher D. Manning.

(Richard Socher, one of the paper's authors at Stanford, is the originator of the RvNN; he presented this network structure in his PhD thesis and became one of the well-known figures in deep learning as a result.)

5.1.3 Recursive (not Recurrent!) Neural Networks in TensorFlow

KDnuggets

Article: https://www.kdnuggets.com/2016/06/recursive-neural-networks-tensorflow.html

Code download (may require a proxy to access from mainland China): https://gist.github.com/anj1/504768e05fda49a6e3338e798ae1cddd

After a simple port from Python 2 to Python 3, I ran it and found that the GPU load was no longer 0.

So I suspected that the code in this post fails to drive the GPU because the network definition stores tensor vectors in a dict, and that this somehow keeps the operations from being run on the GPU. I therefore began refactoring the code.


Two drawbacks of RvNNs (TreeNets), quoted from the KDnuggets article:

The advantage of TreeNets is that they can be very powerful in learning hierarchical, tree-like structure. The disadvantages are, firstly, that the tree structure of every input sample must be known at training time. We will represent the tree structure like this (lisp-like notation):

(S (NP that movie) (VP was) (ADJP cool))

In each sub-expression, the type of the sub-expression must be given – in this case, we are parsing a sentence, and the type of the sub-expression is simply the part-of-speech (POS) tag. You can see that expressions with three elements (one head and two tail elements) correspond to binary operations, whereas those with four elements (one head and three tail elements) correspond to trinary operations, etc.

The second disadvantage of TreeNets is that training is hard because the tree structure changes for each training sample and it’s not easy to map training to mini-batches and so on.
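As a concrete illustration of the notation above, the example parse can be written as nested Python lists in the same head-first, list-of-lists style used by the code in section 6.1 (arity is a small helper added here only for illustration):

# "(S (NP that movie) (VP was) (ADJP cool))" as nested lists: the head (POS tag)
# comes first, the children follow.
tree = ['S',
        ['NP', 'that', 'movie'],   # head + two children    -> binary combination
        ['VP', 'was'],             # head + one child       -> unary combination
        ['ADJP', 'cool']]          # 'S' has three children -> trinary combination

def arity(expr):
    # number of children a sub-expression combines (0 for a leaf token)
    return len(expr) - 1 if isinstance(expr, list) else 0

print(arity(tree))      # 3
print(arity(tree[1]))   # 2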

6. Debugging to Solve the Problem

6.1 Debugging "Recursive (not Recurrent!) Neural Networks in TensorFlow"

Original code:

import types
import tensorflow as tf 
import numpy as np

# Expressions are represented as lists of lists,
# in lisp style -- the symbol name is the head (first element)
# of the list, and the arguments follow.

# add an expression to an expression list, recursively if necessary.
def add_expr_to_list(exprlist, expr):
    # if expr is a compound (list) expression, add its sub-expressions first
    if isinstance(expr, list):
        # Now for rest of expression
        for e in expr[1:]:
            # Add to list if necessary
            if not (e in exprlist):
                add_expr_to_list(exprlist, e)
    # Add index in list.
    exprlist.append(expr)

def expand_subexprs(exprlist):
    new_exprlist = []
    orig_indices = []
    for e in exprlist:
        add_expr_to_list(new_exprlist, e)
        orig_indices.append(len(new_exprlist)-1)
    return new_exprlist, orig_indices

def compile_expr(exprlist, expr):
    # start new list starting with head
    new_expr = [expr[0]]
    for e in expr[1:]:
        new_expr.append(exprlist.index(e))
    return new_expr

def compile_expr_list(exprlist):
    new_exprlist = []
    for e in exprlist:
        if isinstance(e, list):
            new_expr = compile_expr(exprlist, e)
        else:
            new_expr = e
        new_exprlist.append(new_expr)
    return new_exprlist

def expand_and_compile(exprlist):
    l, orig_indices = expand_subexprs(exprlist)
    return compile_expr_list(l), orig_indices

def new_weight(N1,N2):
    return tf.Variable(tf.random_normal([N1,N2]))
def new_bias(N_hidden):
    return tf.Variable(tf.random_normal([N_hidden]))

def build_weights(exprlist,N_hidden,inp_vec_len,out_vec_len):
    W = dict()  # dict of weights corresponding to each operation
    b = dict()  # dict of biases corresponding to each operation
    W['input']  = new_weight(inp_vec_len, N_hidden)
    W['output'] = new_weight(N_hidden, out_vec_len)
    for expr in exprlist:
        if isinstance(expr, list):
            idx = expr[0]
            if not (idx in W):
                W[idx] = [new_weight(N_hidden,N_hidden) for i in expr[1:]]
                b[idx] = new_bias(N_hidden)
    return (W,b)

def build_rnn_graph(exprlist,W,b,inp_vec_len):
    # with W built up, create list of variables
    # intermediate variables
    in_vars = [e for e in exprlist if not isinstance(e,list)]
    N_input = len(in_vars)
    inp_tensor = tf.placeholder(tf.float32, (N_input,  inp_vec_len), name='input1')
    V = []      # list of variables corresponding to each expr in exprlist
    for expr in exprlist:
        if isinstance(expr, list):
            # intermediate variables
            idx = expr[0]
            # add bias
            new_var = b[idx]
            # add input variables * weights
            for i in range(1,len(expr)):
                new_var = tf.add(new_var, tf.matmul(V[expr[i]], W[idx][i-1]))
            new_var = tf.nn.relu(new_var)
        else:
            # base (input) variables
            # TODO : variable or placeholder?
            i = in_vars.index(expr)
            i_v = tf.slice(inp_tensor, [i,0], [1,-1])
            new_var = tf.nn.relu(tf.matmul(i_v,W['input']))
        V.append(new_var)
    return (inp_tensor,V)

# take a compiled expression list and build its RNN graph
def complete_rnn_graph(W,V,orig_indices,out_vec_len):
    # we store our matrices in a dict;
    # the dict format is as follows:
    # 'op':[mat_arg1,mat_arg2,...]
    # e.g. unary operations:  '-':[mat_arg1]
    #      binary operations: '+':[mat_arg1,mat_arg2]
    # create a list of our base variables
    N_output = len(orig_indices)
    out_tensor = tf.placeholder(tf.float32, (N_output, out_vec_len), name='output1')

    # output variables
    ce = tf.reduce_sum(tf.zeros((1,1)))
    for idx in orig_indices:
        o = tf.nn.softmax(tf.matmul(V[idx], W['output']))
        t = tf.slice(out_tensor, [idx,0], [1,-1])
        ce = tf.add(ce, -tf.reduce_sum(t * tf.log(o)), name='loss')
    # TODO: output variables
    # return weights and variables and final loss
    return (out_tensor, ce)


# from subexpr_lists import *
a = [ 1, ['+',1,1], ['*',1,1], ['*',['+',1,1],['+',1,1]], ['+',['+',1,1],['+',1,1]], ['+',['+',1,1],1 ], ['+',1,['+',1,1]]]
# generate training graph
l,o=expand_and_compile(a)
W,b = build_weights(l,10,1,2)
i_t,V = build_rnn_graph(l,W,b,1)
o_t,ce = complete_rnn_graph(W,V,o,2)
# generate testing graph
a = [ ['+',['+',['+',1,1],['+',['+',1,1],['+',1,1]]],1] ]  # 7
l_tst,o_tst=expand_and_compile(a)
i_t_tst,V_tst = build_rnn_graph(l_tst,W,b,1)

out_batch = np.transpose(np.array([[1,0,1,0,0,1,1],[0,1,0,1,1,0,0]]))
print (ce)
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(ce)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
for i in range(5000):
    sess.run(train_step, feed_dict={i_t:np.array([[1]]),o_t:out_batch})
print (l)
print (l_tst)
print (sess.run(tf.nn.softmax(tf.matmul(V[1], W['output'])), feed_dict={i_t:np.array([[1]])}))
print (sess.run(tf.nn.softmax(tf.matmul(V[-1], W['output'])), feed_dict={i_t:np.array([[1]])}))
print (sess.run(tf.nn.softmax(tf.matmul(V_tst[-2], W['output'])), feed_dict={i_t_tst:np.array([[1]])}))
print (sess.run(tf.nn.softmax(tf.matmul(V_tst[-1], W['output'])), feed_dict={i_t_tst:np.array([[1]])}))

Running this code, the GPU load is not 0.

Imitating the RvNN approach (because the network structure changes with each sentence in the corpus, a new graph is built every time and the saved model is loaded into it), I modified the code as follows:

import types
import tensorflow as tf 
import numpy as np
import os

# Expressions are represented as lists of lists,
# in lisp style -- the symbol name is the head (first element)
# of the list, and the arguments follow.

# add an expression to an expression list, recursively if necessary.
def add_expr_to_list(exprlist, expr):
    # if expr is a compound (list) expression, add its sub-expressions first
    if isinstance(expr, list):
        # Now for rest of expression
        for e in expr[1:]:
            # Add to list if necessary
            if not (e in exprlist):
                add_expr_to_list(exprlist, e)
    # Add index in list.
    exprlist.append(expr)

def expand_subexprs(exprlist):
    new_exprlist = []
    orig_indices = []
    for e in exprlist:
        add_expr_to_list(new_exprlist, e)
        orig_indices.append(len(new_exprlist)-1)
    return new_exprlist, orig_indices

def compile_expr(exprlist, expr):
    # start new list starting with head
    new_expr = [expr[0]]
    for e in expr[1:]:
        new_expr.append(exprlist.index(e))
    return new_expr

def compile_expr_list(exprlist):
    new_exprlist = []
    for e in exprlist:
        if isinstance(e, list):
            new_expr = compile_expr(exprlist, e)
        else:
            new_expr = e
        new_exprlist.append(new_expr)
    return new_exprlist

def expand_and_compile(exprlist):
    l, orig_indices = expand_subexprs(exprlist)
    return compile_expr_list(l), orig_indices

def new_weight(N1,N2):
    return tf.Variable(tf.random_normal([N1,N2]))
def new_bias(N_hidden):
    return tf.Variable(tf.random_normal([N_hidden]))

def build_weights(exprlist,N_hidden,inp_vec_len,out_vec_len):
    W = dict()  # dict of weights corresponding to each operation
    b = dict()  # dict of biases corresponding to each operation
    W['input']  = new_weight(inp_vec_len, N_hidden)
    W['output'] = new_weight(N_hidden, out_vec_len)
    for expr in exprlist:
        if isinstance(expr, list):
            idx = expr[0]
            if not (idx in W):
                W[idx] = [new_weight(N_hidden,N_hidden) for i in expr[1:]]
                b[idx] = new_bias(N_hidden)
    return (W,b)

def build_rnn_graph(exprlist,W,b,inp_vec_len):
    # with W built up, create list of variables
    # intermediate variables
    in_vars = [e for e in exprlist if not isinstance(e,list)]
    N_input = len(in_vars)
    inp_tensor = tf.placeholder(tf.float32, (N_input,  inp_vec_len), name='input1')
    V = []      # list of variables corresponding to each expr in exprlist
    for expr in exprlist:
        if isinstance(expr, list):
            # intermediate variables
            idx = expr[0]
            # add bias
            new_var = b[idx]
            # add input variables * weights
            for i in range(1,len(expr)):
                new_var = tf.add(new_var, tf.matmul(V[expr[i]], W[idx][i-1]))
            new_var = tf.nn.relu(new_var)
        else:
            # base (input) variables
            # TODO : variable or placeholder?
            i = in_vars.index(expr)
            i_v = tf.slice(inp_tensor, [i,0], [1,-1])
            new_var = tf.nn.relu(tf.matmul(i_v,W['input']))
        V.append(new_var)
    return (inp_tensor,V)

# take a compiled expression list and build its RNN graph
def complete_rnn_graph(W,V,orig_indices,out_vec_len):
    # we store our matrices in a dict;
    # the dict format is as follows:
    # 'op':[mat_arg1,mat_arg2,...]
    # e.g. unary operations:  '-':[mat_arg1]
    #      binary operations: '+':[mat_arg1,mat_arg2]
    # create a list of our base variables
    N_output = len(orig_indices)
    out_tensor = tf.placeholder(tf.float32, (N_output, out_vec_len), name='output1')

    # output variables
    ce = tf.reduce_sum(tf.zeros((1,1)))
    for idx in orig_indices:
        o = tf.nn.softmax(tf.matmul(V[idx], W['output']))
        t = tf.slice(out_tensor, [idx,0], [1,-1])
        ce = tf.add(ce, -tf.reduce_sum(t * tf.log(o)), name='loss')
    # TODO: output variables
    # return weights and variables and final loss
    return (out_tensor, ce)


# from subexpr_lists import *
a = [ 1, ['+',1,1], ['*',1,1], ['*',['+',1,1],['+',1,1]], ['+',['+',1,1],['+',1,1]], ['+',['+',1,1],1 ], ['+',1,['+',1,1]]]
# generate training graph
l,o=expand_and_compile(a)

new_model=True
RESET_AFTER=50  # not used in this minimal demo; kept from the RvNN-style loop

out_batch = np.transpose(np.array([[1,0,1,0,0,1,1],[0,1,0,1,1,0,0]]))
for i in range(5000):
    with tf.Graph().as_default(), tf.Session() as sess:
        W,b = build_weights(l,10,1,2)
        i_t,V = build_rnn_graph(l,W,b,1)
        o_t,ce = complete_rnn_graph(W,V,o,2)
        train_step = tf.train.GradientDescentOptimizer(0.001).minimize(ce)
        if new_model:
            init = tf.initialize_all_variables()
            sess.run(init)
            new_model=False  # added by xiaojie
        else:
            saver = tf.train.Saver()
            saver.restore(sess, './weights/xiaojie.temp')
        sess.run(train_step, feed_dict={i_t:np.array([[1]]),o_t:out_batch})
#        step=0
#        for step in range(1000):
#            if step > 900:
#                break
#            sess.run(train_step, feed_dict={i_t:np.array([[1]]),o_t:out_batch})
#            step +=1
        saver = tf.train.Saver()
        if not os.path.exists("./weights"):
            os.makedirs("./weights")
        saver.save(sess, './weights/xiaojie.temp')
#for i in range(5000):
#    sess.run(train_step, feed_dict={i_t:np.array([[1]]),o_t:out_batch})
# generate testing graph
# NOTE: the training loop above closed its graph and session, so W, b, V, i_t
# and sess from the last iteration can no longer be used here. Rebuild the
# graphs in a fresh default graph, restore the saved weights, then evaluate.
a = [ ['+',['+',['+',1,1],['+',['+',1,1],['+',1,1]]],1] ]  # 7
l_tst,o_tst=expand_and_compile(a)
with tf.Graph().as_default(), tf.Session() as sess:
    W,b = build_weights(l,10,1,2)
    i_t,V = build_rnn_graph(l,W,b,1)
    i_t_tst,V_tst = build_rnn_graph(l_tst,W,b,1)
    saver = tf.train.Saver()
    saver.restore(sess, './weights/xiaojie.temp')
    print (l_tst)
    print (sess.run(tf.nn.softmax(tf.matmul(V[1], W['output'])), feed_dict={i_t:np.array([[1]])}))
    print (sess.run(tf.nn.softmax(tf.matmul(V[-1], W['output'])), feed_dict={i_t:np.array([[1]])}))
    print (sess.run(tf.nn.softmax(tf.matmul(V_tst[-2], W['output'])), feed_dict={i_t_tst:np.array([[1]])}))
    print (sess.run(tf.nn.softmax(tf.matmul(V_tst[-1], W['output'])), feed_dict={i_t_tst:np.array([[1]])}))

You will find that the GPU load is 0!

At this point, modify the code: change

sess.run(train_step, feed_dict={i_t:np.array([[1]]),o_t:out_batch})

to:

        step=0
        for step in range(1000):
            if step > 900:
                break
            sess.run(train_step, feed_dict={i_t:np.array([[1]]),o_t:out_batch})
            step +=1

Run it again, and the GPU load is no longer 0!

This shows that the GPU's compute power is only exercised when many steps are run on the same network structure (the same graph): with just one tiny step per rebuilt graph, the GPU barely gets any work.
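One way to make this conclusion concrete is a rough timing sketch. It assumes it is appended after the script in 6.1, so that build_weights, build_rnn_graph, complete_rnn_graph, l, o and out_batch are all in scope; it compares the cost of building a fresh graph with the cost of repeated steps on the same graph:

import time

t0 = time.time()
with tf.Graph().as_default(), tf.Session() as sess:
    W, b = build_weights(l, 10, 1, 2)
    i_t, V = build_rnn_graph(l, W, b, 1)
    o_t, ce = complete_rnn_graph(W, V, o, 2)
    train_step = tf.train.GradientDescentOptimizer(0.001).minimize(ce)
    sess.run(tf.initialize_all_variables())
    t1 = time.time()
    sess.run(train_step, feed_dict={i_t: np.array([[1]]), o_t: out_batch})
    t2 = time.time()
    for _ in range(1000):   # many steps on the same graph
        sess.run(train_step, feed_dict={i_t: np.array([[1]]), o_t: out_batch})
    t3 = time.time()

print('graph construction + init: %.3f s' % (t1 - t0))
print('one training step:         %.4f s' % (t2 - t1))
print('1000 training steps:       %.3f s' % (t3 - t2))

The exact numbers will depend on the machine, but the comparison is what matters: if a graph is rebuilt and a checkpoint restored for every single step, most wall-clock time goes into that CPU-side work rather than into GPU computation.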