Recurrent Neural Network Series (7): ConvLSTMCell in TensorFlow

阿新 • Published: 2018-12-14

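The previous post briefly introduced the principle and application scenarios of ConvLSTM; this one looks at how TensorFlow actually implements it. It is worth noting that TensorFlow's implementation does not use peepholes, that is, the terms shown in red below; it is instead a modification of the original LSTM structure. At the end of the post I also give a peephole-based ConvLSTM modeled on TensorFlow's implementation.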
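For reference, the ConvLSTM update equations can be written as below. This is a reconstruction from the ConvLSTM paper (Shi et al., 2015) using this post's lowercase $w$ naming, not a copy of the original figure. Here $*$ denotes convolution, $\circ$ the element-wise product, and the peephole terms are marked in red.

$$
\begin{aligned}
i_t &= \sigma(w_{xi} * x_t + w_{hi} * H_{t-1} + {\color{red} w_{ci} \circ C_{t-1}} + b_i) \\
f_t &= \sigma(w_{xf} * x_t + w_{hf} * H_{t-1} + {\color{red} w_{cf} \circ C_{t-1}} + b_f) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh(w_{xc} * x_t + w_{hc} * H_{t-1} + b_c) \\
o_t &= \sigma(w_{xo} * x_t + w_{ho} * H_{t-1} + {\color{red} w_{co} \circ C_t} + b_o) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
$$

Dropping the red terms gives the non-peephole form that the rest of this post walks through.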

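1. Usage

Before introducing the concrete usage, let me briefly explain the parameters. Since the algorithm combines a CNN with an LSTM, I will explain each parameter by analogy with its counterpart in a CNN or an LSTM.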

def __init__(self,
             conv_ndims,
             input_shape,
             output_channels,
             kernel_shape,
             use_bias=True,
             skip_connection=False,
             forget_bias=1.0,
             initializers=None,
             name="conv_lstm_cell"):

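Parameters:
conv_ndims: the dimensionality of the convolution; it is 2 for planar convolutions such as image convolution.
input_shape: the shape of the input excluding batch_size. For example, when conv_ndims=2 the input shape should be [width, height, channels]; the simplest way to picture it is as an image.
output_channels: the depth of the final output. Like the number of output channels of a conv2d kernel, it determines the depth of the extracted feature maps.
kernel_shape: the spatial size of the convolution kernel, e.g. [3, 3]. Anyone familiar with conv2d knows that a kernel has 4 dimensions, so why only 2 here? The remaining two can be computed directly, and that is done inside ConvLSTM's implementation.
use_bias: straightforward; whether to use a bias term.
The remaining parameters can be left at their defaults.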
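As a rough illustration of how the two missing kernel dimensions follow from the parameters above, here is a minimal sketch in plain Python. The helper name `conv_lstm_shapes` is hypothetical and not part of TensorFlow; it only reproduces the shape arithmetic, assuming 'SAME' padding and that the input and hidden state are stacked along the channel axis.

```python
def conv_lstm_shapes(input_shape, output_channels, kernel_shape):
    """Derive the full kernel shape and the per-sample state shape.

    input_shape excludes batch_size, e.g. [width, height, channels].
    (Hypothetical helper for illustration only.)
    """
    in_channels = input_shape[-1]
    # The cell convolves the input stacked with the hidden state along the
    # channel axis, so the kernel's input depth is in_channels + output_channels.
    stacked_depth = in_channels + output_channels
    # One convolution produces all four gates, hence 4 * output_channels.
    kernel = list(kernel_shape) + [stacked_depth, 4 * output_channels]
    # With 'SAME' padding the spatial dimensions are unchanged.
    state = list(input_shape[:-1]) + [output_channels]
    return kernel, state


kernel, state = conv_lstm_shapes([10, 10, 28], output_channels=6, kernel_shape=[3, 3])
print(kernel)  # [3, 3, 34, 24]
print(state)   # [10, 10, 6]
```

The state shape [10, 10, 6] is what appears, with the batch dimension prepended, in the printed tensors of the usage examples.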

1.1 A single cell step: cell.call()

# Requires TensorFlow 1.x (tf.contrib was removed in 2.x)
import tensorflow as tf
import tensorflow.contrib as contrib

# inputs: [batch_size, width, height, channels]
inputs = tf.placeholder(dtype=tf.float32, shape=[64, 10, 10, 28])
cell = contrib.rnn.ConvLSTMCell(conv_ndims=2, input_shape=[10, 10, 28],
                                output_channels=6, kernel_shape=[3, 3])
initial_state = cell.zero_state(batch_size=64, dtype=tf.float32)
# One step of the cell: returns the output h and the new state (c, h)
output, final_state = cell.call(inputs=inputs, state=initial_state)
print(output)
print(final_state)

>>
Tensor("mul_2:0", shape=(64, 10, 10, 6), dtype=float32)
LSTMStateTuple(c=<tf.Tensor 'add_2:0' shape=(64, 10, 10, 6) dtype=float32>, h=<tf.Tensor 'mul_2:0' shape=(64, 10, 10, 6) dtype=float32>)

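The figure shows which parameters determine each dimension of the output: 64 is batch_size, 10 × 10 is the spatial size taken from input_shape (preserved by 'SAME' padding), and 6 is output_channels.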

1.2 Unrolling over time: dynamic_rnn()

inputs = tf.placeholder(dtype=tf.float32, shape=[64, 100, 10, 10, 28])  # [batch_size, time_step, width, height, channels], 5D
cell = contrib.rnn.ConvLSTMCell(conv_ndims=2, input_shape=[10, 10, 28], output_channels=6, kernel_shape=[3, 3])
initial_state = cell.zero_state(batch_size=64, dtype=tf.float32)
output, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32, time_major=False, initial_state=initial_state)
print(output)
print(final_state)

>>
Tensor("rnn/transpose_1:0", shape=(64, 100, 10, 10, 6), dtype=float32)
LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_3:0' shape=(64, 10, 10, 6) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_4:0' shape=(64, 10, 10, 6) dtype=float32>)

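As with other RNN cells, dynamic_rnn accepts inputs in two layouts, selected by the time_major parameter. If inputs has shape [time_step, batch_size, width, height, channels], you must set time_major=True. The code above unrolls over 100 time steps, so the output shape (64, 100, 10, 10, 6) means there are 100 steps, each producing an output of shape [64, 10, 10, 6].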
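Conceptually, unrolling applies the cell once per time step and carries the state forward. The sketch below illustrates only that control flow in plain Python; `unroll` and `toy_step` are hypothetical names, and the real dynamic_rnn builds a while loop inside the graph rather than running a Python loop.

```python
def unroll(step_fn, inputs, initial_state):
    """Apply step_fn to each time step, threading the state through.

    inputs is a list with one entry per time step (time-major layout).
    Returns the list of per-step outputs and the final state.
    """
    state = initial_state
    outputs = []
    for x_t in inputs:
        output, state = step_fn(x_t, state)
        outputs.append(output)
    return outputs, state


def toy_step(x_t, state):
    # Hypothetical stand-in for a cell: a running sum, emitted as the output.
    new_state = state + x_t
    return new_state, new_state


outputs, final_state = unroll(toy_step, [1, 2, 3, 4], initial_state=0)
print(outputs)      # [1, 3, 6, 10]
print(final_state)  # 10
```

This mirrors the shapes printed above: one output per time step is collected, while only the state after the last step is returned.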

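2. Implementation details in TensorFlow

Next, let us take a rough look at how ConvLSTM is implemented internally. As with RNN and LSTM, the core is the call() method, which performs one forward step. Within call(), the core is the convolution (implemented by the function _conv()); once the convolution result is available, what remains is to split it into the four gates and combine their activations element-wise, as follows: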

  def call(self, inputs, state, scope=None):
    cell, hidden = state
    # A single convolution over [inputs, hidden] yields all four gates at once
    new_hidden = _conv([inputs, hidden],
                       self._kernel_shape,
                       4*self._output_channels,
                       self._use_bias)
    # Split along the channel axis into four tensors of output_channels each
    gates = array_ops.split(value=new_hidden,
                            num_or_size_splits=4,
                            axis=self._conv_ndims+1)
    input_gate, new_input, forget_gate, output_gate = gates
    new_cell = math_ops.sigmoid(forget_gate + self._forget_bias) * cell
    new_cell += math_ops.sigmoid(input_gate) * math_ops.tanh(new_input)
    output = math_ops.tanh(new_cell) * math_ops.sigmoid(output_gate)
    new_state = rnn_cell_impl.LSTMStateTuple(new_cell, output)

    return output, new_state
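To make the gate arithmetic in call() easier to follow, here is a scalar sketch of the same update in plain Python. The names `sigmoid` and `lstm_gate_step` are hypothetical; the real cell applies these operations element-wise to tensors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_gate_step(cell, input_gate, new_input, forget_gate, output_gate,
                   forget_bias=1.0):
    """Scalar version of the update in call(); gate arguments are pre-activations."""
    new_cell = sigmoid(forget_gate + forget_bias) * cell
    new_cell += sigmoid(input_gate) * math.tanh(new_input)
    output = math.tanh(new_cell) * sigmoid(output_gate)
    return new_cell, output


# Forget gate wide open, input gate closed: the cell state is carried over.
new_cell, output = lstm_gate_step(cell=0.5, input_gate=-50.0, new_input=0.3,
                                  forget_gate=50.0, output_gate=50.0)
print(new_cell, output)  # approximately 0.5 and tanh(0.5)
```

With the forget gate saturated near 1 and the input gate near 0, the previous cell value passes through almost unchanged, which is the memory behaviour the gates are meant to provide.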

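So the focus now shifts to the _conv() function. When writing a convolutional network we have to specify the full shape of every kernel, yet ConvLSTM does not seem to specify it completely. What is going on?

From the formulas given earlier, there should be 8 convolution kernels in total: $w_{xi},w_{hi},w_{xf},w_{hf},w_{xc},w_{hc},w_{xo},w_{ho}$. Suppose the input has shape [1,10,10,28], with output_channels=2 and kernel_shape=[3,3]. The straightforward approach would then be to carry out 8 separate convolutions, one per kernel.

That is tedious; even drawing the figure for it was tedious. TensorFlow's implementation does not disappoint, though: it stacks first and then convolves, which replaces the 8 convolutions with a single one. The input (28 channels) and the hidden state (2 channels) are concatenated into 30 channels, and one kernel with 4 × 2 = 8 output channels computes all the gates together: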

new_hidden = _conv([inputs, hidden],
                   self._kernel_shape,
                   4*self._output_channels,
                   self._use_bias)
---------------------------------------------------------------------------

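def _conv(args, filter_size, num_features, bias, bias_start=0.0):
    # Excerpt: argument checks and the bias term are omitted
    total_arg_size_depth = 0
    shapes = [a.get_shape().as_list() for a in args]
    shape_length = len(shapes[0])  # rank of args[0], i.e. of inputs
    # Total depth of the stacked arguments: channels of inputs + channels of hidden
    for shape in shapes:
        total_arg_size_depth += shape[-1]
    dtype = [a.dtype for a in args][0]

    # For 4-D arguments (conv_ndims=2) the convolution is conv2d with unit strides
    conv_op = nn_ops.conv2d
    strides = shape_length * [1]

    kernel = vs.get_variable(
        "kernel",
        filter_size + [total_arg_size_depth, num_features],
        dtype=dtype)

    # Stack the arguments along the channel axis, then convolve once
    res = conv_op(array_ops.concat(axis=shape_length-1, values=args),
                  kernel,
                  strides,
                  padding='SAME')

In the code above:
4*self._output_channels, passed in as num_features, corresponds to the 8 in the diagram (4 × 2);
total_arg_size_depth corresponds to the 30 in the diagram (28 + 2);
the concat call stacks $x$ and $H$ from the diagram.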

The code above also confirms that TensorFlow's ConvLSTM is indeed based on the original LSTM.
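The stack-then-convolve idea can be checked numerically. The following is a minimal sketch in plain Python, assuming stride 1 and zero ('SAME') padding; `conv2d_same` and `rand_tensor` are hypothetical helpers written for this check, not TensorFlow functions. It covers stacking along the input-channel axis (x with h); stacking the four gates along the output-channel axis works analogously.

```python
import random


def conv2d_same(x, kernel):
    """Naive stride-1, zero-padded ('SAME') convolution on nested lists.

    x has shape [H][W][C_in]; kernel has shape [kh][kw][C_in][C_out].
    """
    height, width = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    c_in, c_out = len(kernel[0][0]), len(kernel[0][0][0])
    out = [[[0.0] * c_out for _ in range(width)] for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for dr in range(kh):
                for dc in range(kw):
                    rr, cc = r + dr - kh // 2, c + dc - kw // 2
                    if not (0 <= rr < height and 0 <= cc < width):
                        continue  # outside the image: zero padding
                    for i in range(c_in):
                        for o in range(c_out):
                            out[r][c][o] += x[rr][cc][i] * kernel[dr][dc][i][o]
    return out


def rand_tensor(*dims):
    """Nested lists of the given shape filled with uniform random values."""
    if len(dims) == 1:
        return [random.uniform(-1, 1) for _ in range(dims[0])]
    return [rand_tensor(*dims[1:]) for _ in range(dims[0])]


random.seed(0)
x = rand_tensor(4, 4, 3)        # input, 3 channels
h = rand_tensor(4, 4, 2)        # hidden state, 2 channels
w_x = rand_tensor(3, 3, 3, 2)   # kernel applied to x
w_h = rand_tensor(3, 3, 2, 2)   # kernel applied to h

# Separate convolutions, summed element-wise.
conv_x, conv_h = conv2d_same(x, w_x), conv2d_same(h, w_h)
separate = [[[a + b for a, b in zip(px, ph)] for px, ph in zip(rx, rh)]
            for rx, rh in zip(conv_x, conv_h)]

# Stack inputs and kernels along the input-channel axis, then convolve once.
xh = [[px + ph for px, ph in zip(rx, rh)] for rx, rh in zip(x, h)]
w = [[kx + kh_ for kx, kh_ in zip(rx, rh)] for rx, rh in zip(w_x, w_h)]
stacked = conv2d_same(xh, w)

max_diff = max(abs(a - b)
               for rs, rt in zip(separate, stacked)
               for ps, pt in zip(rs, rt)
               for a, b in zip(ps, pt))
print(max_diff < 1e-9)  # True
```

The two results agree up to floating-point rounding, so the single stacked convolution computes the same values as the separate convolutions with fewer operations to launch.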

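3. A ConvLSTM based on the peephole LSTM in TensorFlow

The ConvLSTMCell in contrib.rnn.ConvLSTMCell is not based on the LSTM with "peephole connections" that the original ConvLSTM authors used. So, following the existing pattern, I simply added the peephole step to the call() implementation of contrib.rnn.ConvLSTMCell.

The added code is: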

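        # Peephole weights, applied element-wise to the cell state.
        # Note: cell.shape includes the batch dimension, so these variables
        # require a known, fixed batch size.
        w_ci = vs.get_variable(
            "w_ci", cell.shape, inputs.dtype)
        w_cf = vs.get_variable(
            "w_cf", cell.shape, inputs.dtype)
        w_co = vs.get_variable(
            "w_co", cell.shape, inputs.dtype)

        # The forget and input gates peek at the previous cell state
        new_cell = math_ops.sigmoid(forget_gate + self._forget_bias + w_cf * cell) * cell
        new_cell += math_ops.sigmoid(input_gate + w_ci * cell) * math_ops.tanh(new_input)
        # The output gate peeks at the updated cell state
        output = math_ops.tanh(new_cell) * math_ops.sigmoid(output_gate + w_co * new_cell)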
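To see what the peephole terms change, here is a scalar sketch of the update above in plain Python. `peephole_gate_step` is a hypothetical name and the weights are plain numbers rather than trained variables; setting them to zero recovers the non-peephole update.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def peephole_gate_step(cell, input_gate, new_input, forget_gate, output_gate,
                       w_ci=0.0, w_cf=0.0, w_co=0.0, forget_bias=1.0):
    """Scalar peephole update; gate arguments are pre-activations."""
    f = sigmoid(forget_gate + forget_bias + w_cf * cell)
    i = sigmoid(input_gate + w_ci * cell)
    new_cell = f * cell + i * math.tanh(new_input)
    # The output gate sees the updated cell state, not the previous one.
    o = sigmoid(output_gate + w_co * new_cell)
    return new_cell, math.tanh(new_cell) * o


plain = peephole_gate_step(0.5, 0.1, 0.3, 0.2, -0.4)
peep = peephole_gate_step(0.5, 0.1, 0.3, 0.2, -0.4, w_ci=1.0, w_cf=1.0, w_co=1.0)
print(plain)
print(peep)
```

In this example the cell state is positive and the peephole weights are positive, so the forget and input gates open further and the new cell value comes out larger than in the plain case.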

To use it, simply import BasicConvLSTM from ConvLSTM:

from ConvLSTM import BasicConvLSTM

Its usage is exactly the same as ConvLSTMCell!

Source code: click here
