From RNN to LSTM, and then to seq2seq (Part 2)

阿新 • Published: 2017-05-21


[Figure]

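As the figure shows, decoding always starts from the encoder's last hidden state. If the encoder input is too long, a lot of information is lost along the way, which is why the attention mechanism was designed.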
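To make the bottleneck concrete, here is a minimal NumPy sketch. It is not taken from the referenced code: the `encode` helper, the plain tanh cell and all sizes are assumptions chosen for illustration. However long the input is, a decoder without attention receives only the single final hidden vector.

```python
import numpy as np

def encode(xs, W_x, W_h):
    """Toy tanh RNN encoder (hypothetical helper).

    Returns every hidden state and, separately, the last one.
    """
    h = np.zeros(W_h.shape[0])
    hs = []
    for x in xs:                      # one step per input token
        h = np.tanh(W_x @ x + W_h @ h)
        hs.append(h)
    return np.stack(hs), h

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8          # illustrative sizes
W_x = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1

short_hs, short_last = encode(rng.normal(size=(3, input_dim)), W_x, W_h)
long_hs, long_last = encode(rng.normal(size=(50, input_dim)), W_x, W_h)

# Without attention the decoder sees only the last state:
# 8 numbers, whether the input had 3 tokens or 50.
print(short_last.shape, long_last.shape)   # (8,) (8,)
# With attention it can look at all of these instead.
print(short_hs.shape, long_hs.shape)       # (3, 8) (50, 8)
```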

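The biggest difference between decoding with attention and the original decoding is that the output at each step is no longer based only on the current hidden state h, but on the concatenation of h and a context vector C.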

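So what is C? C is a weighted combination of the encoder's hidden states (see the formula in the last figure). The meaning is straightforward: when decoding, I consider not only the decoder's current hidden state but also the encoder's hidden states. The key question is how to weigh them, given that the encoder has so many hidden states, and that is exactly what the attention weights compute. In the approach used here, the decoder's hidden state at the current time step is scored against every encoder hidden state, as the last figure makes clear.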
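A small numeric sketch of one decoder step may help. It assumes dot-product scoring (other scoring functions exist), and the `attend` helper and the toy vectors are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend(h_dec, enc_hs):
    """Dot-product attention for a single decoder step.

    h_dec:  (hidden,)          current decoder hidden state
    enc_hs: (src_len, hidden)  all encoder hidden states
    """
    scores = enc_hs @ h_dec           # one score per encoder position
    weights = softmax(scores)         # non-negative and summing to 1
    context = weights @ enc_hs        # weighted sum of encoder states, i.e. C
    return weights, context

enc_hs = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
h_dec = np.array([2.0, 0.0])

weights, context = attend(h_dec, enc_hs)
print(weights)   # positions 0 and 2 score equally and outweigh position 1
print(context)
```

The encoder states that point in the same direction as the decoder state receive most of the weight, so C is dominated by them.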

[Figure]

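If that is still unclear, look at this formula: the input d′t is the C described above, and concatenating it with dt gives the hidden state that is output at this time step.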
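Written out, the attention step looks roughly like this. It is a reconstruction that follows the dot-product variant used by the code in this post, not necessarily the exact formula in the figure. Here h_t plays the role of dt, c_t plays the role of d′t (that is, C), and \bar{h}_s is the encoder hidden state at source position s; W_c and b_c are the names the code uses for the output projection.

```latex
\begin{aligned}
\mathrm{score}(h_t, \bar{h}_s) &= h_t^{\top} \bar{h}_s \\
a_t(s) &= \frac{\exp\big(\mathrm{score}(h_t, \bar{h}_s)\big)}
               {\sum_{s'} \exp\big(\mathrm{score}(h_t, \bar{h}_{s'})\big)} \\
c_t &= \sum_{s} a_t(s)\, \bar{h}_s \\
\tilde{h}_t &= \tanh\big(W_c\,[h_t ; c_t] + b_c\big)
\end{aligned}
```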

[Figure]

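The implementation is not complicated: at each decoding step, score the decoder hidden state against the encoder hidden states, then concatenate.

The code below was found on GitHub. The way it scores the two sets of hidden states may differ from the description above, but the principle is much the same; read through it to get a feel for the flow.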

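# Note: this excerpt uses Python 2 (xrange) and the pre-1.0 TensorFlow API;
# in TensorFlow 1.x, tf.pack was renamed to tf.stack.

# Encoder: run the cell over the source sequence and keep every hidden state.
s = self.encoder.zero_state(self.batch_size, tf.float32)
encoder_hs = []
with tf.variable_scope("encoder"):
    for t in xrange(self.max_size):
        if t > 0: tf.get_variable_scope().reuse_variables()
        x = tf.squeeze(source_xs[t], [1])
        x = tf.matmul(x, self.s_proj_W) + self.s_proj_b
        h, s = self.encoder(x, s)
        encoder_hs.append(h)
encoder_hs = tf.pack(encoder_hs)  # [max_size, batch_size, hidden]

# Decoder: in this excerpt the state starts from zeros, so the encoder
# reaches the decoder only through the attention step.
s = self.decoder.zero_state(self.batch_size, tf.float32)
logits = []
probs  = []
with tf.variable_scope("decoder"):
    for t in xrange(self.max_size):
        if t > 0: tf.get_variable_scope().reuse_variables()
        # Training: feed the ground-truth target at every step.
        # Testing: only step 0 reads the target; how x is produced for
        # later steps is not shown in this excerpt.
        if not self.is_test or t == 0:
            x = tf.squeeze(target_xs[t], [1])
        x = tf.matmul(x, self.t_proj_W) + self.t_proj_b
        h_t, s = self.decoder(x, s)
        # Combine the decoder state with the attention context.
        h_tld = self.attention(h_t, encoder_hs)

        oemb  = tf.matmul(h_tld, self.proj_W) + self.proj_b
        logit = tf.matmul(oemb, self.proj_Wo) + self.proj_bo
        prob  = tf.nn.softmax(logit)
        logits.append(logit)
        probs.append(prob)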
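The `if not self.is_test or t == 0` branch separates training from testing. The sketch below shows one common way to fill in the test-time behaviour, namely feeding the previous argmax prediction back in. This is an assumption, not something the excerpt confirms. Attention is left out to keep the loop short, and the embedding table `E`, the weight names and the sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, hidden = 6, 4, 5                 # illustrative sizes
E   = rng.normal(size=(vocab, emb_dim)) * 0.1    # hypothetical embedding table
W_x = rng.normal(size=(emb_dim, hidden)) * 0.1
W_h = rng.normal(size=(hidden, hidden)) * 0.1
W_o = rng.normal(size=(hidden, vocab)) * 0.1

def decode(target_ids, max_len, is_test):
    """Toy decoder loop; returns the token id predicted at every step."""
    h = np.zeros(hidden)
    token = target_ids[0]             # step 0 always starts from the given token
    predictions = []
    for t in range(max_len):
        if not is_test:
            token = target_ids[t]     # training: feed the ground truth (teacher forcing)
        h = np.tanh(E[token] @ W_x + h @ W_h)
        logits = h @ W_o
        pred = int(np.argmax(logits))
        predictions.append(pred)
        if is_test:
            token = pred              # testing: feed back the model's own prediction
    return predictions

train_preds = decode([0, 1, 2, 3], max_len=4, is_test=False)
test_preds = decode([0, 1, 2, 3], max_len=4, is_test=True)
print(train_preds, test_preds)
```

Both modes agree on step 0 because they start from the same token and the same zero state; from step 1 onward the test-time loop depends only on its own predictions.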



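def attention(self, h_t, encoder_hs):
    # h_t: [batch, hidden]; encoder_hs: [max_size, batch, hidden]
    # tf.mul, tf.batch_matmul and tf.concat(1, ...) are pre-1.0 TensorFlow;
    # in 1.x they are tf.multiply, tf.matmul and tf.concat(values, axis).
    # Commented-out alternative: concat-style scoring through W_a, b_a, v_a.
    #scores = [tf.matmul(tf.tanh(tf.matmul(tf.concat(1, [h_t, tf.squeeze(h_s, [0])]),
    #                    self.W_a) + self.b_a), self.v_a)
    #          for h_s in tf.split(0, self.max_size, encoder_hs)]
    #scores = tf.squeeze(tf.pack(scores), [2])
    # Dot-product score of h_t against every encoder state -> [max_size, batch]
    scores = tf.reduce_sum(tf.mul(encoder_hs, h_t), 2)
    # Attention weights over source positions -> [batch, max_size]
    a_t    = tf.nn.softmax(tf.transpose(scores))
    a_t    = tf.expand_dims(a_t, 2)  # [batch, max_size, 1]
    # Context vector: weighted sum of encoder states -> [batch, hidden]
    c_t    = tf.batch_matmul(tf.transpose(encoder_hs, perm=[1,2,0]), a_t)
    c_t    = tf.squeeze(c_t, [2])
    # Attentional hidden state: tanh(W_c [h_t; c_t] + b_c)
    h_tld  = tf.tanh(tf.matmul(tf.concat(1, [h_t, c_t]), self.W_c) + self.b_c)

    return h_tld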
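For readers without an old TensorFlow install, the same computation can be sketched in NumPy. This is a re-expression for checking shapes, not the original author's code; the random inputs and sizes are placeholders, and the function additionally returns the weights so they can be inspected.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(h_t, encoder_hs, W_c, b_c):
    """h_t: (batch, hidden); encoder_hs: (src_len, batch, hidden)."""
    # Dot product of h_t with every encoder state -> (src_len, batch)
    scores = np.sum(encoder_hs * h_t, axis=2)
    # Normalise over source positions -> (batch, src_len)
    a_t = softmax(scores.T, axis=1)
    # Weighted sum of encoder states -> (batch, hidden)
    c_t = np.einsum("bs,sbh->bh", a_t, encoder_hs)
    # Concatenate and project -> (batch, hidden)
    h_tld = np.tanh(np.concatenate([h_t, c_t], axis=1) @ W_c + b_c)
    return h_tld, a_t

rng = np.random.default_rng(0)
src_len, batch, hidden = 7, 2, 3              # illustrative sizes
encoder_hs = rng.normal(size=(src_len, batch, hidden))
h_t = rng.normal(size=(batch, hidden))
W_c = rng.normal(size=(2 * hidden, hidden)) * 0.1
b_c = np.zeros(hidden)

h_tld, a_t = attention(h_t, encoder_hs, W_c, b_c)
print(h_tld.shape, a_t.shape)   # (2, 3) (2, 7)
```

Each row of `a_t` sums to 1, so every example in the batch gets its own distribution over the source positions.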

References:

https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow

https://github.com/dillonalaird/Attention/blob/master/attention.py

http://www.tuicool.com/articles/nUFRban

http://www.cnblogs.com/rocketfan/p/6261467.html

http://blog.csdn.net/jerr__y/article/details/53749693
