#Paper Reading# Attention Is All You Need

Axin • Published: 2018-11-06


Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008.

The paper proposes the Transformer, a neural network model based purely on attention.

In the paper's words: "the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution."

The problem it aims to solve: existing encoder-decoder frameworks use RNNs in both the encoder and the decoder in order to factor in the positional information of a sentence. Doing so "precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples." Although some work has made progress in improving the efficiency of LSTMs and gated RNNs, "the fundamental constraint of sequential computation, however, remains."

The paper also explains why CNNs are not used to get around the sequential computation of RNNs: "the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S", which "makes it more difficult to learn dependencies between distant positions." The overall goal of the Transformer is therefore reducing sequential computation.

Overall Transformer architecture:

The Transformer uses attention in three ways: ① encoder-decoder attention ② self-attention in the encoder ③ self-attention in the decoder

Three details of the Transformer:


(1) Scaled dot-product attention (Section 3.2.1 of the paper, Scaled Dot-Product Attention) [Figure]
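For reference, the paper defines this operation as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The following is a minimal sketch of that formula in plain Python, not the authors' implementation. The function names, the list-of-lists matrix layout, and the toy inputs are assumptions made for illustration only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v (lists of lists).
    Returns (output, weights); each row of weights sums to 1.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    weights = []
    for q in Q:
        # Dot product of one query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights.append(softmax(scores))
    d_v = len(V[0])
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(d_v)]
        for w_row in weights
    ]
    return output, weights

# Toy inputs (hypothetical): 2 queries, 3 key/value pairs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

In this toy example the first query has the same dot product with the first and third keys, so those two keys receive equal attention weight, and the second key receives less.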

(2) Multi-head attention (Section 3.2.2 of the paper, Multi-Head Attention)

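Query, Key, and Value are each linearly projected into different subspaces. This is done h times (hence "multi-head"), and each projected triple is passed through Attention(Q, K, V) to produce one attention output, i.e. one head. The h heads are then concatenated to form the multi-head output.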
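The project, attend, and concatenate steps can be sketched as follows. This is an illustrative plain-Python sketch under stated assumptions, not the reference implementation: the helper names (matmul, random_matrix), the random weights, and the small sizes (d_model = 8, h = 2) are hypothetical. The final output projection W_o follows the paper's formula MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]  # transpose of K
    scores = matmul(Q, K_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(Q, K, V, heads, W_o):
    """heads is a list of (W_q, W_k, W_v) projection triples, one per head."""
    head_outputs = [
        attention(matmul(Q, W_q), matmul(K, W_k), matmul(V, W_v))
        for W_q, W_k, W_v in heads
    ]
    # Concatenate the heads along the feature dimension, row by row.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(Q))]
    return matmul(concat, W_o)

rng = random.Random(0)
d_model, h, n = 8, 2, 3          # toy sizes, chosen for illustration
d_k = d_model // h               # per-head dimension
heads = [tuple(random_matrix(d_model, d_k, rng) for _ in range(3)) for _ in range(h)]
W_o = random_matrix(h * d_k, d_model, rng)
X = random_matrix(n, d_model, rng)
Y = multi_head_attention(X, X, X, heads, W_o)  # self-attention: Q = K = V = X
```

Because each head works in a d_model / h dimensional subspace, the concatenated result has the same width as the input, so the output Y keeps the shape n x d_model.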


(3) Positional encoding (Section 3.5 of the paper, Positional Encoding)

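Since the Transformer uses neither RNNs nor CNNs, positional encodings are added so that the model can make use of the relative and absolute positions of the tokens in the sequence. The positional encoding has the same dimension as the token embedding, namely d_model, so that the sum (token embedding + position embedding) can serve as the input at the bottom of the encoder and decoder stacks. [Figure]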
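The paper's fixed sinusoidal encoding is PE(pos, 2i) = sin(pos / 10000^(2i / d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model)). A minimal sketch of that formula in plain Python is shown here; the function name and the list-of-lists layout are assumptions for illustration.

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        # i walks over the even dimensions, so i plays the role of 2i in the formula.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Same sizes as the visualization mentioned in these notes.
pe = positional_encoding(8, 32)
```

Position 0 encodes to alternating 0 and 1 values, since sin(0) = 0 and cos(0) = 1. In a model, each row would be added element-wise to the token embedding at that position.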

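This way of encoding positional information is clever. As a test, with the sequence length set to 8 and d_model set to 32, the visualization looks like this: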

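[Figure: visualization of the positional encoding]

For how sine and cosine values can approximate the relative relationship between positions, see Su Jianlin's explanation: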

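Su Jianlin. (2018, Jan 06). 《Attention is All You Need》淺讀（簡介+代碼） [A Brief Reading of "Attention is All You Need" (Introduction + Code)] [Blog post]. Retrieved from https://www.spaces.ac.cn/archives/4765 Note: this article is very well written.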

How the Transformer addresses the paper's overall goal (reducing sequential computation):

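Section 4 of the paper (Why Self-Attention) compares the layer types from three angles to explain why the Transformer can reduce computation; most of the credit goes to self-attention. The figure below compares the time complexity: [Figure]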
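The three criteria in that section of the paper are per-layer computational complexity, the minimum number of sequential operations, and the maximum path length between positions. The sketch below illustrates the first two, using the per-layer orders given in the paper: O(n^2 · d) for self-attention, O(n · d^2) for recurrent layers, and O(k · n · d^2) for convolutional layers. Constant factors are dropped, and the function names and example sizes (n = 50, d = 512, k = 3) are assumptions for illustration.

```python
def per_layer_cost(n, d, k=3):
    """Rough per-layer operation counts (constants dropped).

    n: sequence length, d: representation dimension, k: convolution kernel width.
    """
    return {
        "self_attention": n * n * d,
        "recurrent": n * d * d,
        "convolutional": k * n * d * d,
    }

def sequential_ops(n):
    """Minimum number of steps that must run one after another."""
    return {"self_attention": 1, "recurrent": n, "convolutional": 1}

# Hypothetical sizes: a 50-token sentence with 512-dimensional representations.
costs = per_layer_cost(n=50, d=512)
steps = sequential_ops(50)
```

When the sequence length n is smaller than the dimension d, as in this example, the self-attention count is lower than the recurrent one, and self-attention needs only a constant number of sequential steps where a recurrent layer needs n.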

Experimental results:

Task 1: machine translation [Figure]


Task 2: English constituency parsing [Figure]

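An implementation of the Transformer: https://github.com/jadore801120/attention-is-all-you-need-pytorch The official TensorFlow implementation can be found in tensorflow/tensor2tensor. To learn more about the self-attention mechanism, you could read "A Structured Self-attentive Sentence Embedding".

Summary: the proposed architecture does save computation time and resources while matching or exceeding the experimental results of existing models. This is the goal the paper emphasizes throughout. Three further takeaways:

(1) Earlier uses of attention added an attention mechanism on top of an existing classic model (such as an LSTM or CNN). This paper sets those base models aside and builds the encoder-decoder model purely from attention. The experimental results confirm the power of the attention mechanism itself, just as the title says: "attention is all you need".

(2) Attention is also used inside the encoder and inside the decoder, i.e. self-attention. This shows that self-attention, combined with the positional encoding, can learn the relationships among the tokens within a sentence well.

(3) Multi-head attention and the stacked layers in the encoder and decoder cleverly add learnable parameters both within a single layer and across layers. As the paper describes it, different heads learn different key information in different subspaces, and this information is finally concatenated into one feature vector, much like the flattened vector after a CNN pooling layer.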
