
PyTorch notes: 07) LSTM

An introductory blog post on LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Official API: https://pytorch.org/docs/stable/nn.html?#torch.nn.LSTM

An example: suppose the input consists of 3 sentences, each sentence is made up of 5 words, and each word is represented by a 10-dimensional word vector; then seq_len=5, batch=3, input_size=10.
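
To make the shapes concrete, here is a minimal sketch (the tensor name x is purely illustrative) that builds a random input with exactly these dimensions:

import torch

# 3 sentences (batch=3), 5 words each (seq_len=5), 10-dim word vectors (input_size=10)
x = torch.randn(5, 3, 10)
print(x.shape)  # torch.Size([5, 3, 10]) -> (seq_len, batch, input_size)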

Core constructor parameters

input_size – The number of expected features in the input x
# the dimension of each word vector, e.g. input_size=10
hidden_size – The number of features in the hidden state h
# the dimension of the hidden state
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results.
# how many LSTM layers are stacked; num_layers=2 chains two LSTMs, the output of one layer becoming the input of the next

A point of confusion: num_layers=2 chains two LSTMs, with the output of the first serving as the input of the second. If the input dimension is 10 and the hidden dimension is 20, is the hidden dimension of the first LSTM 10 or 20?

 for layer in range(num_layers):
     for direction in range(num_directions):
         # num_directions = 1 for a unidirectional LSTM
         # only the first layer sees input_size; deeper layers take the previous layer's hidden state
         layer_input_size = input_size if layer == 0 else hidden_size * num_directions
         # gate_size = 4 * hidden_size for an LSTM (input, forget, cell and output gates)
         w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
         w_hh = Parameter(torch.Tensor(gate_size, hidden_size))

As the code above shows, the hidden dimension is the same for every LSTM layer; only the input dimension of each subsequent layer becomes the hidden dimension of the previous one: LSTM1(10, 20) -> LSTM2(20, 20) (input_size, hidden_size).
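
This can be checked directly on an nn.LSTM instance; the sketch below inspects the per-layer weight parameters (weight_ih_l0, weight_ih_l1, weight_hh_l0 exposed by nn.LSTM) and confirms that the first layer maps 10 -> 20 while the second maps 20 -> 20:

import torch.nn as nn

lstm = nn.LSTM(10, 20, 2)
# gate_size = 4 * hidden_size = 80 rows in each weight matrix
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> layer 0 takes the 10-dim input
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 20]) -> layer 1 takes the 20-dim hidden state
print(lstm.weight_hh_l0.shape)  # torch.Size([80, 20]) -> hidden-to-hidden is always 20-dim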

Inputs: input, (h_0, c_0)

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. 
# see the example above; note that the dimension order differs from Keras: batch sits in the second position
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
# the initial hidden state
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
# the initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
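
A quick sketch of that default (module and tensor names are illustrative): calling the LSTM without (h_0, c_0) gives exactly the same result as passing all-zero states.

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)
out1, _ = rnn(x)                                                  # no (h_0, c_0) supplied
out2, _ = rnn(x, (torch.zeros(2, 3, 20), torch.zeros(2, 3, 20)))  # explicit zero states
print(torch.equal(out1, out2))                                    # True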

Outputs: output, (h_n, c_n)

output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. 
# analogous to input; see the example below
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
# the hidden state output
c_n (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
# the cell state output
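
A useful consequence worth noting for a unidirectional LSTM: the last time step of output coincides with the last layer's slice of h_n. A minimal sketch to check this (names are illustrative):

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
output, (h_n, c_n) = rnn(torch.randn(5, 3, 10))
# output[-1] is the top layer's hidden state at t = seq_len, which is exactly h_n[-1]
print(torch.equal(output[-1], h_n[-1]))  # True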

Example from the official API docs

import torch
import torch.nn as nn

# word vector dim 10, hidden dim 20, 2 stacked LSTM layers
rnn = nn.LSTM(10, 20, 2)
# seq_len=5, batch_size=3, word vector dim=10
input = torch.randn(5, 3, 10)
# initial hidden and cell states; they have the same shape:
# 2 LSTM layers, batch_size=3, hidden dim 20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
# with 2 LSTM layers, output holds the last layer's hidden output for every element of the
# sequence; its shape depends only on the sequence length, not on the number of layers
# hn, cn are the final hidden and cell states of every layer
output, (hn, cn) = rnn(input, (h0, c0))

print(output.size(),hn.size(),cn.size())
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
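
If the Keras-style (batch, seq_len, feature) layout is preferred, nn.LSTM accepts batch_first=True. Note that this only changes the layout of input and output; h0/c0/hn/cn keep the (num_layers * num_directions, batch, hidden_size) layout. A short sketch:

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2, batch_first=True)
x = torch.randn(3, 5, 10)       # (batch=3, seq_len=5, input_size=10)
output, (hn, cn) = rnn(x)
print(output.size())            # torch.Size([3, 5, 20]) -> batch comes first
print(hn.size())                # torch.Size([2, 3, 20]) -> layout unchanged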

Example from the official tutorials
(comparing the results of the two calling styles; relative to the tutorial, the inputs and the h0/c0 initialization are replaced with zeros here)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.zeros(1, 3) for _ in range(5)]  # a sequence of length 5, batch=1, word vector dim 3
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # 1 LSTM layer, batch=1, hidden dim 3
# feed each word vector through the LSTM one step at a time
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print(out)  # print out at each step

#tensor(1.00000e-02 *[[[-9.1601, -2.4799, -4.8088]]])
#tensor([[[-0.1536, -0.0358, -0.0787]]])
#tensor([[[-0.1923, -0.0398, -0.0976]]])
#... (only the first three of the five printed tensors are shown)

The other style processes the sequence as a whole:

# inputs shape: (5, 1, 3), i.e. seq_len=5, batch=1, feature=3
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # reset the hidden and cell states before the batched run
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

tensor([[[-0.0916, -0.0248, -0.0481]],
        [[-0.1536, -0.0358, -0.0787]],
        [[-0.1923, -0.0398, -0.0976]],
        [[-0.2158, -0.0405, -0.1092]],
        [[-0.2301, -0.0398, -0.1162]]])
(tensor([[[-0.2301, -0.0398, -0.1162]]]), tensor([[[-0.6446, -0.0941, -0.2772]]]))

Both styles produce the same output, and hn is the hidden-state output for the last word vector (i.e. the final time step).
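
To make the comparison explicit, the sketch below collects the step-by-step outputs and checks them against the whole-sequence run (variable names such as seq and step_outs are illustrative):

import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)
seq = [torch.zeros(1, 3) for _ in range(5)]

# style 1: one time step at a time
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
step_outs = []
for i in seq:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    step_outs.append(out)
step_outs = torch.cat(step_outs)                 # shape (5, 1, 3)

# style 2: the whole sequence at once
out_all, (hn, cn) = lstm(torch.cat(seq).view(5, 1, -1),
                         (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3)))

print(torch.allclose(step_outs, out_all))        # True: same outputs either way
print(torch.allclose(out_all[-1], hn[0]))        # True: hn is the last step's hidden output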