
PyTorch notes: 07) LSTM

An introductory blog post on LSTMs: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Official API: https://pytorch.org/docs/stable/nn.html?#torch.nn.LSTM

An example: suppose the input consists of 3 sentences, each sentence is made up of 5 words, and each word is represented by a 10-dimensional word vector; then seq_len=5, batch=3, input_size=10.
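
To make the shapes concrete, here is a minimal sketch (the tensor name x is purely illustrative) that builds a random input with exactly these dimensions:

import torch

# 3 sentences (batch=3), 5 words each (seq_len=5), 10-dim word vectors (input_size=10)
x = torch.randn(5, 3, 10)
print(x.shape)  # torch.Size([5, 3, 10]) -> (seq_len, batch, input_size)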

Core constructor parameters

input_size – The number of expected features in the input x
# the dimension of each word vector, e.g. input_size=10
hidden_size – The number of features in the hidden state h
# the dimension of the hidden state
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results.
# how many LSTM layers are stacked; num_layers=2 chains two LSTMs, the output of one layer becoming the input of the next

A point of confusion: num_layers=2 chains two LSTMs, with the output of the first serving as the input of the second. If the input dimension is 10 and the hidden dimension is 20, is the hidden dimension of the first LSTM 10 or 20?

 for layer in range(num_layers):
     for direction in range(num_directions):
         # num_directions = 1 for a unidirectional LSTM
         # only the first layer sees input_size; deeper layers take the previous layer's hidden state
         layer_input_size = input_size if layer == 0 else hidden_size * num_directions
         # gate_size = 4 * hidden_size for an LSTM (input, forget, cell and output gates)
         w_ih = Parameter(torch.Tensor(gate_size, layer_input_size))
         w_hh = Parameter(torch.Tensor(gate_size, hidden_size))

As the code above shows, the hidden dimension is the same for every LSTM layer; only the input dimension of each subsequent layer becomes the hidden dimension of the previous one: LSTM1(10, 20) -> LSTM2(20, 20) (input_size, hidden_size).
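
This can be checked directly on an nn.LSTM instance; the sketch below inspects the per-layer weight parameters (weight_ih_l0, weight_ih_l1, weight_hh_l0 exposed by nn.LSTM) and confirms that the first layer maps 10 -> 20 while the second maps 20 -> 20:

import torch.nn as nn

lstm = nn.LSTM(10, 20, 2)
# gate_size = 4 * hidden_size = 80 rows in each weight matrix
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> layer 0 takes the 10-dim input
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 20]) -> layer 1 takes the 20-dim hidden state
print(lstm.weight_hh_l0.shape)  # torch.Size([80, 20]) -> hidden-to-hidden is always 20-dim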

Inputs: input, (h_0, c_0)

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. 
# see the example above; note that the dimension order differs from Keras: batch sits in the second position
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
# the initial hidden state
c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.
# the initial cell state
If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
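
A quick sketch of that default (module and tensor names are illustrative): calling the LSTM without (h_0, c_0) gives exactly the same result as passing all-zero states.

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)
out1, _ = rnn(x)                                                  # no (h_0, c_0) supplied
out2, _ = rnn(x, (torch.zeros(2, 3, 20), torch.zeros(2, 3, 20)))  # explicit zero states
print(torch.equal(out1, out2))                                    # True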

Outputs: output, (h_n, c_n)

output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. 
# analogous to input; see the example below
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
# the hidden state output
c_n (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len
# the cell state output
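
A useful consequence worth noting for a unidirectional LSTM: the last time step of output coincides with the last layer's slice of h_n. A minimal sketch to check this (names are illustrative):

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
output, (h_n, c_n) = rnn(torch.randn(5, 3, 10))
# output[-1] is the top layer's hidden state at t = seq_len, which is exactly h_n[-1]
print(torch.equal(output[-1], h_n[-1]))  # True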

Example from the official API docs

import torch
import torch.nn as nn

# word vector dim 10, hidden dim 20, 2 stacked LSTM layers
rnn = nn.LSTM(10, 20, 2)
# seq_len=5, batch_size=3, word vector dim=10
input = torch.randn(5, 3, 10)
# initial hidden and cell states; they have the same shape:
# 2 LSTM layers, batch_size=3, hidden dim 20
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
# with 2 LSTM layers, output holds the last layer's hidden output for every element of the
# sequence; its shape depends only on the sequence length, not on the number of layers
# hn, cn are the final hidden and cell states of every layer
output, (hn, cn) = rnn(input, (h0, c0))

print(output.size(),hn.size(),cn.size())
torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
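
If the Keras-style (batch, seq_len, feature) layout is preferred, nn.LSTM accepts batch_first=True. Note that this only changes the layout of input and output; h0/c0/hn/cn keep the (num_layers * num_directions, batch, hidden_size) layout. A short sketch:

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2, batch_first=True)
x = torch.randn(3, 5, 10)       # (batch=3, seq_len=5, input_size=10)
output, (hn, cn) = rnn(x)
print(output.size())            # torch.Size([3, 5, 20]) -> batch comes first
print(hn.size())                # torch.Size([2, 3, 20]) -> layout unchanged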

Example from the official tutorials
(comparing the results of the two calling styles; relative to the tutorial, the inputs and the h0/c0 initialization are replaced with zeros here)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.zeros(1, 3) for _ in range(5)]  # a sequence of length 5, batch=1, word vector dim 3
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # 1 LSTM layer, batch=1, hidden dim 3
# feed each word vector through the LSTM one step at a time
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print(out)  # print out at each step

#tensor(1.00000e-02 *[[[-9.1601, -2.4799, -4.8088]]])
#tensor([[[-0.1536, -0.0358, -0.0787]]])
#tensor([[[-0.1923, -0.0398, -0.0976]]])
#... (only the first three of the five printed tensors are shown)

The other style processes the sequence as a whole:

# inputs shape: (5, 1, 3), i.e. seq_len=5, batch=1, feature=3
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))  # reset the hidden and cell states before the batched run
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

tensor([[[-0.0916, -0.0248, -0.0481]],
        [[-0.1536, -0.0358, -0.0787]],
        [[-0.1923, -0.0398, -0.0976]],
        [[-0.2158, -0.0405, -0.1092]],
        [[-0.2301, -0.0398, -0.1162]]])
(tensor([[[-0.2301, -0.0398, -0.1162]]]), tensor([[[-0.6446, -0.0941, -0.2772]]]))

Both styles produce the same output, and hn is the hidden-state output for the last word vector (i.e. the final time step).
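
To make the comparison explicit, the sketch below collects the step-by-step outputs and checks them against the whole-sequence run (variable names such as seq and step_outs are illustrative):

import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)
seq = [torch.zeros(1, 3) for _ in range(5)]

# style 1: one time step at a time
hidden = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
step_outs = []
for i in seq:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    step_outs.append(out)
step_outs = torch.cat(step_outs)                 # shape (5, 1, 3)

# style 2: the whole sequence at once
out_all, (hn, cn) = lstm(torch.cat(seq).view(5, 1, -1),
                         (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3)))

print(torch.allclose(step_outs, out_all))        # True: same outputs either way
print(torch.allclose(out_all[-1], hn[0]))        # True: hn is the last step's hidden output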