TensorFlow for Natural Language Processing: Recurrent Neural Networks (RNN) and the LSTM Model
阿新 • Published: 2019-01-05
(To be continued.)
Preparation
We will train an RNN for a language-modeling task: given a sequence of words, predict the next word. We use the standard benchmark for such models, the PTB (Penn Treebank) dataset. It is small and relatively fast to train on.
The PTB dataset has already been preprocessed and contains 10,000 distinct words in total, including an end-of-sentence marker and a special symbol (<unk>) for rare words.
To make the data easier to work with, reader.py converts each word into a unique integer identifier.
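The conversion itself is simple. Here is a minimal sketch of the idea, assuming a hypothetical helper build_vocab (an illustration only, not the actual code in reader.py, although the tutorial's reader builds its vocabulary in a similar way):

import collections

def build_vocab(words):
    # Give frequent words small ids, breaking ties alphabetically
    # (hypothetical helper, for illustration only).
    counter = collections.Counter(words)
    sorted_words = sorted(counter, key=lambda w: (-counter[w], w))
    return {word: i for i, word in enumerate(sorted_words)}

words = ["the", "fox", "jumped", "over", "the", "fence", "<eos>"]
word_to_id = build_vocab(words)
ids = [word_to_id[w] for w in words]   # [0, 3, 4, 5, 0, 2, 1]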
This tutorial references the following files:

| File | Purpose |
|---|---|
| ptb_word_lm.py | Trains the language model on the PTB dataset |
| reader.py | Reads the dataset |
Click here to download the data.
Building the Model
1. LSTM
The core of the model is an LSTM cell that processes one word at a time and computes probabilities for the possible values of the next word in the sentence. The memory state of the LSTM is initialized with a vector of zeros and is updated after each word is read. For computational efficiency, we process the data in mini-batches of size batch_size. Every word in a batch corresponds to a time step t, and TensorFlow automatically sums the gradients across each batch.
For example:
t=0 t=1 t=2 t=3 t=4
[The, brown, fox, is, quick]
[The, red, fox, jumped, high]
words_in_dataset[0] = [The, The]
words_in_dataset[1] = [brown, red]
words_in_dataset[2] = [fox, fox]
words_in_dataset[3] = [is, jumped]
words_in_dataset[4] = [quick, high]
batch_size = 2, time_steps = 5
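To make this layout concrete, here is a small sketch (with made-up integer ids standing in for the words) showing how the two batch-major sentences become the time-major words_in_dataset above:

import numpy as np

# Two sentences of five word ids each: shape [batch_size, time_steps].
batch_major = np.array([[0, 1, 2, 3, 4],    # The brown fox is     quick
                        [0, 5, 2, 6, 7]])   # The red   fox jumped high
# Transpose to time-major: words_in_dataset[t] holds every sentence's
# word at time step t.
words_in_dataset = batch_major.T            # shape [time_steps, batch_size]
print(words_in_dataset[1])                  # [1 5]  ->  [brown, red]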
The basic pseudocode looks like this:
import tensorflow as tf  # TensorFlow 1.x

words_in_dataset = tf.placeholder(tf.float32, [time_steps, batch_size, num_features])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory: zero vectors for the cell and hidden state.
# (In real code this is usually written as lstm.zero_state(batch_size, tf.float32).)
hidden_state = tf.zeros([batch_size, lstm.state_size])
current_state = tf.zeros([batch_size, lstm.state_size])
state = hidden_state, current_state
probabilities = []
loss = 0.0
# Unstack along the time axis so the loop sees one batch of words per time step.
for current_batch_of_words in tf.unstack(words_in_dataset):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next-word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)
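The pseudocode leaves loss_function abstract. In a word-level language model it is typically the cross-entropy between the predicted distribution and the actual next word. A minimal per-step sketch, assuming target_words holds the integer ids of the true next words (shape [batch_size]), could be:

# Cross-entropy against the true next words, computed from the raw logits.
# This is one common concrete choice; the pseudocode leaves it open.
step_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_words, logits=logits))
loss += step_loss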
2. Truncated Backpropagation
By design, the output of a recurrent neural network depends on arbitrarily distant inputs, which makes backpropagation hard to compute. To make the learning process tractable, it is common practice to create an "unrolled" version of the network that contains a fixed number (num_steps) of LSTM inputs and outputs. The model is then trained on this finite approximation: we feed in inputs of length num_steps at a time and perform a backward pass after each such input block.
Here is a simplified version of the code for creating a graph that performs truncated backpropagation:
# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    # (In the full model, the integer ids in words[:, i] would first be
    # mapped to embedding vectors before being fed to the cell.)
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
And here is how to implement an iteration over the whole dataset:
# A numpy array holding the state of the LSTM after each batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run(
        [final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state,
                   words: current_batch_of_words})
    total_loss += current_loss
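Language models on PTB are usually evaluated by perplexity, the exponential of the average per-word cross-entropy. Assuming the fetched loss is the cross-entropy summed over the num_steps words of each batch (an assumption; the pseudocode above leaves loss_function open), the loop can be extended to report perplexity:

import numpy as np

total_loss, iters = 0.0, 0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run(
        [final_state, loss],
        feed_dict={initial_state: numpy_state,
                   words: current_batch_of_words})
    total_loss += current_loss
    iters += num_steps   # target words processed so far per example
# Exponential of the average per-word loss; lower is better.
perplexity = np.exp(total_loss / iters)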