TensorFlow for Natural Language Processing: Recurrent Neural Networks (RNN) with the LSTM Model

(To be continued)

Preparation

We will train an RNN for a language-modeling task: given a sequence of words, predict the next word. For this we use the Penn Treebank (PTB) dataset, a standard benchmark for measuring the quality of such models; it is small and relatively fast to train on.
The PTB dataset has already been preprocessed and contains 10,000 distinct words in total, including an end-of-sentence marker and a special symbol (<unk>) for rare words.
To make the data easier to process, reader.py converts each word to a unique integer identifier.
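
As a rough illustration of that conversion, here is a minimal sketch, not the actual reader.py code; build_vocab and the toy sentence are made up for this example. The idea is simply to rank words by frequency and assign each an integer id:

import collections

def build_vocab(words, vocab_size=10000):
    # Keep the vocab_size most frequent words and give each a unique
    # integer id -- a simplified version of what reader.py does for PTB.
    counter = collections.Counter(words)
    most_common = counter.most_common(vocab_size)
    return {word: i for i, (word, _) in enumerate(most_common)}

sentence = "the quick brown fox jumped <eos>".split()
word_to_id = build_vocab(sentence)
word_ids = [word_to_id[w] for w in sentence]  # each word is now an integer id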

File            Purpose
ptb_word_lm.py  Trains the language model on the PTB dataset
reader.py       Reads and preprocesses the dataset

Click here to download the data.
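
If you are using reader.py from the tutorial, loading the extracted archive looks roughly like this (a sketch; ptb_raw_data is the entry point in that file, and "data/" is a placeholder for the directory holding ptb.train.txt, ptb.valid.txt and ptb.test.txt):

import reader  # reader.py from the tutorial directory

# Returns the three splits as lists of integer word ids, plus the vocabulary size.
train_data, valid_data, test_data, vocabulary = reader.ptb_raw_data("data/")
print(len(train_data), vocabulary)  # number of training word ids, vocab size (10000)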

Building the Model

1. LSTM

The core of the model consists of an LSTM cell that processes one word at a time and computes probabilities for the possible values of the next word in the sentence. The memory state of the cell is initialized with a vector of zeros and gets updated after reading each word. For computational reasons, we process the data in mini-batches of size batch_size; every word in a batch corresponds to a time step t, and TensorFlow will automatically sum the gradients over each batch.

For example:

t=0  t=1    t=2  t=3     t=4
[The, brown, fox, is,     quick]
[The, red,   fox, jumped, high]

words_in_dataset[0] = [The, The]
words_in_dataset[1] = [brown, red]
words_in_dataset[2] = [fox, fox]
words_in_dataset[3] = [is, jumped]
words_in_dataset[4] = [quick, high]
batch_size = 2, time_steps = 5
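
To make this arrangement concrete, here is a small NumPy sketch (purely illustrative, not part of the tutorial code) that produces exactly the words_in_dataset layout above from the two sentences:

import numpy as np

# The two example sentences as a [batch_size, time_steps] array.
batch = np.array([["The", "brown", "fox", "is",     "quick"],
                  ["The", "red",   "fox", "jumped", "high"]])

# Transpose to [time_steps, batch_size]: words_in_dataset[t] now holds
# the word each sentence in the batch contributes at time step t.
words_in_dataset = batch.T
print(words_in_dataset[1])  # ['brown' 'red']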

The basic pseudocode is as follows:

words_in_dataset = tf.placeholder(tf.float32, [time_steps, batch_size, num_features])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
hidden_state = tf.zeros([batch_size, lstm.state_size])
current_state = tf.zeros([batch_size, lstm.state_size])
state = hidden_state, current_state
probabilities = []
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)

2. Truncated Backpropagation

By design, the output of an RNN depends on inputs arbitrarily far in the past, which makes backpropagation difficult to compute. To make the learning process tractable, it is common practice to create an "unrolled" version of the network that contains a fixed number (num_steps) of LSTM inputs and outputs. This is implemented by feeding in inputs of length num_steps at a time and performing a backward pass after each such block.

Here is simplified code for creating a graph that performs this truncated backpropagation:

# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
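
As an aside, TensorFlow 1.x can also perform this fixed-length unrolling for you via tf.nn.dynamic_rnn. A minimal sketch under that assumption (the sizes are placeholders, and inputs stands for already-embedded words rather than raw ids):

import tensorflow as tf

batch_size, num_steps, input_size, lstm_size = 2, 5, 128, 200
# Already-embedded inputs for one truncated window: [batch_size, num_steps, input_size].
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_size])

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
initial_state = lstm.zero_state(batch_size, tf.float32)

# dynamic_rnn unrolls the cell over the num_steps dimension, returning the
# per-step outputs and the state after the final step (final_state above).
outputs, final_state = tf.nn.dynamic_rnn(lstm, inputs, initial_state=initial_state)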

And this is how to iterate over the whole dataset, feeding the final LSTM state of each block back in as the initial state of the next, so that state carries across the truncated windows:

# A numpy array holding the state of LSTM after each batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss

3. Inputs