DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (1)
title: 'DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (1)'
id: dl-ai-5-1h1
tags:
  - dl.ai
  - homework
categories:
  - AI
  - Deep Learning
date: 2018-10-18 10:26:56
This week's homework has three parts:
- Build an RNN model by hand, step by step
- Build a character-level language model that generates dinosaur names
- Generate jazz music with an LSTM
Part 1: Building a recurrent neural network - step by step
Here we build an RNN from its basic components.
1 - Forward propagation for the basic Recurrent Neural Network
We start with forward propagation. To build the full network, we first implement a single RNN cell:
RNN cell
- Compute the hidden state with tanh activation: $a^{<t>} = \tanh(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a)$
- Using your new hidden state $a^{<t>}$, compute the prediction $\hat{y}^{<t>} = \mathrm{softmax}(W_{ya} a^{<t>} + b_y)$. We provided you the function `softmax`.
- Store $(a^{<t>}, a^{<t-1>}, x^{<t>}, parameters)$ in `cache`
- Return $a^{<t>}$, $\hat{y}^{<t>}$, and `cache`

We will vectorize over $m$ examples. Thus, $x^{<t>}$ will have dimension $(n_x, m)$, and $a^{<t>}$ will have dimension $(n_a, m)$.
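The notebook imports numpy and provides `softmax` through its utilities file. So that the snippets below stand on their own, here is a minimal stand-in, assuming the usual column-wise, numerically stable softmax (a sketch, not necessarily the exact helper shipped with the assignment):

```python
import numpy as np

def softmax(x):
    """Column-wise softmax over an array of shape (n_y, m)."""
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))  # subtract the column max for numerical stability
    return e_x / e_x.sum(axis=0, keepdims=True)
```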
```python
# GRADED FUNCTION: rnn_cell_forward

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """

    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    ### END CODE HERE ###

    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache
```
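A quick smoke test of `rnn_cell_forward`, using hypothetical dimensions (n_x = 3, n_a = 5, n_y = 2, m = 10); only the shapes matter here:

```python
np.random.seed(1)
xt = np.random.randn(3, 10)       # (n_x, m)
a_prev = np.random.randn(5, 10)   # (n_a, m)
parameters = {"Waa": np.random.randn(5, 5),
              "Wax": np.random.randn(5, 3),
              "Wya": np.random.randn(2, 5),
              "ba":  np.random.randn(5, 1),
              "by":  np.random.randn(2, 1)}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print(a_next.shape)   # (5, 10)
print(yt_pred.shape)  # (2, 10)
```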
RNN forward pass
The idea is:
- First initialize a and y_pred to zeros
- Then initialize a_next = a0
- Then loop over the T_x time steps, computing a, y, and the cache at each step
```python
# GRADED FUNCTION: rnn_forward

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """

    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###
    # initialize "a" and "y_pred" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    # Initialize a_next (≈1 line)
    a_next = a0

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y_pred (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)
    ### END CODE HERE ###

    # store values needed for backward propagation in caches
    caches = (caches, x)

    return a, y_pred, caches
```
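The same kind of check for `rnn_forward`, stacking T_x = 4 time steps of the hypothetical sizes used above:

```python
np.random.seed(1)
x = np.random.randn(3, 10, 4)     # (n_x, m, T_x)
a0 = np.random.randn(5, 10)       # (n_a, m)
parameters = {"Waa": np.random.randn(5, 5),
              "Wax": np.random.randn(5, 3),
              "Wya": np.random.randn(2, 5),
              "ba":  np.random.randn(5, 1),
              "by":  np.random.randn(2, 1)}

a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)        # (5, 10, 4)
print(y_pred.shape)   # (2, 10, 4)
print(len(caches))    # 2 -- (list of per-step caches, x)
```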
2 - Long Short-Term Memory (LSTM) network
Next, we build an LSTM network.
Forget gate:
Suppose we are reading words in a piece of text and want the LSTM to keep track of grammatical structure, for example whether the subject is singular or plural. If the subject changes from singular to plural, we need a way to get rid of the previously stored memory of the singular/plural state.
In the LSTM, the forget gate lets us do this:
$$\Gamma_f^{<t>} = \sigma(W_f[a^{<t-1>}, x^{<t>}] + b_f)$$
Update gate:
Once we have "forgotten" that the subject under discussion is singular, we need a way to update the state to reflect that the new subject is now plural:
$$\Gamma_u^{<t>} = \sigma(W_u[a^{<t-1>}, x^{<t>}] + b_u)$$
Combining the two gates, we can then update the cell value:
$$c^{<t>} = \Gamma_f^{<t>} * c^{<t-1>} + \Gamma_u^{<t>} * \tilde{c}^{<t>}$$
where the candidate value is $\tilde{c}^{<t>} = \tanh(W_c[a^{<t-1>}, x^{<t>}] + b_c)$.
Output gate:
To decide the output, we use the following two formulas:
$$\Gamma_o^{<t>} = \sigma(W_o[a^{<t-1>}, x^{<t>}] + b_o)$$
$$a^{<t>} = \Gamma_o^{<t>} * \tanh(c^{<t>})$$
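Putting the gate equations together, here is a minimal numpy sketch of one LSTM forward step. The parameter names (Wf, bf, Wu, bu, Wc, bc, Wo, bo) and the stacking of a_prev on top of xt are my own assumptions, modeled on the RNN cell above rather than on the graded lstm_cell_forward signature:

```python
import numpy as np

def lstm_cell_step(xt, a_prev, c_prev, p):
    """One LSTM step. p holds hypothetical parameters Wf, Wu, Wc, Wo of shape
    (n_a, n_a + n_x) and biases bf, bu, bc, bo of shape (n_a, 1)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    concat = np.concatenate((a_prev, xt), axis=0)          # [a^{<t-1>}, x^{<t>}], shape (n_a + n_x, m)
    gamma_f = sigmoid(np.dot(p["Wf"], concat) + p["bf"])   # forget gate
    gamma_u = sigmoid(np.dot(p["Wu"], concat) + p["bu"])   # update gate
    c_tilde = np.tanh(np.dot(p["Wc"], concat) + p["bc"])   # candidate cell value
    c_next = gamma_f * c_prev + gamma_u * c_tilde          # new cell state
    gamma_o = sigmoid(np.dot(p["Wo"], concat) + p["bo"])   # output gate
    a_next = gamma_o * np.tanh(c_next)                     # new hidden state
    return a_next, c_next
```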