DeepLearning.ai Assignment (5-1): Recurrent Neural Networks (1)

  • Build an RNN model by hand
  • Build a character-level language model to generate dinosaur names
  • Generate jazz music with an LSTM

Part 1: Building a recurrent neural network - step by step


1 - Forward propagation for the basic Recurrent Neural Network


RNN cell

  1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$
  2. Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We have provided a softmax function.
  3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, \text{parameters})$ in cache.
  4. Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and cache.
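The cell relies on a provided softmax helper whose code is not shown here. Below is a minimal NumPy sketch of what such a helper is assumed to do: normalize each column (one column per example) independently, after subtracting the column maximum for numerical stability. The name `softmax_columns` is hypothetical and is not the assignment's own function.

```python
import numpy as np

def softmax_columns(z):
    """Column-wise softmax for an array of shape (n_y, m).

    Hypothetical stand-in for the assignment's softmax helper:
    each column (one example) is normalized so that it sums to 1.
    """
    # Subtracting the per-column max leaves the result unchanged but prevents overflow in exp
    shifted = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)

# Usage: 3 output classes, 2 examples (the second column would overflow a naive exp)
z = np.array([[1.0, 1000.0],
              [2.0, 1000.0],
              [3.0, 1000.0]])
p = softmax_columns(z)
print(p.sum(axis=0))  # each column sums to 1 (up to rounding)
```

Summing over `axis=0` rather than over the whole array matters here: each of the $m$ examples receives its own probability distribution over the $n_y$ outputs.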

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.
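Vectorizing over $m$ examples just stacks the examples as columns. One matrix product then handles all of them, and the bias of shape $(n_a, 1)$ is broadcast across the $m$ columns. The small sketch below uses sizes and random values chosen only for illustration. It checks the batched hidden-state update against an explicit per-example loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_a, m = 3, 5, 4  # illustrative sizes

Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = rng.standard_normal((n_a, 1))      # shape (n_a, 1): broadcast over the m columns
xt = rng.standard_normal((n_x, m))      # one column per example
a_prev = rng.standard_normal((n_a, m))

# Batched: all m examples in a single expression
a_batch = np.tanh(Waa @ a_prev + Wax @ xt + ba)

# Per-example: column i depends only on column i of xt and a_prev
a_loop = np.zeros((n_a, m))
for i in range(m):
    a_loop[:, i:i + 1] = np.tanh(Waa @ a_prev[:, i:i + 1] + Wax @ xt[:, i:i + 1] + ba)

print(a_batch.shape)                 # (5, 4)
print(np.allclose(a_batch, a_loop))  # True
```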

import numpy as np
from rnn_utils import softmax  # softmax helper shipped with the course notebook (rnn_utils.py)

# GRADED FUNCTION: rnn_cell_forward

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)    
    ### END CODE HERE ###
    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache

RNN forward pass


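  • First initialize a and y_pred to zeros
  • Then initialize a_next = a0
  • Then loop over the T_x time steps, computing a, y and cache at each step and appending each cache to caches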
# GRADED FUNCTION: rnn_forward

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    # Initialize "caches" which will contain the list of all caches
    caches = []
    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    ### START CODE HERE ###
    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    # Initialize a_next (≈1 line)
    a_next = a0
    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:,:,t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:,:,t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)
    ### END CODE HERE ###
    # store values needed for backward propagation in cache
    caches = (caches, x)
    return a, y_pred, caches

2 - Long Short-Term Memory (LSTM) network



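Suppose we are reading the words in a piece of text and want to use an LSTM to keep track of grammatical structure, such as whether the subject is singular or plural. If the subject changes from a singular word to a plural one, we need a way to discard the previously stored memory of the singular/plural state. In an LSTM, the forget gate does this: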


$$\Gamma_f^{\langle t \rangle} = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$$
where $\sigma$ is the sigmoid function, applied element-wise, so every entry of $\Gamma_f^{\langle t \rangle}$ lies between 0 and 1.
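The bracket $[a^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ stacks the previous hidden state on top of the current input, which gives an array of shape $(n_a + n_x, m)$. Under that reading, $W_f$ has shape $(n_a, n_a + n_x)$ and $b_f$ has shape $(n_a, 1)$. The minimal sketch below computes the forget gate under those assumed shapes, using random values and a hypothetical `sigmoid` helper. It also checks that multiplying the stacked array is the same as using two separate blocks of $W_f$, analogous to $W_{aa}$ and $W_{ax}$ in the basic RNN cell.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps every real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_a, m = 3, 5, 2  # illustrative sizes

a_prev = rng.standard_normal((n_a, m))
xt = rng.standard_normal((n_x, m))
Wf = rng.standard_normal((n_a, n_a + n_x))
bf = np.zeros((n_a, 1))

concat = np.concatenate([a_prev, xt], axis=0)  # [a<t-1>, x<t>], shape (n_a + n_x, m)
gamma_f = sigmoid(Wf @ concat + bf)            # forget gate, shape (n_a, m)

# W_f @ [a; x] equals the "a" block of W_f times a plus the "x" block times x
same = np.allclose(Wf @ concat, Wf[:, :n_a] @ a_prev + Wf[:, n_a:] @ xt)

print(concat.shape, gamma_f.shape)  # (8, 2) (5, 2)
print(same)                         # True
```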



Update gate:
$$\Gamma_u^{\langle t \rangle} = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$$


Candidate value for the cell state:
$$\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$$

Cell state update:
$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle}$$
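Here is a tiny worked example of the cell-state update for a single memory unit. The gate values are hand-picked and are not computed from any weights. The encoding of "singular" as +1 and "plural" as -1 is purely illustrative. It shows how $\Gamma_f \approx 0, \Gamma_u \approx 1$ overwrites the old memory with the candidate, while $\Gamma_f \approx 1, \Gamma_u \approx 0$ keeps the old memory.

```python
import numpy as np

c_prev = np.array([1.0])    # old memory: "subject is singular" encoded as +1 (illustrative)
c_tilde = np.array([-1.0])  # candidate: "subject is plural" encoded as -1 (illustrative)

def cell_update(gamma_f, gamma_u, c_prev, c_tilde):
    # c<t> = Gamma_f * c<t-1> + Gamma_u * c~<t>, with element-wise products
    return gamma_f * c_prev + gamma_u * c_tilde

overwrite = cell_update(0.0, 1.0, c_prev, c_tilde)  # forget the old value, write the candidate
keep = cell_update(1.0, 0.0, c_prev, c_tilde)       # keep the old value, ignore the candidate
blend = cell_update(0.5, 0.5, c_prev, c_tilde)      # half of each

print(overwrite, keep, blend)  # [-1.] [1.] [0.]
```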



Output gate and new hidden state:
$$\Gamma_o^{\langle t \rangle} = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$$
$$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh(c^{\langle t \rangle})$$
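The LSTM equations in this section can be combined into one time step in NumPy. This is a minimal sketch, not the assignment's graded function. The parameter keys `Wf`, `bf`, `Wu`, `bu`, `Wc`, `bc`, `Wo`, `bo` and the function name `lstm_cell_sketch` are assumptions that mirror the symbols in the equations. No output prediction $\hat{y}^{\langle t \rangle}$ is computed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_sketch(xt, a_prev, c_prev, params):
    """One LSTM step (hypothetical helper). xt: (n_x, m); a_prev, c_prev: (n_a, m).

    Each W* is assumed to have shape (n_a, n_a + n_x) and each b* shape (n_a, 1).
    Returns the next hidden state and next cell state, both of shape (n_a, m).
    """
    concat = np.concatenate([a_prev, xt], axis=0)            # [a<t-1>, x<t>]
    gamma_f = sigmoid(params["Wf"] @ concat + params["bf"])  # forget gate
    gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])  # update gate
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])  # candidate value
    c_next = gamma_f * c_prev + gamma_u * c_tilde            # new cell state
    gamma_o = sigmoid(params["Wo"] @ concat + params["bo"])  # output gate
    a_next = gamma_o * np.tanh(c_next)                       # new hidden state
    return a_next, c_next

# Usage with random parameters (illustrative sizes)
rng = np.random.default_rng(2)
n_x, n_a, m = 3, 5, 10
params = {}
for gate in ("f", "u", "c", "o"):
    params["W" + gate] = rng.standard_normal((n_a, n_a + n_x))
    params["b" + gate] = rng.standard_normal((n_a, 1))

xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
c_prev = rng.standard_normal((n_a, m))
a_next, c_next = lstm_cell_sketch(xt, a_prev, c_prev, params)
print(a_next.shape, c_next.shape)  # (5, 10) (5, 10)
```

Because $\Gamma_o^{\langle t \rangle}$ lies in (0, 1) and tanh lies in (-1, 1), every entry of the new hidden state stays strictly between -1 and 1. The cell state $c^{\langle t \rangle}$ has no such bound.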


