
DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (1)


title: 'DeepLearning.ai Homework: (5-1) -- Recurrent Neural Networks (1)'
id: dl-ai-5-1h1
tags:
  • dl.ai
  • homework
categories:
  • AI
  • Deep Learning
date: 2018-10-18 10:26:56

This week's homework has three parts:

  • Build an RNN model step by step by hand
  • Build a character-level language model to generate dinosaur names
  • Generate jazz music with an LSTM

Part 1: Building a recurrent neural network - step by step

Let's build an RNN network step by step.

1 - Forward propagation for the basic Recurrent Neural Network

First, let's implement forward propagation. To build the full network, we start by implementing a single RNN cell:

RNN cell

  1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$
  2. Using your new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$. We provided you a function: softmax.
  3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, parameters)$ in cache.
  4. Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and cache.

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.

# GRADED FUNCTION: rnn_cell_forward

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (2)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    
    ### START CODE HERE ### (≈2 lines)
    # compute next activation state using the formula given above
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)    
    ### END CODE HERE ###
    
    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    
    return a_next, yt_pred, cache
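
As a quick sanity check, we can call the cell with random inputs and verify the output shapes. This is only a minimal sketch: the dimensions below are made up, and it assumes numpy is imported as np and the course-provided softmax helper is available.

import numpy as np

np.random.seed(1)
n_x, n_a, n_y, m = 3, 5, 2, 10            # hypothetical dimensions for the check

xt = np.random.randn(n_x, m)              # input at time step t
a_prev = np.random.randn(n_a, m)          # hidden state from time step t-1
parameters = {"Wax": np.random.randn(n_a, n_x),
              "Waa": np.random.randn(n_a, n_a),
              "Wya": np.random.randn(n_y, n_a),
              "ba": np.random.randn(n_a, 1),
              "by": np.random.randn(n_y, 1)}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print(a_next.shape)    # (5, 10)
print(yt_pred.shape)   # (2, 10)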

RNN forward pass

The approach is:

  • First initialize a and y_pred to zeros
  • Then initialize a_next = a0
  • Then loop over the T_x time steps, computing a, y and the cache at each step

# GRADED FUNCTION: rnn_forward

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    
    # Initialize "caches" which will contain the list of all caches
    caches = []
    
    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    
    ### START CODE HERE ###
    
    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    
    # Initialize a_next (≈1 line)
    a_next = a0
    
    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:,:,t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:,:,t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)
        
    ### END CODE HERE ###
    
    # store values needed for backward propagation in cache
    caches = (caches, x)
    
    return a, y_pred, caches
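
A similar shape check can be run on the full forward pass (again only a sketch with made-up dimensions, assuming numpy as np and the rnn_cell_forward defined above):

import numpy as np

np.random.seed(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4    # hypothetical dimensions

x = np.random.randn(n_x, m, T_x)          # inputs for every time step
a0 = np.random.randn(n_a, m)              # initial hidden state
parameters = {"Waa": np.random.randn(n_a, n_a),
              "Wax": np.random.randn(n_a, n_x),
              "Wya": np.random.randn(n_y, n_a),
              "ba": np.random.randn(n_a, 1),
              "by": np.random.randn(n_y, 1)}

a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)        # (5, 10, 4)
print(y_pred.shape)   # (2, 10, 4)
print(len(caches))    # 2 -> (list of per-step caches, x)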

2 - Long Short-Term Memory (LSTM) network

Next, let's build an LSTM network.

Forget gate:

Suppose we are reading words in a piece of text and want to use an LSTM to keep track of grammatical structure, such as whether the subject is singular or plural. If the subject changes from a singular word to a plural one, we need a way to get rid of the previously stored memory of the singular/plural state.

In an LSTM, the forget gate lets us do exactly that:

$$\Gamma_f^{\langle t \rangle} = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$$

Update gate:

Once we have forgotten that the subject under discussion is singular, we need a way to update the state to reflect that the new subject is now plural.

$$\Gamma_u^{\langle t \rangle} = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)$$

Combining the two gates, the cell value can then be updated:

$$\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$$

$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle}$$

Output gate:

To decide the output, we use the following two formulas:

$$\Gamma_o^{\langle t \rangle} = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$$

$$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh(c^{\langle t \rangle})$$
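
Putting the four equations together, a single LSTM cell can be implemented along the same lines as rnn_cell_forward. The code below is only a minimal sketch of the forward step: it assumes numpy as np, defines its own sigmoid helper, and uses parameter keys (Wf, Wu, Wc, Wo and the matching biases) named after the gate notation above; the graded notebook's lstm_cell_forward additionally computes the prediction yt_pred and returns a cache, which are omitted here.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_cell_step(xt, a_prev, c_prev, parameters):
    """Minimal LSTM cell sketch: returns the next hidden state and cell state."""
    # Stack a_prev on top of xt so every gate uses a single weight matrix W[a_prev, xt]
    concat = np.concatenate((a_prev, xt), axis=0)                         # (n_a + n_x, m)

    ft = sigmoid(np.dot(parameters["Wf"], concat) + parameters["bf"])     # forget gate
    ut = sigmoid(np.dot(parameters["Wu"], concat) + parameters["bu"])     # update gate
    cct = np.tanh(np.dot(parameters["Wc"], concat) + parameters["bc"])    # candidate cell value
    c_next = ft * c_prev + ut * cct                                       # new cell state
    ot = sigmoid(np.dot(parameters["Wo"], concat) + parameters["bo"])     # output gate
    a_next = ot * np.tanh(c_next)                                         # new hidden state

    return a_next, c_next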
