
Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs: Study Notes

Introduction: What is an RNN?

1. The main idea of RNNs is to make use of sequential information. The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that's a very bad idea. If you want to predict the next word in a sentence you had better know which words came before it.

2. The output of an RNN depends on the preceding inputs, which can also be understood as a kind of memory. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far.

3. In practice, an RNN's memory does not reach very far back. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later).

Figure: a typical RNN model.

The above diagram shows an RNN being unrolled (or unfolded) into a full network. By unrolling we simply mean that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word. The formulas that govern the computation happening in an RNN are as follows:

x_t is the input at time step t. For example, x_1 could be a one-hot vector corresponding to the second word of a sentence.

s_t is the hidden state at time step t. It's the "memory" of the network. s_t is calculated based on the previous hidden state and the input at the current step: s_t = f(Ux_t + Ws_{t-1}). The function f usually is a nonlinearity such as tanh or ReLU. s_{-1}, which is required to calculate the first hidden state, is typically initialized to all zeroes.

o_t is the output at step t. For example, if we wanted to predict the next word in a sentence it would be a vector of probabilities across our vocabulary: o_t = \mathrm{softmax}(Vs_t).
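To make the formulas concrete, here is a minimal NumPy sketch of the forward pass. The vocabulary size, hidden size, and the random initialization of U, W, and V are arbitrary assumptions chosen for illustration, not values from the tutorial.

```python
import numpy as np

C, H = 8, 4                              # assumed vocabulary size and hidden state size
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(H, C))   # input  -> hidden
W = rng.normal(scale=0.1, size=(H, H))   # hidden -> hidden
V = rng.normal(scale=0.1, size=(C, H))   # hidden -> output

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x_indices):
    """x_indices: list of word indices; returns hidden states s and outputs o."""
    T = len(x_indices)
    s = np.zeros((T + 1, H))             # extra last row plays the role of s_{-1}, all zeros
    o = np.zeros((T, C))
    for t in range(T):
        x_t = np.zeros(C)
        x_t[x_indices[t]] = 1.0          # one-hot input vector
        s[t] = np.tanh(U @ x_t + W @ s[t - 1])   # s_t = f(U x_t + W s_{t-1})
        o[t] = softmax(V @ s[t])                 # o_t = softmax(V s_t)
    return s, o

s, o = forward([1, 5, 3, 0, 2])          # a 5-word "sentence" of word indices
print(o.shape)                           # (5, 8): one probability vector per step
```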

A few more things to note about this diagram: 1. s_t can be regarded as the memory unit of the network.

2. The parameters U, V, and W are shared. Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN shares the same parameters (U, V, W above) across all steps. This reflects the fact that we are performing the same task at each step (the RNN treats every element of the sequence alike, handling each one as the same task), just with different inputs. This greatly reduces the total number of parameters we need to learn.
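As a quick back-of-the-envelope check of how much sharing saves, the count below uses assumed sizes (vocabulary C = 8000, hidden size H = 100, sequence length T = 50); the comparison against hypothetical per-step copies of U, W, V is only illustrative.

```python
C, H, T = 8000, 100, 50                       # assumed vocabulary, hidden size, sequence length

shared = H * C + H * H + C * H                # U, W, V reused at every step
unshared = T * (H * C + H * H + C * H)        # hypothetical separate U, W, V per step

print(shared)     # 1,610,000 parameters, independent of T
print(unshared)   # 80,500,000 parameters if each of the 50 steps had its own weights
```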

3. Every output of the RNN can serve as a prediction. The above diagram has outputs at each time step (at every step the RNN can predict the next item), but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the sentiment after each word. Similarly, we may not need inputs at each time step. The main feature of an RNN is its hidden state, which captures some information about a sequence.
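For instance, a sentence-level sentiment classifier could ignore the per-step outputs o_t and read only the final hidden state. The sketch below reuses the forward() and softmax() functions from the earlier snippet; the classifier weights V_cls and the two-class setup are hypothetical additions for illustration.

```python
num_classes = 2                                    # e.g. positive / negative
V_cls = rng.normal(scale=0.1, size=(num_classes, H))

sentence = [1, 5, 3, 0, 2]
s, _ = forward(sentence)                           # ignore the per-step outputs o_t
sentiment = softmax(V_cls @ s[len(sentence) - 1])  # classify from the last hidden state only
print(sentiment)                                   # probability over the two classes
```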

Another property that the RNN structure makes apparent is that RNNs can handle variable-length sequences, which is a real advantage over CNNs in this respect.
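A small usage note, continuing the sketch above: the same weights process inputs of any length, since the loop in forward() simply runs for as many steps as there are words.

```python
# The same U, W, V handle sequences of different lengths.
_, o_short = forward([2, 7])                      # 2-word input  -> 2 output vectors
_, o_long = forward([4, 1, 6, 3, 0, 5, 2])        # 7-word input  -> 7 output vectors
print(o_short.shape, o_long.shape)                # (2, 8) (7, 8)
```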