
A Gentle Introduction to RNN Unrolling

Recurrent neural networks are a type of neural network where the outputs from previous time steps are fed as input to the current time step.

This creates a network graph or circuit diagram with cycles, which can make it difficult to understand how information moves through the network.

In this post, you will discover the concept of unrolling or unfolding recurrent neural networks.

After reading this post, you will know:

  • The standard conception of recurrent neural networks with cyclic connections.
  • The concept of unrolling of the forward pass when the network is copied for each input time step.
  • The concept of unrolling of the backward pass for updating network weights during training.

Let’s get started.

Unrolling Recurrent Neural Networks

Recurrent neural networks are a type of neural network where outputs from previous time steps are taken as inputs for the current time step.

We can demonstrate this with a picture.

Below we can see that the network both takes the output from the previous time step as input and uses the internal state from the previous time step as a starting point for the current time step.

Example of an RNN with a cycle

RNNs are fit and make predictions over many time steps. We can simplify the model by unfolding or unrolling the RNN graph over the input sequence.

A useful way to visualise RNNs is to consider the update graph formed by ‘unfolding’ the network along the input sequence.

Unrolling the Forward Pass

Consider the case where we have multiple time steps of input (X(t), X(t+1), …), multiple time steps of internal state (u(t), u(t+1), …), and multiple time steps of outputs (y(t), y(t+1), …).

We can unfold the above network schematic into a graph without any cycles.

Example of Unrolled RNN on the forward pass

We can see that the cycle is removed and that the output (y(t)) and internal state (u(t)) from the previous time step are passed on to the network as inputs for processing the next time step.

Key in this conceptualization is that the network (RNN) does not change between the unfolded time steps. Specifically, the same weights are used for each time step and it is only the outputs and the internal states that differ.

In this way, it is as though the whole network (topology and weights) is copied for each time step in the input sequence.
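
To make the idea concrete, here is a minimal NumPy sketch of this unrolled forward pass. The weight matrices (W_xu, W_uu, W_uy), the layer sizes, and the random inputs are illustrative assumptions rather than anything specified in the post; the point is that the same three matrices are applied at every time step, while the input X(t), internal state u(t), and output y(t) change.

import numpy as np

# Illustrative sizes, not taken from the post.
n_input, n_hidden, n_output, n_steps = 3, 5, 2, 4

rng = np.random.default_rng(0)
W_xu = rng.normal(size=(n_hidden, n_input))   # input -> internal state
W_uu = rng.normal(size=(n_hidden, n_hidden))  # previous state -> internal state
W_uy = rng.normal(size=(n_output, n_hidden))  # internal state -> output

X = rng.normal(size=(n_steps, n_input))       # input sequence X(t), X(t+1), ...
u = np.zeros(n_hidden)                        # internal state before the first step

outputs = []
for t in range(n_steps):
    # The same three weight matrices are reused at every unrolled time step;
    # only the input, the internal state, and the output change.
    u = np.tanh(W_xu @ X[t] + W_uu @ u)       # new internal state u(t)
    y = W_uy @ u                              # output y(t)
    outputs.append(y)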

Further, each copy of the network may be thought of as an additional layer of the same feedforward neural network.

Example of Unrolled RNN with each copy of the network as a layer

RNNs, once unfolded in time, can be seen as very deep feedforward networks in which all the layers share the same weights.

This is a useful conceptual tool and visualization to help in understanding what is going on in the network during the forward pass. It may or may not also be the way that the network is implemented by the deep learning library.
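
As one example of that implementation choice, Keras exposes an unroll argument on its recurrent layers: when set to True the layer is built as a static, unrolled graph over the known number of time steps, otherwise a symbolic loop over the sequence is used. The layer sizes and sequence length below are arbitrary placeholders, not values from the post.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    # unroll=True builds the static, unrolled graph over the 10 time steps;
    # unroll=False (the default) loops over the sequence symbolically instead.
    SimpleRNN(32, unroll=True, input_shape=(10, 8)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

Unrolling in this way can speed up short sequences at the cost of more memory, which foreshadows the concerns discussed for the backward pass below.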

Unrolling the Backward Pass

The idea of network unfolding plays a bigger part in the way recurrent neural networks are implemented for the backward pass.

As is standard with backpropagation through time, the network is unfolded over time, so that connections arriving at layers are viewed as coming from the previous timestep.

Importantly, the backpropagation of error for a given time step depends on the activation of the network at the prior time step.

In this way, the backward pass requires the conceptualization of unfolding the network.

Error is propagated back to the first input time step of the sequence so that the error gradient can be calculated and the weights of the network can be updated.

Like standard backpropagation, backpropagation through time consists of a repeated application of the chain rule. The subtlety is that, for recurrent networks, the loss function depends on the activation of the hidden layer not only through its influence on the output layer, but also through its influence on the hidden layer at the next timestep.
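
The sketch below makes that subtlety explicit for a simple RNN with a squared-error loss at every time step (the loss, weights, and sizes are assumptions for illustration, mirroring the forward-pass sketch above). During the backward pass, the gradient reaching the internal state u(t) is the sum of two terms: one from the output y(t) at the same step and one propagated back from the hidden layer at time step t+1.

import numpy as np

# Set up the same kind of simple RNN as in the forward-pass sketch;
# all sizes, weights, and targets here are illustrative assumptions.
n_input, n_hidden, n_output, n_steps = 3, 5, 2, 4
rng = np.random.default_rng(0)
W_xu = rng.normal(size=(n_hidden, n_input))
W_uu = rng.normal(size=(n_hidden, n_hidden))
W_uy = rng.normal(size=(n_output, n_hidden))
X = rng.normal(size=(n_steps, n_input))
targets = rng.normal(size=(n_steps, n_output))

# Forward pass: keep every internal state, because the backward pass needs
# the activation of the network at each prior time step.
us = [np.zeros(n_hidden)]                  # us[t] is the state entering step t
ys = []
for t in range(n_steps):
    us.append(np.tanh(W_xu @ X[t] + W_uu @ us[-1]))
    ys.append(W_uy @ us[-1])

# Backward pass: walk the unrolled graph in reverse, applying the chain rule.
dW_xu, dW_uu, dW_uy = np.zeros_like(W_xu), np.zeros_like(W_uu), np.zeros_like(W_uy)
du_next = np.zeros(n_hidden)               # gradient arriving from step t + 1
for t in reversed(range(n_steps)):
    dy = ys[t] - targets[t]                # gradient of 0.5 * ||y(t) - target(t)||^2
    dW_uy += np.outer(dy, us[t + 1])
    du = W_uy.T @ dy + du_next             # the two sources of gradient for u(t)
    dpre = du * (1.0 - us[t + 1] ** 2)     # back through the tanh non-linearity
    dW_xu += np.outer(dpre, X[t])
    dW_uu += np.outer(dpre, us[t])
    du_next = W_uu.T @ dpre                # hand the gradient back to step t - 1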

Unfolding the recurrent network graph also introduces additional concerns. Each time step requires a new copy of the network, which in turn takes up memory, especially for larger networks with thousands or millions of weights. The memory requirements of training large recurrent networks can quickly balloon as the number of time steps climbs into the hundreds.

… it is required to unroll the RNNs by the length of the input sequence. By unrolling an RNN N times, every activations of the neurons inside the network are replicated N times, which consumes a huge amount of memory especially when the sequence is very long. This hinders a small footprint implementation of online learning or adaptation. Also, this “full unrolling” makes a parallel training with multiple sequences inefficient on shared memory models such as graphics processing units (GPUs)
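
A rough back-of-the-envelope calculation shows why. Suppose, purely as an assumption for illustration, a batch of 32 sequences, a hidden layer of 1,000 units, 1,000 unrolled time steps, and 32-bit floats; storing just the hidden activations needed for the backward pass then costs about 122 MB for a single layer, and the figure grows linearly with the sequence length.

# Illustrative figures only; none of these sizes come from the post.
batch_size, hidden_size, time_steps, bytes_per_float = 32, 1000, 1000, 4

activation_bytes = batch_size * hidden_size * time_steps * bytes_per_float
print(f"{activation_bytes / 1024 ** 2:.0f} MB of hidden activations for one layer")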

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

Articles

Summary

In this tutorial, you discovered the visualization and conceptual tool of unrolling recurrent neural networks.

Specifically, you learned:

  • The standard conception of recurrent neural networks with cyclic connections.
  • The concept of unrolling of the forward pass when the network is copied for each input time step.
  • The concept of unrolling of the backward pass for updating network weights during training.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

