Gentle Introduction to Models for Sequence Prediction with Recurrent Neural Networks

Sequence prediction is a problem that involves using historical sequence information to predict the next value or values in the sequence.

The sequence may be symbols like letters in a sentence or real values like those in a time series of prices. Sequence prediction may be easiest to understand in the context of time series forecasting as the problem is already generally understood.

In this post, you will discover the standard sequence prediction models that you can use to frame your own sequence prediction problems.

After reading this post, you will know:

  • How sequence prediction problems are modeled with recurrent neural networks.
  • The 4 standard sequence prediction models used by recurrent neural networks.
  • The 2 most common misunderstandings made by beginners when applying sequence prediction models.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

  1. Sequence Prediction with Recurrent Neural Networks
  2. Models for Sequence Prediction
  3. Cardinality from Timesteps not Features
  4. Two Common Misunderstandings by Practitioners

Sequence Prediction with Recurrent Neural Networks

Recurrent Neural Networks, like Long Short-Term Memory (LSTM) networks, are designed for sequence prediction problems.

In fact, at the time of writing, LSTMs achieve state-of-the-art results in challenging sequence prediction problems like neural machine translation (translating English to French).

LSTMs work by learning a function (f(…)) that maps input sequence values (X) onto output sequence values (y).

y(t) = f(X(t))

The learned mapping function is static and may be thought of as a program that takes input variables and uses internal variables. Internal variables are represented by an internal state maintained by the network and built up or accumulated over each value in the input sequence.

… RNNs combine the input vector with their state vector with a fixed (but learned) function to produce a new state vector. This can in programming terms be interpreted as running a fixed program with certain inputs and some internal variables.

The static mapping function may be defined with a different number of inputs or outputs, as we will review in the next section.
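To make this concrete, below is a minimal NumPy sketch of the state update described in the quote above. The dimensions and random weights are hypothetical; in a real RNN or LSTM the weights (and, for an LSTM, a more elaborate gated update) are learned from data.

```python
import numpy as np

# hypothetical sizes: 3 input features, 5 hidden units
n_input, n_hidden = 3, 5
Wx = np.random.randn(n_hidden, n_input) * 0.1   # input weights (learned in practice)
Wh = np.random.randn(n_hidden, n_hidden) * 0.1  # recurrent weights (learned in practice)
b = np.zeros(n_hidden)                          # bias

def step(x, h):
    # the fixed (but learned) function that combines the input vector
    # with the state vector to produce a new state vector
    return np.tanh(Wx @ x + Wh @ h + b)

h = np.zeros(n_hidden)                 # internal state starts at zero
for x in np.random.randn(4, n_input):  # 4 time steps of input
    h = step(x, h)                     # state accumulates over each input value
```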


Models for Sequence Prediction

In this section, we will review the 4 primary models for sequence prediction.

We will use the following terminology:

  • X: The input sequence value, may be delimited by a time step, e.g. X(1).
  • u: The hidden state value, may be delimited by a time step, e.g. u(1).
  • y: The output sequence value, may be delimited by a time step, e.g. y(1).

One-to-One Model

A one-to-one model produces one output value for each input value.

One-to-One Sequence Prediction Model

The internal state for the first time step is zero; from that point onward, the internal state is accumulated over the prior time steps.

One-to-One Sequence Prediction Model Over Time

In the case of sequence prediction, this model would produce a one-step forecast for each observed time step received as input.

This is a poor use of RNNs as the model has no chance to learn over input or output time steps (e.g. with backpropagation through time). If you find yourself implementing this model for sequence prediction, you may intend to use a many-to-one model instead.
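Below is a minimal Keras sketch of this framing, assuming hypothetical layer sizes and a univariate series; note the LSTM only ever sees a single time step, which is why it cannot learn across the sequence.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# one-to-one: one input time step, one output value
model = Sequential()
model.add(LSTM(10, input_shape=(1, 1)))  # 1 time step, 1 feature
model.add(Dense(1))                      # 1 output value
model.compile(loss='mse', optimizer='adam')
```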

One-to-Many Model

A one-to-many model produces multiple output values for one input value.

One-to-Many Sequence Prediction Model

The internal state is accumulated as each value in the output sequence is produced.

This model can be used for image captioning, where one image is provided as input and a sequence of words is generated as output.
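As a sketch in Keras (the 100-element input vector and 5-step output sequence are hypothetical sizes, standing in for image features and a caption), a one-to-many model might look like:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# one-to-many: one input vector, a 5-step output sequence
model = Sequential()
model.add(Dense(10, input_shape=(100,)))    # encode the single input (e.g. image features)
model.add(RepeatVector(5))                  # present the encoding once per output time step
model.add(LSTM(10, return_sequences=True))  # state accumulates as each output is produced
model.add(TimeDistributed(Dense(1)))        # one output value per time step
model.compile(loss='mse', optimizer='adam')
```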

Many-to-One Model

A many-to-one model produces one output value after receiving multiple input values.

Many-to-One Sequence Prediction Model

The internal state is accumulated with each input value before a final output value is produced.

In the case of time series, this model would use a sequence of recent observations to forecast the next time step. This architecture would represent the classical autoregressive time series model.
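A minimal Keras sketch of this framing, assuming a hypothetical window of 10 past observations of a univariate series:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# many-to-one: 10 input time steps, one forecast value
model = Sequential()
model.add(LSTM(10, input_shape=(10, 1)))  # state accumulates over 10 time steps, 1 feature
model.add(Dense(1))                       # single output value
model.compile(loss='mse', optimizer='adam')
```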

Many-to-Many Model

A many-to-many model produces multiple outputs after receiving multiple input values.

Many-to-Many Sequence Prediction Model

As with the many-to-one case, state is accumulated until the first output is created, but in this case multiple time steps are output.

Importantly, the number of input time steps does not have to match the number of output time steps. Think of the input and output time steps operating at different rates.

In the case of time series forecasting, this model would use a sequence of recent observations to make a multi-step forecast.

In a sense, it combines the capabilities of the many-to-one and one-to-many models.
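One common way to realize this in Keras is an encoder-decoder arrangement; the sketch below assumes hypothetical lengths of 10 input steps and 5 output steps to show that the two need not match.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# many-to-many: 10 input time steps, 5 output time steps
model = Sequential()
model.add(LSTM(10, input_shape=(10, 1)))    # encoder: accumulate state over the input
model.add(RepeatVector(5))                  # bridge to 5 output time steps
model.add(LSTM(10, return_sequences=True))  # decoder: produce a sequence
model.add(TimeDistributed(Dense(1)))        # one output value per output time step
model.compile(loss='mse', optimizer='adam')
```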

Cardinality from Timesteps (not Features!)

A common point of confusion is to conflate the above examples of sequence mapping models with multiple input and output features.

A sequence may be comprised of single values, one for each time step.

Alternatively, a sequence could just as easily represent a vector of multiple observations at each time step. Each item in the vector for a time step may be thought of as its own separate time series. This does not affect the description of the models above.

For example, a model that takes as input one time step of temperature and pressure and predicts one time step of temperature and pressure is a one-to-one model, not a many-to-many model.

Multiple-Feature Sequence Prediction Model

The model does take two values as input and predicts two values, but there is only a single sequence time step expressed for the input and predicted as output.

The cardinality of the sequence prediction models defined above refers to time steps, not features (e.g. univariate or multivariate sequences).
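In array terms (using the 3D input layout that Keras LSTMs expect, with hypothetical temperature and pressure values), the distinction looks like this:

```python
import numpy as np

# the expected input layout is (samples, time steps, features)
X = np.array([[[20.7, 1012.0]]])  # 1 sample, 1 time step, 2 features: one-to-one
y = np.array([[20.9, 1011.5]])    # 1 sample, 2 output features, still 1 time step

print(X.shape)  # (1, 1, 2): cardinality comes from the time step axis, not the features
```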

Two Common Misunderstandings by Practitioners

The confusion between features and time steps leads practitioners to two main misunderstandings when implementing recurrent neural networks:

1. Timesteps as Input Features

Observations at previous timesteps are framed as input features to the model.

This is the classical fixed-window approach of framing sequence prediction problems, as used by multilayer perceptrons. Instead, the sequence should be fed in one time step at a time.

This confusion may lead you to think you have implemented a many-to-one or many-to-many sequence prediction model when in fact you only have a single vector input for one time step.
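The contrast is easiest to see in the shape of the input array; a small NumPy sketch with a hypothetical series:

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# misunderstanding: a window of 3 lag observations framed as 3 input
# features of a single time step (a vector input, not a sequence input)
X_features = series[:3].reshape(1, 1, 3)   # (1 sample, 1 time step, 3 features)

# instead: the same window fed as 3 time steps of 1 feature each
X_timesteps = series[:3].reshape(1, 3, 1)  # (1 sample, 3 time steps, 1 feature)
```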

2. Timesteps as Output Features

Predictions at multiple future time steps are framed as output features to the model.

This is the classical fixed-window approach of making multi-step predictions used by multilayer Perceptrons and other machine learning algorithms. Instead, the sequence predictions should be generated one time step at a time.

This confusion may lead you to think you have implemented a one-to-many or many-to-many sequence prediction model when in fact you only have a single vector output for one time step (e.g. seq2vec not seq2seq).
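Again in terms of array shapes, with a hypothetical 3-step forecast horizon:

```python
import numpy as np

horizon = np.array([6.0, 7.0, 8.0])  # a 3-step forecast (hypothetical values)

# misunderstanding: the horizon framed as 3 output features of one
# time step (a vector output, i.e. seq2vec)
y_features = horizon.reshape(1, 3)      # (1 sample, 3 features)

# instead: 3 output time steps of 1 feature each (seq2seq)
y_timesteps = horizon.reshape(1, 3, 1)  # (1 sample, 3 time steps, 1 feature)
```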

Note: framing timesteps as features in sequence prediction problems is a valid strategy, and could lead to improved performance even when using recurrent neural networks (try it!). The important point here is to understand the common pitfalls and not trick yourself when framing your own prediction problems.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the standard models for sequence prediction with recurrent neural networks.

Specifically, you learned:

  • How sequence prediction problems are modeled with recurrent neural networks.
  • The 4 standard sequence prediction models used by recurrent neural networks.
  • The 2 most common misunderstandings made by beginners when applying sequence prediction models.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


