How to Use Power Transforms for Time Series Forecast Data with Python

Data transforms are intended to remove noise and improve the signal in time series forecasting.

It can be very difficult to select a good, or even best, transform for a given prediction problem. There are many transforms to choose from and each has a different mathematical intuition.

In this tutorial, you will discover how to explore different power-based transforms for time series forecasting with Python.

After completing this tutorial, you will know:

  • How to identify when to use and how to explore a square root transform.
  • How to identify when to use and explore a log transform and the expectations on raw data.
  • How to use the Box-Cox transform to perform square root, log, and automatically discover the best power transform for your dataset.

Let’s get started.

Airline Passengers Dataset

The Airline Passengers dataset describes the total number of airline passengers over time.

The units are a count of the number of airline passengers in thousands. There are 144 monthly observations from 1949 to 1960.

Download the dataset to your current working directory with the filename “airline-passengers.csv”.

The example below loads the dataset and plots the data.

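from pandas import read_csv
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() plus squeeze() yields the same Series
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(series)
# histogram
pyplot.subplot(212)
pyplot.hist(series)
pyplot.show()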

Running the example creates two plots, the first showing the time series as a line plot and the second showing the observations as a histogram.

Airline Passengers Dataset Plot

The dataset is non-stationary, meaning that the mean and the variance of the observations change over time. This makes it difficult to model by both classical statistical methods, like ARIMA, and more sophisticated machine learning methods, like neural networks.

This is caused by what appears to be both an increasing trend and a seasonality component.

In addition, the amount of change, or the variance, is increasing with time. This is clear when you look at the size of the seasonal component and notice that from one cycle to the next, the amplitude (from bottom to top of the cycle) is increasing.

In this tutorial, we will investigate transforms that we can use on time series datasets that exhibit this property.
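The property can also be checked numerically rather than by eye, by summarizing each calendar year separately. The sketch below is only an illustration: it assumes a synthetic monthly series (a linear trend multiplied by a fixed seasonal pattern) in place of the real CSV file so that it runs on its own, and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the real data (an assumption, not the airline dataset):
# a rising trend multiplied by a repeating 12-month seasonal pattern
months = np.arange(144)
values = (100.0 + 2.5 * months) * (1.0 + 0.2 * np.sin(2.0 * np.pi * months / 12.0))
index = pd.date_range('1949-01', periods=144, freq='MS')
series = pd.Series(values, index=index)

# mean and standard deviation of the observations within each year
yearly = series.groupby(series.index.year).agg(['mean', 'std'])
print(yearly)

# a standard deviation that grows from year to year signals increasing variance
print(yearly['std'].is_monotonic_increasing)
```

If both the yearly mean and the yearly standard deviation rise together, the series is a candidate for the power transforms explored in the rest of this tutorial.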

Square Root Transform

A time series that has a quadratic growth trend can be made linear by taking the square root.

Let’s demonstrate this with a quick contrived example.

Consider a series of the numbers 1 to 99 squared. The line plot of this series will show a quadratic growth trend and a histogram of the values will show a skewed distribution with a long tail to the right.

The snippet of code below creates and graphs this series.

from matplotlib import pyplot
series = [i**2 for i in range(1, 100)]
# line plot
pyplot.plot(series)
pyplot.show()
# histogram
pyplot.hist(series)
pyplot.show()

Running the example plots the series both as a line plot over time and a histogram of observations.

Quadratic Time Series

If you see a structure like this in your own time series, you may have a quadratic growth trend. This can be removed or made linear by taking the inverse operation of the squaring procedure, which is the square root.

Because the example is perfectly quadratic, we would expect the line plot of the transformed data to show a straight line. Because the source of the squared series is linear, we would expect the histogram to show a uniform distribution.

The example below performs a sqrt() transform on the time series and plots the result.

from matplotlib import pyplot
from numpy import sqrt
series = [i**2 for i in range(1, 100)]
# sqrt transform
transform = sqrt(series)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(transform)
# histogram
pyplot.subplot(212)
pyplot.hist(transform)
pyplot.show()

We can see that, as expected, the quadratic trend was made linear.

Square Root Transform of Quadratic Time Series
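The linearity can also be confirmed without a plot, since a straight line has constant first differences. The following is a small sketch of that check on the same contrived series; the variable name steps is introduced here only for illustration.

```python
import numpy as np

# the same contrived series: the numbers 1 to 99 squared
series = np.array([i**2 for i in range(1, 100)], dtype=float)
transform = np.sqrt(series)

# first differences of the transformed series: constant for a straight line
steps = np.diff(transform)
print(np.allclose(steps, 1.0))

# first differences of the raw series keep growing, which is the quadratic trend
print(np.diff(series)[:5])
```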

It is possible that the Airline Passengers dataset shows a quadratic growth. If this is the case, then we could expect a square root transform to reduce the growth trend to be linear and change the distribution of observations to be perhaps nearly Gaussian.

The example below applies a square root transform to the dataset and plots the results.

from pandas import read_csv
from pandas import DataFrame
from numpy import sqrt
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() plus squeeze() yields the same Series
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
dataframe['passengers'] = sqrt(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()

We can see that the trend was reduced, but was not removed.

The line plot still shows an increasing variance from cycle to cycle. The histogram still shows a long tail to the right of the distribution, suggesting an exponential or long-tail distribution.

Square Root Transform of Airline Passengers Dataset Plot

Log Transform

A more extreme class of trends is exponential, often graphed as a hockey stick.

Time series with an exponential trend can be made linear by taking the logarithm of the values. This is called a log transform.

As with the square and square root case above, we can demonstrate this with a quick example.

The code below creates an exponential series by raising e, the base of the natural logarithm or Euler’s number (2.718…), to the power of each of the numbers from 1 to 99.

from matplotlib import pyplot
from math import exp
series = [exp(i) for i in range(1, 100)]
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(series)
# histogram
pyplot.subplot(212)
pyplot.hist(series)
pyplot.show()

Running the example creates a line plot of the series and a histogram of the distribution of observations.

We see an extreme increase on the line graph and an equally extreme long tail distribution on the histogram.

Exponential Time Series

Again, we can transform this series back to linear by taking the natural logarithm of the values.

This would make the series linear and the distribution uniform. The example below demonstrates this for completeness.

from matplotlib import pyplot
from math import exp
from numpy import log
series = [exp(i) for i in range(1, 100)]
transform = log(series)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(transform)
# histogram
pyplot.subplot(212)
pyplot.hist(transform)
pyplot.show()

Running the example creates plots, showing the expected linear result.

Log Transformed Exponential Time Series

Our Airline Passengers dataset has a distribution of this form, but perhaps not this extreme.

The example below demonstrates a log transform of the Airline Passengers dataset.

from pandas import read_csv
from pandas import DataFrame
from numpy import log
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() plus squeeze() yields the same Series
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
dataframe['passengers'] = log(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()

Running the example results in a trend that does look a lot more linear than the square root transform above. The line plot shows a seemingly linear growth and variance.

The histogram also shows a more uniform or squashed Gaussian-like distribution of observations.

Log Transform of Airline Passengers Dataset Plot
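The stabilizing effect of the log transform on the size of the seasonal swing can also be checked numerically. The sketch below does not use the airline data: it assumes a synthetic monthly series (an exponential trend multiplied by a fixed seasonal pattern) so that it runs on its own, and cycle_ranges is a hypothetical helper written for this illustration.

```python
import numpy as np

def cycle_ranges(values, period=12):
    """Return the max-minus-min range of each complete cycle of length period."""
    values = np.asarray(values, dtype=float)
    n_cycles = len(values) // period
    cycles = values[:n_cycles * period].reshape(n_cycles, period)
    return cycles.max(axis=1) - cycles.min(axis=1)

# synthetic stand-in (an assumption): exponential growth times a seasonal pattern
months = np.arange(144)
trend = 100.0 * np.exp(0.01 * months)
seasonal = 1.0 + 0.2 * np.sin(2.0 * np.pi * months / 12.0)
synthetic = trend * seasonal

raw_ranges = cycle_ranges(synthetic)
log_ranges = cycle_ranges(np.log(synthetic))

# on the raw scale the seasonal swing grows every year
print(raw_ranges.round(1))
# on the log scale the swing is the same size in every year
print(log_ranges.round(4))
```

The log turns the multiplicative seasonal effect into an additive one, which is why the per-cycle range stops growing after the transform.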

Log transforms are popular with time series data as they are effective at removing exponential variance.

It is important to note that this operation assumes values are positive and non-zero. It is common to transform observations by adding a fixed constant to ensure all input values meet this requirement. For example:

transform = log(constant + x)

Where transform is the transformed series, constant is a fixed value that lifts all observations above zero, and x is the time series.
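As a minimal sketch of this idea, the snippet below applies the shifted log to a small made-up series that contains zeros and then reverses it. The function names shifted_log and inverse_shifted_log and the sample values are assumptions introduced for illustration only.

```python
import numpy as np

def shifted_log(x, constant):
    """Apply log(constant + x), refusing inputs the log cannot handle."""
    x = np.asarray(x, dtype=float)
    if np.any(x + constant <= 0):
        raise ValueError('constant must lift every observation above zero')
    return np.log(constant + x)

def inverse_shifted_log(transformed, constant):
    """Undo shifted_log so forecasts can be reported in the original units."""
    return np.exp(transformed) - constant

# made-up observations that include zeros, which a plain log cannot handle
x = np.array([0.0, 3.0, 10.0, 0.0, 25.0])
constant = 1.0

transformed = shifted_log(x, constant)
restored = inverse_shifted_log(transformed, constant)
print(transformed)
print(np.allclose(restored, x))
```

The same constant must be subtracted again after inverting the log, otherwise values on the original scale will be off by that amount.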

Box-Cox Transform

The square root transform and log transform belong to a class of transforms called power transforms.

The Box-Cox transform is a configurable data transform method that supports both square root and log transform, as well as a suite of related transforms.

More than that, it can be configured to evaluate a suite of transforms automatically and select a best fit. It can be thought of as a power tool to iron out power-based change in your time series. The resulting series may be more linear and the resulting distribution more Gaussian or Uniform, depending on the underlying process that generated it.

The scipy.stats library provides an implementation of the Box-Cox transform. The boxcox() function takes an argument, called lmbda, that sets the value of lambda and so controls the type of transform to perform.

Below are some common values for lambda:

  • lambda = -1.0 is a reciprocal transform.
  • lambda = -0.5 is a reciprocal square root transform.
  • lambda = 0.0 is a log transform.
  • lambda = 0.5 is a square root transform.
  • lambda = 1.0 is no transform.
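These labels follow from the usual Box-Cox definition, (x^lambda - 1) / lambda for a non-zero lambda and log(x) for lambda = 0, so each case matches the named transform up to a constant shift and scale. The sketch below checks this against scipy on a handful of made-up positive values (the sample array is an assumption for illustration).

```python
import numpy as np
from scipy.stats import boxcox

# made-up positive values; Box-Cox requires strictly positive input
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

log_version = boxcox(x, lmbda=0.0)
sqrt_version = boxcox(x, lmbda=0.5)
identity_version = boxcox(x, lmbda=1.0)

# lambda = 0.0 is exactly the natural log
print(np.allclose(log_version, np.log(x)))
# lambda = 0.5 is the square root, shifted by 1 and scaled by 1 / 0.5
print(np.allclose(sqrt_version, (np.sqrt(x) - 1.0) / 0.5))
# lambda = 1.0 only subtracts 1, leaving the shape of the data unchanged
print(np.allclose(identity_version, x - 1.0))
```

The shift and scale do not change the shape of a line plot or histogram, which is why lambda = 0.5 is described simply as a square root transform.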

For example, we can perform a log transform using the boxcox() function as follows:

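from pandas import read_csv
from pandas import DataFrame
from scipy.stats import boxcox
from matplotlib import pyplot
# Series.from_csv() was removed from pandas; read_csv() plus squeeze() yields the same Series
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
dataframe['passengers'] = boxcox(dataframe['passengers'], lmbda=0.0)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()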
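The tutorial also mentions letting Box-Cox select a power automatically. As a hedged sketch of how that might look: when the lmbda argument is left unset, scipy.stats.boxcox() returns both the transformed data and the lambda that maximizes its log-likelihood, and scipy.special.inv_boxcox() reverses the transform. The example assumes synthetic log-normal data rather than the airline dataset, so the fitted lambda should land near zero, the log transform.

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# synthetic, strictly positive data (an assumption): log-normal values,
# for which a log transform (lambda near 0) is the natural choice
rng = np.random.default_rng(1)
data = np.exp(rng.normal(loc=5.0, scale=0.5, size=500))

# with lmbda unset, boxcox() fits lambda and returns it with the transformed data
transformed, fitted_lambda = boxcox(data)
print('Lambda: %f' % fitted_lambda)

# invert the transform with the same lambda to return to the original units
restored = inv_boxcox(transformed, fitted_lambda)
print(np.allclose(restored, data))
```

Keeping the fitted lambda is important, because the same value is needed later to convert forecasts back to the original scale.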
