10 Challenging Machine Learning Time Series Forecasting Problems

Machine learning methods have a lot to offer for time series forecasting problems.

A difficulty is that most methods are demonstrated on simple univariate time series forecasting problems.

In this post, you will discover a suite of challenging time series forecasting problems. These are problems where classical linear statistical methods will not be sufficient and where more advanced machine learning methods are required.

If you are looking for challenging time series datasets to practice machine learning techniques, you are in the right place.

Let’s dive in.

Challenging Machine Learning Time Series Forecasting Problems
Photo by Joao Trindade, some rights reserved.

Overview

We will take a closer look at 10 challenging time series datasets from the competitive data science website Kaggle.com.

Not all datasets are strict time series prediction problems; I have been loose in the definition and also included problems that were a time series before obfuscation or have a clear temporal component.

They are:

  • How Much Did It Rain? I and II
  • Online Product Sales
  • Rossmann Store Sales
  • Walmart Recruiting – Store Sales Forecasting
  • Acquire Valued Shoppers Challenge
  • Melbourne University AES/MathWorks/NIH Seizure Prediction
  • AMS 2013-2014 Solar Energy Prediction Contest
  • Global Energy Forecasting Competition 2012 – Wind Forecasting
  • EMC Data Science Global Hackathon (Air Quality Prediction)
  • Grupo Bimbo Inventory Demand

This is not all of the time series datasets hosted on Kaggle.
Did I miss a good one? Let me know in the comments below.

How Much Did It Rain? I and II

Given observations and derived measures from polarimetric radar, the problem is to predict the probability distribution of the hourly total in a rain gage.

The temporal structure (e.g. hour to hour) was removed as part of obfuscating the data. Had it been kept, this would have been an interesting time series problem.

The competition was run twice in the same year with different datasets.

The second competition was won by Aaron Sim, who used a very large recurrent neural network algorithm.

Online Product Sales

Given details of the product and the product launch, the problem is to predict the next 12 months of sales figures.

This is a multi-step forecast, or sequence forecast, without a history of sales from which to extrapolate.
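One way to picture this framing: each product becomes a row of launch-time features paired with a 12-value target vector, so the forecast is a multi-output regression rather than an extrapolation. The sketch below is a minimal illustration, not a competition solution. It averages the monthly sales curves of the k most similar training products; the function name, the toy feature values, and the choice of nearest neighbours are all assumptions made for the example.

```python
import math

def knn_sales_curve(train_features, train_curves, new_features, k=2):
    """Forecast a 12-month sales curve for a product with no sales history.

    Hypothetical helper: averages the curves of the k training products
    whose launch-time features are closest (Euclidean distance).
    """
    # Rank training products by distance to the new product.
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], new_features),
    )
    nearest = ranked[:k]
    horizon = len(train_curves[0])
    # Average the neighbours' sales month by month.
    return [
        sum(train_curves[i][m] for i in nearest) / len(nearest)
        for m in range(horizon)
    ]

# Toy data (made up): two features per product, 12 months of sales each.
train_features = [[1.0, 0.0], [1.1, 0.1], [5.0, 5.0]]
train_curves = [[10.0] * 12, [20.0] * 12, [100.0] * 12]

forecast = knn_sales_curve(train_features, train_curves, [1.0, 0.05], k=2)
```

Any multi-output regressor could stand in for the neighbour average; the point is only that the whole 12-month sequence is predicted from product details alone.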

I could not find any good write-ups of top performing solutions. Can you?

Rossmann Store Sales

Given historical daily sales for more than one thousand stores, the problem is to predict 6 weeks of daily sales figures for each store.

This provides both an opportunity to explore store-wise multi-step forecasts, as well as the ability to exploit cross-store patterns.

Top results were achieved with careful feature engineering and the use of gradient boosting.
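As a rough illustration of what feature engineering for a tree-based model can look like here, the sketch below turns per-store daily sales into rows of lagged values. It is a minimal example under assumptions: the helper name, the lag choices, and the toy data are hypothetical, and it does not reproduce any winning solution.

```python
def make_lag_rows(sales_by_store, lags=(1, 7)):
    """Turn per-store daily sales into (features, target) rows.

    Hypothetical helper: each row holds the store id plus the sales
    observed `lag` days earlier for every lag. Rows are only built where
    all lags are available, and lags never cross store boundaries.
    """
    rows = []
    max_lag = max(lags)
    for store_id, series in sales_by_store.items():
        for t in range(max_lag, len(series)):
            features = [store_id] + [series[t - lag] for lag in lags]
            rows.append((features, series[t]))
    return rows

# Toy data (made up): two stores, ten days of sales each.
sales_by_store = {
    1: [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    2: [50, 60, 70, 80, 90, 100, 110, 120, 130, 140],
}
rows = make_lag_rows(sales_by_store, lags=(1, 7))
```

Rows like these could be passed to a gradient boosting library. For a 6-week horizon, the lags would need to be at least as long as the forecast horizon, or predictions would have to be fed back in recursively.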

Walmart Recruiting – Store Sales Forecasting

Given historical weekly sales data for multiple departments in multiple stores, as well as details of promotions, the problem is to predict sales figures for store departments.

This provides both an opportunity to explore department-wise and even store-wise forecasts, as well as the ability to exploit cross-department and cross-store patterns.

Top performers made heavy use of ARIMA models and handled public holidays carefully.

Acquire Valued Shoppers Challenge

Given historical shopping behavior, the problem is to predict which customers will likely repeat purchase (become acquired) after taking up a discount offer.

The large number of transactions makes this a big data download, nearly 3 gigabytes.

The problem provides an opportunity to model the time series of specific or aggregated customers and predict the probability of customer conversion.

I could not find any good write-ups of top performing solutions. Can you?

Melbourne University AES/MathWorks/NIH Seizure Prediction

Given a trace of human brain activity recorded with an intracranial EEG over months or years, the problem is to predict, for each 10-minute segment, the probability of a seizure.

A write-up of a 4th place solution describes the use of statistical feature engineering and gradient boosting.

Update: The dataset has since been taken down.

AMS 2013-2014 Solar Energy Prediction Contest

Given historical meteorological forecasts at multiple sites, the problem is to predict the total daily solar energy at each site for one year.

The dataset provides an opportunity to model spatial and temporal time series by site and across sites and make multi-step forecasts for each site.

Global Energy Forecasting Competition 2012 – Wind Forecasting

Given historical wind forecasts and power generation at multiple sites, the problem is to predict hourly power generation for the next 48 hours.

The dataset provides an opportunity to model the hourly time series for individual sites as well as across sites.

I could not find any good write-ups of top performing solutions. Can you?

EMC Data Science Global Hackathon (Air Quality Prediction)

Given eight days of hourly measurements of air pollutants, the problem is to forecast pollutants at specific times over the following three days.

The dataset provides an opportunity to model a multivariate time series and perform a multi-step forecast.

A good write-up of the top performing solution describes the use of an ensemble of random forest models trained on lagged variables.
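To make the idea of lagged variables and a multi-step forecast concrete, the sketch below frames a multivariate series as supervised learning samples: a window of past readings as input and the next few values of one variable as output. It is a minimal illustration under assumptions; the helper name, window sizes, and toy readings are hypothetical and are not taken from the winning solution.

```python
def window_multistep(series, n_in, n_out, target_index=0):
    """Frame a multivariate series as supervised learning samples.

    Hypothetical helper: `series` is a list of time steps, each a list of
    variable values. Every sample pairs the flattened last `n_in` steps
    (all variables) with the next `n_out` values of one target variable.
    """
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        past = series[start:start + n_in]
        future = series[start + n_in:start + n_in + n_out]
        X = [value for step in past for value in step]  # lagged inputs
        y = [step[target_index] for step in future]     # multi-step target
        samples.append((X, y))
    return samples

# Toy data (made up): 6 hourly readings of two pollutants.
series = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
samples = window_multistep(series, n_in=3, n_out=2)
```

Each `(X, y)` pair could then train one model per forecast step, or a single multi-output model such as a random forest.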

Summary

In this post, you discovered a suite of challenging time series forecasting problems.

These are problems that provided the foundation for competitive machine learning on the site Kaggle.com. As such, each problem also provides a great source of discussion and existing world-class solutions that can be used as inspiration and a starting point.

If you are interested in better understanding the role of machine learning for time series forecasting, I would recommend selecting one or more of these problems as a starting point.

Have you worked on one or more of these problems?
Share your experiences in the comments below.

Is there a time series problem on Kaggle.com that was not mentioned in this post?
Let me know about it in the comments below.
