How to Load and Explore Time Series Data in Python

The Pandas library in Python provides excellent, built-in support for time series data.

Once loaded, Pandas also provides tools to explore and better understand your dataset.

In this post, you will discover how to load and explore your time series dataset.

After completing this tutorial, you will know:

  • How to load your time series dataset from a CSV file using Pandas.
  • How to peek at the loaded data and calculate summary statistics.
  • How to plot and review your time series data.

Let’s get started.

Daily Female Births Dataset

In this post, we will use the Daily Female Births Dataset as an example.

This univariate time series dataset describes the number of daily female births in California in 1959.

The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).

Below is a sample of the first 5 rows of data, including the header row.

    "Date","Daily total female births in California, 1959"
    "1959-01-01",35
    "1959-01-02",32
    "1959-01-03",30
    "1959-01-04",31
    "1959-01-05",44

Below is a plot of the entire dataset taken from Data Market.

[Figure: Daily Female Births Dataset]

Download the dataset and place it in your current working directory with the file name “daily-total-female-births-in-cal.csv”.

Load Time Series Data

Pandas represents time series datasets as a Series.

A Series is a one-dimensional array with a time label for each row.

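Older versions of Pandas could load the Daily Female Births dataset directly using the Series class through Series.from_csv(). That method was removed in pandas 1.0, so on current versions the equivalent is to read the file with read_csv() and squeeze the single data column into a Series:

    # Load birth data
    from pandas import read_csv
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
    print(series.head())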

Running this example prints the first 5 rows of the dataset, as follows:

    Date
    1959-01-01    35
    1959-01-02    32
    1959-01-03    30
    1959-01-04    31
    1959-01-05    44
    Name: Daily total female births in California, 1959, dtype: int64

The series has a name, which is the column name of the data column.

You can see that each row has an associated date. This is in fact not a column, but instead a time index for the values. As an index, there can be multiple values for one time, and values may be spaced evenly or unevenly across times.
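As a small illustration of that point, the sketch below builds a Series by hand rather than from the births file. The timestamps and values are made up for the example; it only shows that a time index accepts repeated and unevenly spaced times.

```python
import pandas as pd

# Hypothetical observations: two share a timestamp and the gaps are uneven.
index = pd.to_datetime(['1959-01-01', '1959-01-01', '1959-01-02', '1959-01-05'])
series = pd.Series([35, 36, 32, 44], index=index)

# A repeated time label is allowed, so the index is not unique.
print(series.index.is_unique)

# Selecting that label returns every value recorded for it.
same_day = series.loc['1959-01-01']
print(same_day)

# Gaps between consecutive observations differ, so the spacing is uneven.
gaps = series.index.to_series().diff().dropna()
print(gaps)
```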

The main function for loading CSV data in Pandas is the read_csv() function. We can use this to load the time series as a Series object, instead of a DataFrame, as follows:

    # Load birth data using read_csv
    from pandas import read_csv
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0)
    # Squeeze the single data column into a Series
    series = series.squeeze('columns')
    print(type(series))
    print(series.head())

Note the arguments to the read_csv() function.

We provide it a number of hints to ensure the data is loaded with a time index, then squeeze the result into a Series.

  • header=0: We must specify the header information at row 0.
  • parse_dates=[0]: We give the function a hint that data in the first column contains dates that need to be parsed. This argument takes a list, so we provide it a list of one element, which is the index of the first column.
  • index_col=0: We hint that the first column contains the index information for the time series.
  • squeeze('columns'): We indicate that we only have one data column and that we are interested in a Series and not a DataFrame. Older Pandas versions accepted this hint as a squeeze=True argument to read_csv(); that argument was removed in pandas 2.0.

One more argument you may need for your own data controls how date-time values are parsed. In this example, the date format has been inferred, and this works in most cases. In those few cases where it does not, older Pandas versions let you pass your own date parsing function through the date_parser argument. Since pandas 2.0 that argument is deprecated in favor of date_format, which takes a format string.

Running the example above prints the same output, but also confirms that the time series was indeed loaded as a Series object.

    <class 'pandas.core.series.Series'>
    Date
    1959-01-01    35
    1959-01-02    32
    1959-01-03    30
    1959-01-04    31
    1959-01-05    44
    Name: Daily total female births in California, 1959, dtype: int64
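To see what the read_csv() hints change, the sketch below parses the five sample rows from the start of the post out of an in-memory string instead of the downloaded file (an assumption made so the example is self-contained). It compares a call without hints to one with them, then shows one way to handle a date format you want to state explicitly: load the dates as text and convert the index with to_datetime(). The day-first data and the '%d/%m/%Y' format are made up for illustration.

```python
from io import StringIO
from pandas import read_csv, to_datetime

# The sample rows from the start of the post, held in a string for illustration.
births_csv = '''"Date","Daily total female births in California, 1959"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44
'''

# No hints: dates stay as text in an ordinary column and rows get integer labels.
plain = read_csv(StringIO(births_csv))
print(type(plain.index).__name__)

# With hints: the first column is parsed as dates and becomes the index.
indexed = read_csv(StringIO(births_csv), header=0, parse_dates=[0], index_col=0)
print(type(indexed.index).__name__)

# Squeezing the single remaining column yields a Series.
series = indexed.squeeze('columns')
print(type(series).__name__)

# Hypothetical data with day/month/year dates.
day_first_csv = '''Date,Births
01/02/1959,35
02/02/1959,32
'''

# Load the dates as plain text labels, then parse them with a fixed format
# so that day and month are not swapped.
day_first = read_csv(StringIO(day_first_csv), header=0, index_col=0).squeeze('columns')
day_first.index = to_datetime(day_first.index, format='%d/%m/%Y')
print(day_first)
```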

It is often easier to perform manipulations of your time series data in a DataFrame rather than a Series object.

In those situations, you can easily convert your loaded Series to a DataFrame as follows:

    from pandas import DataFrame
    dataframe = DataFrame(series)
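For a sense of why a DataFrame can be more convenient, the sketch below converts a small hand-built Series (the first five values from the sample above) and adds a second column next to the observations. The column names 'births' and 'weekday' are chosen for this example only.

```python
from pandas import DataFrame, Series, date_range

# First five observations from the sample, built by hand for illustration.
series = Series([35, 32, 30, 31, 44],
                index=date_range('1959-01-01', periods=5, freq='D'),
                name='births')

# Convert to a DataFrame; the Series name becomes the column name.
dataframe = DataFrame(series)

# Extra columns can now sit alongside the observations (Monday is 0).
dataframe['weekday'] = dataframe.index.dayofweek
print(dataframe)
```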

Exploring Time Series Data

Pandas also provides tools to explore and summarize your time series data.

In this section, we’ll take a look at a few, common operations to explore and summarize your loaded time series data.

Peek at the Data

It is a good idea to take a peek at your loaded data to confirm that the types, dates, and data loaded as you intended.

You can use the head() function to peek at the first 5 records, or specify the first n records to review.

For example, you can print the first 10 rows of data as follows.

    # Print the first 10 rows
    from pandas import read_csv
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
    print(series.head(10))

Running the example prints the following:

    Date
    1959-01-01    35
    1959-01-02    32
    1959-01-03    30
    1959-01-04    31
    1959-01-05    44
    1959-01-06    29
    1959-01-07    45
    1959-01-08    43
    1959-01-09    38
    1959-01-10    27

You can also use the tail() function to get the last n records of the dataset.
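A minimal sketch of both calls on a hand-built Series, used here in place of the births file so the example runs on its own (the values are made up):

```python
from pandas import Series, date_range

# Ten made-up daily values standing in for the loaded dataset.
series = Series(range(10), index=date_range('1959-01-01', periods=10, freq='D'))

# First three and last three records.
first = series.head(3)
last = series.tail(3)
print(first)
print(last)
```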

Number of Observations

Another quick check to perform on your data is the number of loaded observations.

This can help flush out issues with column headers not being handled as intended, and gives an idea of how to divide up the data later for use with supervised learning algorithms.

You can get the number of observations in your Series using the size attribute.

    # Count the loaded observations
    from pandas import read_csv
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
    print(series.size)

Running this example, we can see that, as we would expect, there are 365 observations, one for each day of the year in 1959.

    365
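A related check, sketched below under the assumption that the data should contain one row per calendar day, compares the index against the full range of expected dates. The Series is built by hand with one day deliberately left out, so the values are illustrative only.

```python
from pandas import Series, date_range

# Hypothetical data: the first week of 1959 with 4 January missing.
days = date_range('1959-01-01', '1959-01-07', freq='D').delete(3)
series = Series([35, 32, 30, 44, 29, 45], index=days)

# Number of observations actually loaded.
print(series.size)

# Dates that a complete daily index would contain but this one does not.
expected = date_range(series.index.min(), series.index.max(), freq='D')
missing = expected.difference(series.index)
print(missing)
```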

Querying By Time

You can slice, dice, and query your series using the time index.

For example, you can access all observations in January as follows:

    # Select all observations in January 1959
    from pandas import read_csv
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
    print(series['1959-01'])

Running this displays the 31 observations for the month of January in 1959.

    Date
    1959-01-01    35
    1959-01-02    32
    1959-01-03    30
    1959-01-04    31
    1959-01-05    44
    1959-01-06    29
    1959-01-07    45
    1959-01-08    43
    1959-01-09    38
    1959-01-10    27
    1959-01-11    38
    1959-01-12    33
    1959-01-13    55
    1959-01-14    47
    1959-01-15    45
    1959-01-16    37
    1959-01-17    50
    1959-01-18    43
    1959-01-19    41
    1959-01-20    52
    1959-01-21    34
    1959-01-22    53
    1959-01-23    39
    1959-01-24    32
    1959-01-25    37
    1959-01-26    43
    1959-01-27    39
    1959-01-28    35
    1959-01-29    44
    1959-01-30    38
    1959-01-31    24

This type of index-based querying can help to prepare summary statistics and plots while exploring the dataset.
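For example, the sketch below builds a stand-in Series of made-up values covering January and February 1959, then selects a date range and summarizes one month. Only the selection syntax is the point; the numbers are not the births data.

```python
from pandas import Series, date_range

# Stand-in data: one made-up value per day for January and February 1959.
index = date_range('1959-01-01', '1959-02-28', freq='D')
series = Series(range(len(index)), index=index)

# A slice between two dates includes both endpoints.
window = series['1959-01-10':'1959-01-12']
print(window)

# Select a whole month by its year-month label, then summarize it.
january = series.loc['1959-01']
print(january.size, january.mean())
```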

Descriptive Statistics

Calculating descriptive statistics on your time series can help get an idea of the distribution and spread of values.

This may help with ideas of data scaling and even data cleaning that you can perform later as part of preparing your dataset for modeling.

The describe() function creates a summary of the loaded time series including the count, mean, standard deviation, minimum, and maximum of the observations, along with the 25th, 50th (median), and 75th percentiles.

    # Calculate descriptive statistics
    from pandas import read_csv
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
    print(series.describe())

Running this example prints a summary of the births dataset.

    count    365.000000
    mean      41.980822
    std        7.348257
    min       23.000000
    25%       37.000000
    50%       42.000000
    75%       46.000000
    max       73.000000
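The individual figures can also be computed one at a time, which is handy when you only need some of them. The sketch below uses the first ten observations listed earlier rather than the full file, so its numbers will differ from the summary above.

```python
from pandas import Series

# First ten observations from the sample output earlier in the post.
series = Series([35, 32, 30, 31, 44, 29, 45, 43, 38, 27])

# describe() accepts the percentiles to report.
summary = series.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)

# The same kinds of statistics are available as separate methods.
print(series.mean(), series.median(), series.std())
print(series.quantile(0.25), series.quantile(0.75))
```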

Plotting Time Series

Plotting time series data, especially univariate time series, is an important part of exploring your data.

This functionality is provided on the loaded Series by calling the plot() function.

Below is an example of plotting the entire loaded time series dataset.

    # Plot the full time series
    from pandas import read_csv
    from matplotlib import pyplot
    series = read_csv('daily-total-female-births-in-cal.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')
    series.plot()
    pyplot.show()

Running the example creates a time series plot with the number of daily births on the y-axis and time in days along the x-axis.

[Figure: Daily Total Female Births Plot]

Summary

In this post, you discovered how to load and handle time series data using the Pandas Python library.

Specifically, you learned:

  • How to load your time series data as a Pandas Series.
  • How to peek at and calculate summary statistics of your time series data.
  • How to plot your time series data.

Do you have any questions about handling time series data in Python, or about this post?
Ask your questions in the comments below and I will do my best to answer.
