Time Series: Upsampling and Downsampling

阿新 • Published: 2018-12-24

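When upsampling, care may be needed in deciding how to interpolate the finer-grained observations.

When downsampling, care may be needed in choosing the summary statistic used to compute each new aggregated value.

There are perhaps two main reasons to resample your time series data:

1. Problem framing: resampling may be required if your data is not available at the frequency at which you want to make predictions.

2. Feature engineering: resampling can also provide additional structure or insight into the learning problem for a supervised learning model.

The two cases overlap a lot. For example, you may have daily data and want to predict a monthly quantity. You could use the daily data directly, or downsample it to monthly data and develop your model on that.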
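As a rough illustration of the daily-to-monthly case, here is a minimal sketch. The daily values are made up (a simple 0..89 ramp starting on 2020-01-01), and the mean is only one possible choice of summary statistic.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 90 made-up values starting 2020-01-01.
days = pd.date_range("2020-01-01", periods=90, freq="D")
daily = pd.Series(np.arange(90, dtype=float), index=days)

# Downsample to one value per calendar month.
# "MS" labels each monthly bucket by its first day.
monthly = daily.resample("MS").mean()
print(monthly)
```

The 90 daily rows collapse to 3 monthly rows, and a model for a monthly target could then be trained on `monthly` instead of `daily`.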

https://machinelearningmastery.com/resample-interpolate-time-series-data-python/

Upsampling:

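from datetime import datetime
from pandas import read_csv, to_datetime
from matplotlib import pyplot

def parser(x):
	# Dates are stored like "1-01" (year 1, month 01); prefix "190" to get "1901-01".
	return datetime.strptime('190'+x, '%Y-%m')

# pandas.datetime and the squeeze=/date_parser= arguments are deprecated or
# removed in pandas 2.x, so squeeze to a Series and parse the index explicitly.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = to_datetime(series.index.map(parser))
print(series.head())
series.plot()
pyplot.show()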

The data looks like this:

Month
1901-01-01 266.0
1901-02-01 145.9
1901-03-01 183.1
1901-04-01 119.3
1901-05-01 180.3
Name: Sales of shampoo over a three year period, dtype: float64

So we have monthly data and want to turn it into daily data.

First, convert the series to the new frequency:

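from datetime import datetime
from pandas import read_csv, to_datetime

def parser(x):
	# Dates are stored like "1-01" (year 1, month 01); prefix "190" to get "1901-01".
	return datetime.strptime('190'+x, '%Y-%m')

# pandas.datetime and the squeeze=/date_parser= arguments are deprecated or
# removed in pandas 2.x, so squeeze to a Series and parse the index explicitly.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = to_datetime(series.index.map(parser))
# resample() returns a Resampler object; asfreq() turns it back into a Series,
# leaving NaN in the newly created daily slots.
upsampled = series.resample('D').asfreq()
print(upsampled.head(32))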

Here D stands for day. The result looks like this:

Month
1901-01-01 266.0
1901-01-02 NaN
1901-01-03 NaN
1901-01-04 NaN
1901-01-05 NaN
1901-01-06 NaN
1901-01-07 NaN
1901-01-08 NaN
1901-01-09 NaN
1901-01-10 NaN
1901-01-11 NaN
1901-01-12 NaN
1901-01-13 NaN
1901-01-14 NaN
1901-01-15 NaN
1901-01-16 NaN
1901-01-17 NaN
1901-01-18 NaN
1901-01-19 NaN
1901-01-20 NaN
1901-01-21 NaN
1901-01-22 NaN
1901-01-23 NaN
1901-01-24 NaN
1901-01-25 NaN
1901-01-26 NaN
1901-01-27 NaN
1901-01-28 NaN
1901-01-29 NaN
1901-01-30 NaN
1901-01-31 NaN
1901-02-01 145.9

With the daily slots in place, we can interpolate. There are many methods to choose from, such as linear, polynomial, and spline.

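from datetime import datetime
from pandas import read_csv, to_datetime
from matplotlib import pyplot

def parser(x):
	# Dates are stored like "1-01" (year 1, month 01); prefix "190" to get "1901-01".
	return datetime.strptime('190'+x, '%Y-%m')

# pandas.datetime and the squeeze=/date_parser= arguments are deprecated or
# removed in pandas 2.x, so squeeze to a Series and parse the index explicitly.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = to_datetime(series.index.map(parser))
upsampled = series.resample('D').asfreq()
# Draw a straight line between each pair of known monthly values.
interpolated = upsampled.interpolate(method='linear')
print(interpolated.head(32))
interpolated.plot()
pyplot.show()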

The resulting plot:

[Figure: Shampoo Sales Interpolated Linear]
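To make the linear method concrete, the sketch below repeats the calculation by hand for the first gap. It uses only the first two monthly values from the output shown earlier (266.0 and 145.9); the variable names are my own.

```python
import pandas as pd

# First two monthly observations from the output shown earlier.
monthly = pd.Series(
    [266.0, 145.9],
    index=pd.to_datetime(["1901-01-01", "1901-02-01"]),
)

# Upsample to daily frequency, then fill the gaps linearly.
daily = monthly.resample("D").asfreq().interpolate(method="linear")

# January has 31 days, so the gap is split into 31 equal steps.
step = (145.9 - 266.0) / 31
print(round(step, 4))           # -3.8742
print(round(daily.iloc[1], 4))  # 262.1258, i.e. 266.0 + step
```

Each daily value simply moves one step closer to the next known monthly value, which is why the linear plot is made of straight segments.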

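from datetime import datetime
from pandas import read_csv, to_datetime
from matplotlib import pyplot

def parser(x):
	# Dates are stored like "1-01" (year 1, month 01); prefix "190" to get "1901-01".
	return datetime.strptime('190'+x, '%Y-%m')

# pandas.datetime and the squeeze=/date_parser= arguments are deprecated or
# removed in pandas 2.x, so squeeze to a Series and parse the index explicitly.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = to_datetime(series.index.map(parser))
upsampled = series.resample('D').asfreq()
# Same as the linear case, but with a second-order spline (requires SciPy).
interpolated = upsampled.interpolate(method='spline', order=2)
print(interpolated.head(32))
interpolated.plot()
pyplot.show()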

The resulting plot:

[Figure: Shampoo Sales Interpolated Spline]

Downsampling: we have monthly data and now want quarterly data.

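from datetime import datetime
from pandas import read_csv, to_datetime
from matplotlib import pyplot

def parser(x):
	# Dates are stored like "1-01" (year 1, month 01); prefix "190" to get "1901-01".
	return datetime.strptime('190'+x, '%Y-%m')

# pandas.datetime and the squeeze=/date_parser= arguments are deprecated or
# removed in pandas 2.x, so squeeze to a Series and parse the index explicitly.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = to_datetime(series.index.map(parser))
# 'Q' = quarter end; newer pandas releases (2.2+) spell this alias 'QE'.
resample = series.resample('Q')
quarterly_mean_sales = resample.mean()
print(quarterly_mean_sales.head())
quarterly_mean_sales.plot()
pyplot.show()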

Q stands for quarter, and mean() replaces the three monthly values in each quarter with their average.

Month
1901-03-31 198.333333
1901-06-30 156.033333
1901-09-30 216.366667
1901-12-31 215.100000
1902-03-31 184.633333
Freq: Q-DEC, Name: Sales, dtype: float64
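The first quarterly figure can be checked by hand. The sketch below uses only the five monthly values printed earlier, so the second quarter is incomplete here and will not match the full-data output. It also uses the 'QS' alias (quarter start) rather than 'Q', purely so the snippet runs on both older and newer pandas; that swap is my choice, not the article's, and it changes only the date labels.

```python
import pandas as pd

# The first five monthly values from the output shown earlier.
months = pd.date_range("1901-01-01", periods=5, freq="MS")
sales = pd.Series([266.0, 145.9, 183.1, 119.3, 180.3], index=months)

# "QS" labels each quarter by its first day (the article's 'Q' uses the last day).
quarterly = sales.resample("QS").mean()
print(quarterly)

# The same arithmetic by hand for the first quarter.
q1_by_hand = (266.0 + 145.9 + 183.1) / 3
print(round(q1_by_hand, 6))  # 198.333333
```

The first value agrees with the 198.333333 reported for the quarter ending 1901-03-31.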

You can of course aggregate by year as well; this time we use sum:

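from datetime import datetime
from pandas import read_csv, to_datetime
from matplotlib import pyplot

def parser(x):
	# Dates are stored like "1-01" (year 1, month 01); prefix "190" to get "1901-01".
	return datetime.strptime('190'+x, '%Y-%m')

# pandas.datetime and the squeeze=/date_parser= arguments are deprecated or
# removed in pandas 2.x, so squeeze to a Series and parse the index explicitly.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = to_datetime(series.index.map(parser))
# 'A' = year end; newer pandas releases (2.2+) spell this alias 'YE'.
resample = series.resample('A')
yearly_total_sales = resample.sum()
print(yearly_total_sales.head())
yearly_total_sales.plot()
pyplot.show()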
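Since the choice of summary statistic matters when downsampling, it can help to look at several side by side. The sketch below is only an illustration on made-up data (the values 1 to 24 over two years), using the 'YS' alias (year start) so it runs on both older and newer pandas.

```python
import pandas as pd

# Hypothetical monthly series: the made-up values 1..24 over two years.
months = pd.date_range("2001-01-01", periods=24, freq="MS")
monthly = pd.Series(range(1, 25), index=months, dtype=float)

# "YS" labels each year by its first day.
# agg() with a list returns one column per statistic.
yearly = monthly.resample("YS").agg(["sum", "mean", "max"])
print(yearly)
```

For a quantity like sales, sum gives the yearly total while mean gives a typical month, so the right statistic depends on what the downstream model is meant to predict.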

2. http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling

3. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html
