1. 程式人生 > >時間序列--選擇基準模型(以後的模型和這個基準比較

時間序列--選擇基準模型(以後的模型和這個基準比較

1.以前基準是用上一個時刻的值當做這一個時刻的值,

現在我設定引數不知道過去哪一個時刻的值取當做今天的值

from pandas import Series
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
# load data
series = Series.from_csv('car-sales.csv', header=0)
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
persistence_values = range(1, 25)
scores = list()
for p in persistence_values:
	# walk-forward validation
	history = [x for x in train]
	predictions = list()
	for i in range(len(test)):
		# make prediction
		yhat = history[-p]
		predictions.append(yhat)
		# observation
		history.append(test[i])
	# report performance
	rmse = sqrt(mean_squared_error(test, predictions))
	scores.append(rmse)
	print('p=%d RMSE:%.3f' % (p, rmse))
# plot scores over persistence values
pyplot.plot(persistence_values, scores)
pyplot.show()

結果:p=1 RMSE:3947.200
p=2 RMSE:5485.353
p=3 RMSE:6346.176
p=4 RMSE:6474.553
p=5 RMSE:5756.543
p=6 RMSE:5756.076
p=7 RMSE:5958.665
p=8 RMSE:6543.266
p=9 RMSE:6450.839
p=10 RMSE:5595.971
p=11 RMSE:3806.482
p=12 RMSE:1997.732
p=13 RMSE:3968.987
p=14 RMSE:5210.866
p=15 RMSE:6299.040
p=16 RMSE:6144.881
p=17 RMSE:5349.691
p=18 RMSE:5534.784
p=19 RMSE:5655.016
p=20 RMSE:6746.872
p=21 RMSE:6784.611
p=22 RMSE:5642.737
p=23 RMSE:3692.062
p=24 RMSE:2119.103

可以看出p=13最好

那麼就用t-13

from pandas import Series
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
# load data
series = Series.from_csv('car-sales.csv', header=0)
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
	# make prediction
	yhat = history[-12]
	predictions.append(yhat)
	# observation
	history.append(test[i])
# plot predictions vs observations
pyplot.plot(test)
pyplot.plot(predictions)
pyplot.show()

效果圖如下

Line Plot of Predicted Values vs Test Dataset for the t-12 Persistence Model

2.利用滑動窗口裡面的均值去當做這一個時刻的值

from pandas import Series
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
from numpy import mean
# load data
series = Series.from_csv('car-sales.csv', header=0)
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
window_sizes = range(1, 25)
scores = list()
for w in window_sizes:
	# walk-forward validation
	history = [x for x in train]
	predictions = list()
	for i in range(len(test)):
		# make prediction
		yhat = mean(history[-w:])
		predictions.append(yhat)
		# observation
		history.append(test[i])
	# report performance
	rmse = sqrt(mean_squared_error(test, predictions))
	scores.append(rmse)
	print('w=%d RMSE:%.3f' % (w, rmse))
# plot scores over window sizes values
pyplot.plot(window_sizes, scores)
pyplot.show()

windowsize就體現了用過去多少的均值(當然你也可以中位數)

w=1 RMSE:3947.200
w=2 RMSE:4350.413
w=3 RMSE:4701.446
w=4 RMSE:4810.510
w=5 RMSE:4649.667
w=6 RMSE:4549.172
w=7 RMSE:4515.684
w=8 RMSE:4614.551
w=9 RMSE:4653.493
w=10 RMSE:4563.802
w=11 RMSE:4321.599
w=12 RMSE:4023.968
w=13 RMSE:3901.634
w=14 RMSE:3907.671
w=15 RMSE:4017.276
w=16 RMSE:4084.080
w=17 RMSE:4076.399
w=18 RMSE:4085.376
w=19 RMSE:4101.505
w=20 RMSE:4195.617
w=21 RMSE:4269.784
w=22 RMSE:4258.226
w=23 RMSE:4158.029
w=24 RMSE:4021.885

可以看到w=13最好,那我們就選13

from pandas import Series
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from numpy import mean
# load data
series = Series.from_csv('car-sales.csv', header=0)
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
	# make prediction
	yhat = mean(history[-13:])
	predictions.append(yhat)
	# observation
	history.append(test[i])
# plot predictions vs observations
pyplot.plot(test)
pyplot.plot(predictions)
pyplot.show()

Line Plot of Predicted Values vs Test Dataset for the Mean w=13 Rolling Window Model

效果圖如上

https://machinelearningmastery.com/simple-time-series-forecasting-models/