
Time Series: Transforms Before Modeling

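1. Square root transform

First look at the trend of the series. If it grows like a quadratic curve, a square root transform can make it linear.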

Quadratic Time Series

After taking the square root it looks like this:

Log Transformed Exponential Time Series
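As a quick check of the idea above, here is a minimal sketch on made-up data. The series `quadratic` is a hypothetical toy example, not taken from any dataset. If the square root really straightens a quadratic trend, the first differences of the transformed series should be constant.

```python
import numpy as np

# Hypothetical toy data: a purely quadratic series 1, 4, 9, ..., 99**2.
quadratic = np.array([i ** 2 for i in range(1, 100)], dtype=float)

# The square root undoes the squaring, giving the line 1, 2, ..., 99.
transformed = np.sqrt(quadratic)

# A straight line has constant first differences.
steps = np.diff(transformed)
print(np.allclose(steps, 1.0))
```

Real data is rarely an exact quadratic, so in practice the differences will only be roughly constant.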

The original data looks like this:

Airline Passengers Dataset Plot

Apply the square root:

from pandas import read_csv
from pandas import DataFrame
from numpy import sqrt
from matplotlib import pyplot
# Series.from_csv was removed from pandas (1.0); read_csv plus squeeze returns the same Series
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# square root transform
dataframe['passengers'] = sqrt(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()

The result looks like this:

Square Root Transform of Airline Passengers Dataset Plot

The trend is still there...

2. Log transform

After a log transform, an exponential series should also look like this:

Log Transformed Exponential Time Series
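The same kind of check works for the log transform. This is a minimal sketch on a hypothetical toy series (`exponential` is made up for illustration): taking the natural log of exponential growth should give a straight line with constant first differences.

```python
import numpy as np

# Hypothetical toy data: exponential growth exp(1), exp(2), ..., exp(99).
exponential = np.exp(np.arange(1, 100, dtype=float))

# The natural log undoes the exponential, recovering the line 1, 2, ..., 99.
transformed = np.log(exponential)

# A straight line has constant first differences.
steps = np.diff(transformed)
print(np.allclose(steps, 1.0))
```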

Apply the log to the real data from above:

from pandas import read_csv
from pandas import DataFrame
from numpy import log
from matplotlib import pyplot
# Series.from_csv was removed from pandas (1.0); read_csv plus squeeze returns the same Series
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# natural log transform
dataframe['passengers'] = log(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()

 BoxCox Log Transform of Airline Passengers Dataset Plot

The distribution now looks closer to normal. The log transform is a popular choice.

3. Box-Cox transform

 

 BoxCox Auto Transform of Airline Passengers Dataset Plot

The resulting plot is shown above.

https://machinelearningmastery.com/power-transform-time-series-forecast-data-python/

 

  • lambda = -1. is a reciprocal transform.
  • lambda = -0.5 is a reciprocal square root transform.
  • lambda = 0.0 is a log transform.
  • lambda = 0.5 is a square root transform.
  • lambda = 1.0 is no transform.
    from pandas import read_csv
    from pandas import DataFrame
    from scipy.stats import boxcox
    from matplotlib import pyplot
    # Series.from_csv was removed from pandas (1.0); read_csv plus squeeze returns the same Series
    series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
    dataframe = DataFrame(series.values)
    dataframe.columns = ['passengers']
    # Box-Cox with a fixed lambda of 0.0, which is the log transform
    dataframe['passengers'] = boxcox(dataframe['passengers'], lmbda=0.0)
    pyplot.figure(1)
    # line plot
    pyplot.subplot(211)
    pyplot.plot(dataframe['passengers'])
    # histogram
    pyplot.subplot(212)
    pyplot.hist(dataframe['passengers'])
    pyplot.show()

    This example uses the log case (lambda = 0.0).

  • BoxCox Log Transform of Airline Passengers Dataset Plot
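The lambda values listed earlier can be checked numerically. The sketch below assumes the usual Box-Cox definition, log(x) for lambda = 0 and (x**lambda - 1) / lambda otherwise, and compares it against `scipy.stats.boxcox` on a handful of made-up positive values (`x` is hypothetical, not the airline data).

```python
import numpy as np
from scipy.stats import boxcox

# Hypothetical positive sample values; Box-Cox requires strictly positive data.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

# lambda = 0 is the natural log.
log_case = boxcox(x, lmbda=0.0)

# lambda = 0.5 is a shifted and scaled square root: (sqrt(x) - 1) / 0.5.
sqrt_case = boxcox(x, lmbda=0.5)

# lambda = 1 only subtracts 1, so the shape of the series is unchanged.
identity_case = boxcox(x, lmbda=1.0)

print(np.allclose(log_case, np.log(x)))
print(np.allclose(sqrt_case, (np.sqrt(x) - 1.0) / 0.5))
print(np.allclose(identity_case, x - 1.0))
```

So "square root transform" and "no transform" in the list hold up to a constant shift and scale, which does not change the shape of the plot.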

  • Conveniently, it can also choose a lambda by itself.

  • We can set the lambda parameter to None (the default) and let the function find a statistically tuned value.

    The following example demonstrates this usage, returning both the transformed dataset and the chosen lambda value.

    from pandas import read_csv
    from pandas import DataFrame
    from scipy.stats import boxcox
    from matplotlib import pyplot
    # Series.from_csv was removed from pandas (1.0); read_csv plus squeeze returns the same Series
    series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
    dataframe = DataFrame(series.values)
    dataframe.columns = ['passengers']
    # with lmbda left at None, boxcox returns the transformed data and the fitted lambda
    dataframe['passengers'], lam = boxcox(dataframe['passengers'])
    print('Lambda: %f' % lam)
    pyplot.figure(1)
    # line plot
    pyplot.subplot(211)
    pyplot.plot(dataframe['passengers'])
    # histogram
    pyplot.subplot(212)
    pyplot.hist(dataframe['passengers'])
    pyplot.show()

    Lambda: 0.148023
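For readers without the CSV at hand, the automatic lambda selection can also be tried on synthetic data. This is only a sketch: the log-normal samples, the seed, and the sample size are assumptions chosen for illustration. Because the data is log-normal, the fitted lambda should come out near 0, the log transform.

```python
import numpy as np
from scipy.stats import boxcox

# Hypothetical data: log-normal samples, so a log transform (lambda near 0)
# should be close to the best fit.
rng = np.random.default_rng(0)
x = np.exp(rng.normal(loc=0.0, scale=1.0, size=500))

# With lmbda left at None, boxcox returns (transformed data, fitted lambda).
transformed, lam = boxcox(x)
print('Lambda: %f' % lam)

# Re-applying the fitted lambda explicitly gives the same transformed values.
again = boxcox(x, lmbda=lam)
print(np.allclose(transformed, again))
```

The lambda of 0.148023 found for the airline data likewise sits between the log transform (0.0) and the square root transform (0.5).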