
Happy learning starts with translation_Multivariate Time Series Forecasting with LSTMs in Keras_3_Multivariate LSTM Forecast

3. Multivariate LSTM Forecast Model

In this section, we will fit an LSTM to the problem.

LSTM Data Preparation

The first step is to prepare the pollution dataset for the LSTM.

This involves framing the dataset as a supervised learning problem and normalizing the input variables.

We will frame the supervised learning problem as predicting the pollution at the current hour (t) given the pollution measurement and weather conditions at the prior time step. (Here, the pollution measurement is the PM2.5 concentration.)
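For intuition, here is a minimal sketch of this framing on a made-up univariate series (the numbers are invented purely for illustration):

from pandas import DataFrame

# made-up pollution readings for four consecutive hours
toy = DataFrame({'pollution': [130.0, 148.0, 159.0, 181.0]})
# shift(1) moves each value down one row, so row t holds the reading from t-1
framed = DataFrame({
    'pollution(t-1)': toy['pollution'].shift(1),  # input: prior time step
    'pollution(t)': toy['pollution'],             # target: current time step
}).dropna()  # the first row has no prior step, so drop it
print(framed)

The full multivariate version of exactly this shift-and-concatenate idea is what series_to_supervised() below implements.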

This formulation is straightforward and just for this demonstration. Some alternate formulations you could explore include:

  • Predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours (see the sketch after this list).
  • Predict the pollution for the next hour as above and given the “expected” weather conditions for the next hour.
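
As a sketch of the first alternative, only the lookback window changes; this assumes the scaled array and the series_to_supervised() function from the complete listing below:

# hypothetical variation: use the prior 24 hours of all features as input
n_hours = 24
reframed = series_to_supervised(scaled, n_hours, 1)
# 8 features x 24 lagged steps = 192 input columns, plus the 8 columns
# at time t, of which var1(t) would be the prediction target
print(reframed.shape)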

We can transform the dataset using the series_to_supervised() function developed in an earlier blog post.

First, the “pollution.csv” dataset is loaded. The wind speed feature is label encoded (integer encoded). This could further be one-hot encoded in the future if you are interested in exploring it.
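
To make the two encodings concrete, here is a small sketch on a made-up wind direction column; LabelEncoder is what the listing below uses, and pandas get_dummies is one simple way to one-hot encode instead:

from pandas import DataFrame, get_dummies
from sklearn.preprocessing import LabelEncoder

df = DataFrame({'wnd_dir': ['SE', 'NW', 'SE', 'cv']})
# label encoding: a single integer column (categories sorted alphabetically)
print(LabelEncoder().fit_transform(df['wnd_dir']))  # [1 0 1 2]
# one-hot encoding: one binary indicator column per category
print(get_dummies(df['wnd_dir']))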

Next, all features are normalized, then the dataset is transformed into a supervised learning problem. The weather variables for the hour to be predicted (t) are then removed.

The complete code listing is provided below.

from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from pandas import set_option
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler


# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg


# load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# integer encode direction
encoder = LabelEncoder()
values[:, 4] = encoder.fit_transform(values[:, 4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
reframed.drop(reframed.columns[[9, 10, 11, 12, 13, 14, 15]], axis=1, inplace=True)
set_option('display.max_columns', None)
print(reframed.head())

Running the example prints the first 5 rows of the transformed dataset. We can see the 8 input variables (input series) and the 1 output variable (pollution level at the current hour).

   var1(t-1)  var2(t-1)  var3(t-1)  var4(t-1)  var5(t-1)  var6(t-1)  var7(t-1)  \
1  0.129779  0.352941  0.245902  0.527273  0.666667  0.002290  0.000000   
2  0.148893  0.367647  0.245902  0.527273  0.666667  0.003811  0.000000   
3  0.159960  0.426471  0.229508  0.545454  0.666667  0.005332  0.000000   
4  0.182093  0.485294  0.229508  0.563637  0.666667  0.008391  0.037037   
5  0.138833  0.485294  0.229508  0.563637  0.666667  0.009912  0.074074   

   var8(t-1)   var1(t)  
1       0.0  0.148893  
2       0.0  0.159960  
3       0.0  0.182093  
4       0.0  0.138833  
5       0.0  0.109658  

This data preparation is simple and there is more we could explore. Some ideas you could look at include:

  • One-hot encoding wind speed.
  • Making all series stationary with differencing and seasonal adjustment (a differencing sketch follows this list).
  • Providing more than 1 hour of input time steps (see the reshaping sketch after the next paragraph).
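
For the stationarity idea, a minimal differencing sketch on made-up numbers; seasonal adjustment would difference against the value one full cycle back, e.g. 24 hours for a daily cycle in hourly data:

from pandas import Series

s = Series([30.0, 32.0, 35.0, 33.0])  # made-up hourly readings
diff = s.diff(1).dropna()             # first difference: s[t] - s[t-1]
print(diff)                           # 2.0, 3.0, -2.0
# a daily seasonal difference on hourly data would be s.diff(24)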

This last point is perhaps the most important given the use of Backpropagation through time by LSTMs when learning sequence prediction problems.
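
As a sketch of that last idea: with more lag hours, the supervised frame can be reshaped into the 3D [samples, timesteps, features] layout that Keras LSTMs expect, so backpropagation through time actually unrolls over those steps. This assumes scaled and series_to_supervised() from the listing above, with the dataset's 8 features:

# hypothetical follow-on: frame 3 hours of lag input, then reshape to 3D
n_hours, n_features = 3, 8
reframed = series_to_supervised(scaled, n_hours, 1)
values = reframed.values
X = values[:, :n_hours * n_features]   # the lagged input columns
y = values[:, -n_features]             # var1(t), pollution at the current hour
X = X.reshape((X.shape[0], n_hours, n_features))
print(X.shape, y.shape)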

Reference notes on the pandas API used above:

pandas.DataFrame.astype

DataFrame.astype(dtype, copy=True, errors='raise', **kwargs)

Cast a pandas object to a specified dtype.

Parameters:

dtype : data type, or dict of column name -> data type

Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.
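
A quick sketch of the dict form, casting a single column of a made-up frame:

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})
>>> df.astype({'a': 'float32'}).dtypes
a    float32
b    float64
dtype: object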

copy : bool, default True.

Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).

errors : {‘raise’, ‘ignore’}, default ‘raise’.

Control raising of exceptions on invalid data for provided dtype.

  • raise : allow exceptions to be raised
  • ignore : suppress exceptions. On error return original object

New in version 0.20.0.

raise_on_error : raise on invalid input

Deprecated since version 0.20.0: Use errors instead

kwargs : keyword arguments to pass on to the constructor
Returns:
casted : type of caller

See also

pandas.to_datetime : Convert argument to datetime.
pandas.to_timedelta : Convert argument to timedelta.
pandas.to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.

Examples

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> ser.astype('category', ordered=True, categories=[2, 1])
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Note that using copy=False and changing data on a new pandas object may propagate changes:

>>> s1 = pd.Series([1,2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64

pandas.concat

pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)

Concatenate pandas objects along a particular axis with optional set logic along the other axes.

Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.

Parameters:

objs : a sequence or mapping of Series, DataFrame, or Panel objects

If a dict is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised

axis : {0/’index’, 1/’columns’}, default 0

The axis to concatenate along

join : {‘inner’, ‘outer’}, default ‘outer’

How to handle indexes on other axis(es)

join_axes : list of Index objects

Specific indexes to use for the other n - 1 axes instead of performing inner/outer set logic

ignore_index : boolean, default False

If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join.

keys : sequence, default None

If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level

levels : list of sequences, default None

Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys

names : list, default None

Names for the levels in the resulting hierarchical index

verify_integrity : boolean, default False

Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation

sort : boolean, default None

Sort non-concatenation axis if it is not already aligned when join is ‘outer’. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.

Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.

This has no effect when join='inner', which already preserves the order of the non-concatenation axis.

New in version 0.23.0.

copy : boolean, default True

If False, do not copy data unnecessarily

Returns:

concatenated : object, type of objs

When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.

Notes

The keys, levels, and names arguments are all optional.

A walkthrough of how this method fits in with other tools for combining pandas objects can be found here.

Examples

Combine two Series.

>>> s1 = pd.Series(['a', 'b'])
>>> s2 = pd.Series(['c', 'd'])
>>> pd.concat([s1, s2])
0    a
1    b
0    c
1    d
dtype: object

Clear the existing index and reset it in the result by setting the ignore_index option to True.

>>> pd.concat([s1, s2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: object

Add a hierarchical index at the outermost level of the data with the keys option.

>>> pd.concat([s1, s2], keys=['s1', 's2',])
s1  0    a
    1    b
s2  0    c
    1    d
dtype: object

Label the index keys you create with the names option.

>>> pd.concat([s1, s2], keys=['s1', 's2'],
...           names=['Series name', 'Row ID'])
Series name  Row ID
s1           0         a
             1         b
s2           0         c
             1         d
dtype: object

Combine two DataFrame objects with identical columns.

>>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df2 = pd.DataFrame([['c', 3], ['d', 4]],
...                    columns=['letter', 'number'])
>>> df2
  letter  number
0      c       3
1      d       4
>>> pd.concat([df1, df2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with NaN values.

>>> df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
...                    columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([df1, df3])
  animal letter  number
0    NaN      a       1
1    NaN      b       2
0    cat      c       3
1    dog      d       4

Combine DataFrame objects with overlapping columns and return only those that are shared by passing inner to the join keyword argument.

>>> pd.concat([df1, df3], join="inner")
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame objects horizontally along the x axis by passing in axis=1.

>>> df4 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']],
...                    columns=['animal', 'name'])
>>> pd.concat([df1, df4], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george

Prevent the result from including duplicate index values with the verify_integrity option.

>>> df5 = pd.DataFrame([1], index=['a'])
>>> df5
   0
a  1
>>> df6 = pd.DataFrame([2], index=['a'])
>>> df6
   0
a  2
>>> pd.concat([df5, df6], verify_integrity=True)
Traceback (most recent call last):
    ...
ValueError: Indexes have overlapping values: ['a']

pandas.DataFrame.dropna

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Determine if rows or columns which contain missing values are removed.

  • 0, or ‘index’ : Drop rows which contain missing values.
  • 1, or ‘columns’ : Drop columns which contain missing value.

Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes.

how : {‘any’, ‘all’}, default ‘any’

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

  • ‘any’ : If any NA values are present, drop that row or column.
  • ‘all’ : If all values are NA, drop that row or column.

thresh : int, optional

Require that many non-NA values.

subset : array-like, optional

Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace : bool, default False

If True, do operation inplace and return None.

Returns:

DataFrame

DataFrame with NA entries dropped from it.

See also

DataFrame.isna : Indicate missing values.
DataFrame.notna : Indicate existing (non-missing) values.
DataFrame.fillna : Replace missing values.
Series.dropna : Drop missing values.
Index.dropna : Drop missing indices.

Examples

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
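
The thresh and subset parameters described above work on the same frame; a short sketch (not from the original docs excerpt):

>>> df.dropna(thresh=2)   # keep rows with at least two non-NA values
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> df.dropna(subset=['name', 'toy'])   # only consider these columns for NAs
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT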
