Happy learning starts with translation: Multivariate Time Series Forecasting with LSTMs in Keras, Part 3: Multivariate LSTM Forecast
3. Multivariate LSTM Forecast Model
In this section, we will fit an LSTM to the problem.
LSTM Data Preparation
The first step is to prepare the pollution dataset for the LSTM.
This involves framing the dataset as a supervised learning problem and normalizing the input variables.
We will frame the supervised learning problem as predicting the pollution at the current hour (t) given the pollution measurement and weather conditions at the prior time step. (Translator's note: "pollution" here is simply the PM2.5 reading, which makes the framing easier to follow.)
This formulation is straightforward and is just for this demonstration. Some alternate formulations you could explore include:
- Predict the pollution for the next hour based on the weather conditions and pollution over the last 24 hours.
- Predict the pollution for the next hour as above, given the "expected" weather conditions for the next hour.
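The first alternate framing above (predict the next hour from the previous 24 hours) can be sketched with a plain sliding window. This is an illustrative sketch, not code from the tutorial; the toy array, the `make_windows` helper, and the assumption that pollution is column 0 are all hypothetical:

```python
import numpy as np

def make_windows(data, n_in):
    """Frame a (samples, features) array as (windows, n_in, features)
    inputs X and next-hour pollution targets y (column 0 assumed)."""
    X, y = [], []
    for t in range(n_in, len(data)):
        X.append(data[t - n_in:t])   # previous n_in hours, all features
        y.append(data[t, 0])         # pollution at hour t
    return np.array(X), np.array(y)

# toy data: 30 hours of 8 features
data = np.arange(30 * 8, dtype=float).reshape(30, 8)
X, y = make_windows(data, n_in=24)
print(X.shape, y.shape)  # (6, 24, 8) (6,)
```

Each window of 24 hours becomes one training sample, and the target is the pollution value in the hour immediately after the window.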
We can transform the dataset using the series_to_supervised() function developed in the blog post:
First, the "pollution.csv" dataset is loaded. The wind direction feature (column 4) is label encoded (integer encoded). This could further be one-hot encoded in the future if you are interested in exploring it.
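For comparison, a one-hot encoding of the wind direction column could look like the sketch below, using pandas get_dummies. The sample values mirror those found in pollution.csv (e.g. SE, cv, NW, NE), but the standalone Series here is a hypothetical illustration:

```python
import pandas as pd

# hypothetical wind-direction values, like those in pollution.csv
wnd_dir = pd.Series(['SE', 'cv', 'NW', 'SE', 'NE'])

# one-hot encode instead of integer-encoding: one binary column per category
onehot = pd.get_dummies(wnd_dir, prefix='wnd_dir')
print(onehot.columns.tolist())
# ['wnd_dir_NE', 'wnd_dir_NW', 'wnd_dir_SE', 'wnd_dir_cv']
```

Unlike the integer encoding, one-hot columns do not impose an artificial ordering on the compass directions.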
Next, all features are normalized, then the dataset is transformed into a supervised learning problem. The weather variables for the hour to be predicted (t) are then removed.
The complete code listing is provided below.
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from pandas import set_option
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# load dataset
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values
# integer encode wind direction
encoder = LabelEncoder()
values[:, 4] = encoder.fit_transform(values[:, 4])
# ensure all data is float
values = values.astype('float32')
# normalize features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
# frame as supervised learning
reframed = series_to_supervised(scaled, 1, 1)
# drop columns we don't want to predict
reframed.drop(reframed.columns[[9, 10, 11, 12, 13, 14, 15]], axis=1, inplace=True)
set_option('display.max_columns', None)
print(reframed.head())
Running the example prints the first 5 rows of the transformed dataset. We can see the 8 input variables (input series) and the 1 output variable (pollution level at the current hour).
   var1(t-1)  var2(t-1)  var3(t-1)  var4(t-1)  var5(t-1)  var6(t-1)  var7(t-1)  \
1   0.129779   0.352941   0.245902   0.527273   0.666667   0.002290   0.000000
2   0.148893   0.367647   0.245902   0.527273   0.666667   0.003811   0.000000
3   0.159960   0.426471   0.229508   0.545454   0.666667   0.005332   0.000000
4   0.182093   0.485294   0.229508   0.563637   0.666667   0.008391   0.037037
5   0.138833   0.485294   0.229508   0.563637   0.666667   0.009912   0.074074

   var8(t-1)   var1(t)
1        0.0  0.148893
2        0.0  0.159960
3        0.0  0.182093
4        0.0  0.138833
5        0.0  0.109658
This data preparation is simple and there is more we could explore. Some ideas you could look at include:
- One-hot encoding wind direction.
- Making all series stationary with differencing and seasonal adjustment.
- Providing more than 1 hour of input time steps.
This last point is perhaps the most important, given the use of Backpropagation Through Time by LSTMs when learning sequence prediction problems.
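To illustrate that last point, the sketch below frames a toy dataset with 3 input time steps using a trimmed copy of series_to_supervised() from the listing above, then reshapes the inputs into the 3D (samples, time steps, features) layout an LSTM expects. The toy data and shapes are assumptions for illustration only:

```python
import numpy as np
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, dropnan=True):
    # trimmed copy of the function from the listing above (n_out fixed to 1)
    n_vars = data.shape[1]
    df = DataFrame(data)
    cols, names = [], []
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += ['var%d(t-%d)' % (j + 1, i) for j in range(n_vars)]
    # forecast sequence (t)
    cols.append(df)
    names += ['var%d(t)' % (j + 1) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# toy data: 10 hours of 2 features
data = np.arange(20, dtype=float).reshape(10, 2)
reframed = series_to_supervised(data, n_in=3)
print(reframed.shape)  # (7, 8): 3 lags x 2 vars as input, plus 2 vars at t

# the first 6 columns are the lagged inputs; reshape them for the LSTM
X = reframed.values[:, :6].reshape(-1, 3, 2)  # (samples, time steps, features)
print(X.shape)  # (7, 3, 2)
```

With more than one lagged time step per sample, BPTT can actually propagate error back through the sequence dimension instead of degenerating to a single-step gradient.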
Knowledge points used:

pandas.DataFrame.astype(dtype, copy=True, errors='raise', **kwargs)
Cast a pandas object to a specified dtype.
- dtype : use a numpy.dtype or Python type to cast the entire object to the same type, or a dict of {col: dtype} to cast one or more columns to column-specific types.
- copy : bool, default True. Return a copy (be very careful setting copy=False, as changes to the values may then propagate to other pandas objects).
- errors : {'raise', 'ignore'}, default 'raise'. Controls raising of exceptions on invalid data for the provided dtype; 'ignore' suppresses exceptions and returns the original object on error.
Example:
>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser.astype('int64')
0    1
1    2
dtype: int64

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, verify_integrity=False, ...)
Concatenate pandas objects along a particular axis with optional set logic along the other axes; can also add a layer of hierarchical indexing on the concatenation axis.
- objs : a sequence or mapping of Series or DataFrame objects.
- axis : {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
- join : {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis(es).
- ignore_index : boolean, default False. If True, relabel the concatenation axis 0, ..., n-1.
- keys : sequence, default None. Construct a hierarchical index using the passed keys as the outermost level.
- verify_integrity : boolean, default False. Check whether the new concatenated axis contains duplicates (can be expensive relative to the actual concatenation).
Examples:
>>> s1 = pd.Series(['a', 'b'])
>>> s2 = pd.Series(['c', 'd'])
>>> pd.concat([s1, s2])
0    a
1    b
0    c
1    d
dtype: object
>>> pd.concat([s1, s2], ignore_index=True)   # reset the index: 0, 1, 2, 3
>>> pd.concat([df1, df3], join="inner")      # keep only shared columns
>>> pd.concat([df1, df4], axis=1)            # combine DataFrames horizontally

pandas.DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Remove missing values.
- axis : {0/'index', 1/'columns'}, default 0. Drop rows (0) or columns (1) that contain missing values.
- how : {'any', 'all'}, default 'any'. 'any' drops the row/column if any NA values are present; 'all' drops it only if all values are NA.
- thresh : int, optional. Require that many non-NA values.
- subset : array-like, optional. Labels along the other axis to consider, e.g. a list of columns when dropping rows.
- inplace : bool, default False. If True, do the operation in place and return None.
Example:
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})
>>> df.dropna()   # drop rows where at least one element is missing
     name        toy       born
1  Batman  Batmobile 1940-04-25