
Predicting the Stock Market with Machine Learning Algorithms in R


Introduction to quantmod

quantmod is a very powerful financial analysis package that covers data download, cleaning, modeling, and more.

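1. Fetching data: getSymbols

  The default data source is Yahoo.

For a Shanghai Stock Exchange stock use getSymbols("600030.ss"); for a Shenzhen Stock Exchange stock use getSymbols("000002.sz"). The suffix ss denotes the Shanghai exchange and sz the Shenzhen exchange.

2. Symbol lookup (aliasing) function: setSymbolLookup

3. Dividend function: getDividends

4. Split and dividend adjustment function: adjustOHLC

5. Stock split function: getSplits

6. Option chain function: getOptionChain

7. Financial statements: getFinancials / getFin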

> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"
Warning message:
000002.sz contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them. 
> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
>
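The warning above suggests na.omit(), na.approx(), or na.fill() for series with gaps. As a minimal sketch of how such helpers differ, assuming the xts package (a dependency of quantmod) is installed, the example below uses a made-up four-day series that is not real market data. It also shows na.locf(), another zoo helper that the warning does not mention.

```r
library(xts)  # also attaches zoo, which provides na.approx() and na.locf()

# Hypothetical prices with a two-day gap (illustrative values only)
dates  <- as.Date("2020-01-06") + 0:3
prices <- xts(c(10, NA, NA, 13), order.by = dates)

dropped      <- na.omit(prices)    # remove the rows that contain NA
interpolated <- na.approx(prices)  # fill gaps by linear interpolation
carried      <- na.locf(prices)    # carry the last observation forward

print(nrow(dropped))
print(as.numeric(interpolated))
print(as.numeric(carried))
```

The model-improvement code later in this article takes the first route, na.omit(), which keeps only the days with complete data.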


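Machine Learning: Classification

First, simplify the problem and predict only whether the stock goes up or down. The task then becomes a classification problem in which each day of historical data is labeled as either up or down. As a further simplification, assume that the direction depends only on historical data.

We use the Naive Bayes classifier as the learning method. The naive Bayes assumption is: given the class A, all attributes Ai are independent of one another.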
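Written with the same symbols (class A, attributes A_1, ..., A_n), the independence assumption and the classification rule that follows from it take the standard form below. The attribute count n is notation introduced here for illustration.

```latex
P(A_1, \dots, A_n \mid A) = \prod_{i=1}^{n} P(A_i \mid A)
\quad\Longrightarrow\quad
P(A \mid A_1, \dots, A_n) \propto P(A) \prod_{i=1}^{n} P(A_i \mid A)
```

The classifier evaluates the right-hand side for each class and picks the class with the largest value.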

The R implementation is as follows:

> library(lubridate)
# date handling package
> library(e1071)
# naive Bayes package
> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"


> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
> tail(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2017-07-31          23.52          23.58         23.10           23.37
2017-08-01          23.35          23.55         23.20           23.42
2017-08-02          23.45          24.12         23.43           23.58
2017-08-03          23.58          23.58         22.79           23.11
2017-08-04          23.00          23.06         22.71           22.84
2017-08-07          22.82          23.05         22.68           22.71
           000002.SZ.Volume 000002.SZ.Adjusted
2017-07-31         30942482              23.37
2017-08-01         20952262              23.42
2017-08-02         35391017              23.58
2017-08-03         45518939              23.11
2017-08-04         29612306              22.84
2017-08-07         23409149              22.71
> 

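> startDate <- as.Date("2010-01-01")
> endDate <- as.Date("2017-01-01")
# startDate and endDate are defined here but not used in the rest of this session
> DayofWeek <- wday(WANKE, label=TRUE)
> PriceChange <- Cl(WANKE) - Op(WANKE)
# close minus open
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")
# a positive change is labeled UP; anything else is labeled DOWN
> DataSet <- data.frame(DayofWeek, Class)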

> MyModel <- naiveBayes(DataSet[,1], DataSet[,2])
> MyModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = DataSet[, 1], y = DataSet[, 2])

A-priori probabilities:
DataSet[, 2]
     DOWN        UP 
0.5148148 0.4851852 

Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000

> 
A-priori probabilities of DOWN and UP over the whole dataset:
DataSet[, 2]
     DOWN        UP 
0.5148148 0.4851852
Conditional probabilities of each day of the week given the class, that is P(DayofWeek | Class); each row sums to 1:
Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000
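To see how the two tables combine into a prediction, here is a small hand calculation in base R using the numbers printed above. It applies Bayes' rule to a single predictor value (Tuesday); the variable names are illustrative and are not part of e1071.

```r
# A-priori class probabilities, copied from the model output above
prior <- c(DOWN = 0.5148148, UP = 0.4851852)

# P(DayofWeek = Tues | class), copied from the conditional table above
p_tues_given_class <- c(DOWN = 0.1510791, UP = 0.2442748)

# Bayes' rule: the posterior is proportional to prior * likelihood
unnormalized <- prior * p_tues_given_class
posterior    <- unnormalized / sum(unnormalized)

print(round(posterior, 3))
```

For this dataset the posterior probability of UP given a Tuesday comes out at roughly 0.60, so the day-of-week model would predict UP for a Tuesday.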

Improving the Model

Exponential moving average (EMA)

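> W <- na.omit(WANKE)
# drop rows with missing values
> DayofWeek <- wday(W, label=TRUE)
> PriceChange <- Cl(W) - Op(W)
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")
> EMA5 <- EMA(Op(W), n = 5)
# 5-day EMA of the opening price
> EMA10 <- EMA(Op(W), n = 10)
# 10-day EMA of the opening price
> EMACross <- EMA5 - EMA10
# gap between the fast and the slow EMA
> EMACross <- round(EMACross, 2)
> DataSet2 <- data.frame(DayofWeek, EMACross, Class)
> DataSet2<-DataSet2[-c(1:10),]
# drop the first 10 rows, which include the warm-up rows where the 10-day EMA is NA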
> head(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2016-07-14     Thurs  0.11             DOWN
2016-07-15       Fri  0.04             DOWN
2016-07-18       Mon  0.00             DOWN
2016-07-19      Tues -0.10             DOWN
2016-07-20       Wed -0.23             DOWN
2016-07-21     Thurs -0.28             DOWN
> tail(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2017-07-31       Mon -0.34             DOWN
2017-08-01      Tues -0.31               UP
2017-08-02       Wed -0.26               UP
2017-08-03     Thurs -0.19             DOWN
2017-08-04       Fri -0.24             DOWN
2017-08-07       Mon -0.27             DOWN
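The EMA column above holds the gap between a fast and a slow exponential moving average. As a rough sketch of what an EMA computes, the base R function below uses the common smoothing factor 2/(n+1) and seeds the series with the simple mean of the first n values. The name ema_sketch is made up for illustration, and the seeding is an assumption that may not match TTR's EMA() in every detail.

```r
ema_sketch <- function(x, n) {
  stopifnot(n >= 1, length(x) >= n)
  alpha <- 2 / (n + 1)              # smoothing factor
  out <- rep(NA_real_, length(x))   # the first n - 1 values stay undefined
  out[n] <- mean(x[1:n])            # seed with the simple average
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      # blend the new observation with the previous EMA value
      out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
    }
  }
  out
}

toy   <- c(1, 2, 3, 4, 5, 6)        # made-up values, not market data
fast  <- ema_sketch(toy, 2)
slow  <- ema_sketch(toy, 3)
cross <- fast - slow                # analogous to EMA5 - EMA10 above
print(slow)
print(cross)
```

A positive cross value means the short-term average sits above the long-term one, and a negative value means the opposite.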

> length(DayofWeek)
[1] 270
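> TrainingSet<-DataSet2[1:200,]
# DataSet2 has 260 rows after the first 10 were dropped, so the test set ends at nrow(DataSet2)
> TestSet<-DataSet2[201:nrow(DataSet2),]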
> EMACrossModel<-naiveBayes(TrainingSet[,1:2],TrainingSet[,3]) 
> EMACrossModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = TrainingSet[, 1:2], y = TrainingSet[, 
    3])

A-priori probabilities:
TrainingSet[, 3]
DOWN   UP 
 0.5  0.5 

Conditional probabilities:
                DayofWeek
TrainingSet[, 3]  Sun  Mon Tues  Wed Thurs  Fri  Sat
            DOWN 0.00 0.22 0.13 0.24  0.18 0.23 0.00
            UP   0.00 0.16 0.27 0.17  0.23 0.17 0.00

                EMA
TrainingSet[, 3]    [,1]      [,2]
            DOWN  0.0333 0.4119553
            UP   -0.0177 0.4191522

> table(predict(EMACrossModel,TestSet),TestSet[,3],dnn=list("predicted","actual"))
         actual
predicted DOWN UP
     DOWN   16 21
     UP     13 10
> 
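The confusion matrix above can be reduced to a single accuracy figure. The sketch below re-enters the four counts by hand in base R; the variable names are illustrative.

```r
# Counts copied from the table above: rows are predictions, columns are actual classes
cm <- matrix(c(16, 13, 21, 10), nrow = 2,
             dimnames = list(predicted = c("DOWN", "UP"),
                             actual    = c("DOWN", "UP")))

correct  <- sum(diag(cm))   # predictions that match the actual class
total    <- sum(cm)         # all days in the test set
accuracy <- correct / total

print(accuracy)
```

With 26 correct predictions out of 60 test days, the accuracy is about 43%, so on this test set the EMA-cross model does no better than guessing.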


References

quantmod

http://www.quantmod.com/

https://github.com/dengyishuo/Notes/tree/master/quantmod

Naive Bayes classifier

http://blog.csdn.net/sulliy/article/details/6629201

Introduction to Use Machine Learning by R

https://www.inovancetech.com/blogML2.html
