Multi-Factor Stock Selection Model

Reposted from: https://www.joinquant.com/post/15833?tag=algorithm

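Based on "[Research] Quantitative Stock Selection: Factor Testing and Building a Multi-Factor Model", https://zhuanlan.zhihu.com/quantstory/20634542

This version adds several factors to the original code and shifts the sample window to a later period.

1. The sample period is 2011-2017, over which the factors are screened and tested.

2. The benchmark is the Shanghai Composite Index (000001.XSHG).

Model construction and factor selection

Factors are drawn from the following four categories: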

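  1. Value factors: price-to-earnings ratio (PE), price-to-book ratio (PB), price-to-sales ratio (PS), basic earnings per share (EPS), book-to-market ratio (B/M)
  2. Growth factors: return on equity (ROE), return on assets (ROA), gross profit margin (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), quarter-on-quarter net profit growth (inc_net_profit_annual), year-on-year operating profit growth (inc_operation_profit_year_on_year), quarter-on-quarter operating profit growth (inc_operation_profit_annual), gross margin on main business (GP/R), net profit margin (P/R)
  3. Scale factors: net profit (net_profit), operating revenue (operating_revenue), total share capital (capitalization), circulating share capital (circulating_cap), total market cap (market_cap), circulating market cap (circulating_market_cap), debt-to-asset ratio (L/A), fixed-asset ratio (FAP)
  4. Trading factors: turnover ratio (turnover_ratio)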
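The ratio factors in this list (B/M, GP/R, P/R, L/A, FAP) are plain quotients of statement fields. A minimal sketch of how they could be derived, assuming a DataFrame whose columns carry the field names used in the notebook's query. The function name `derived_factors`, the sample figures, and the assumption that `market_cap` is quoted in units of 1e8 yuan are illustrative only.

```python
import pandas as pd


def derived_factors(raw: pd.DataFrame) -> pd.DataFrame:
    """Compute the ratio factors from raw statement fields (hypothetical helper)."""
    out = pd.DataFrame(index=raw.index)
    # Assumption: market_cap is in units of 1e8 yuan, equity is in yuan
    out["B/M"] = raw["total_owner_equities"] / (raw["market_cap"] * 1e8)
    out["GP/R"] = raw["total_profit"] / raw["operating_revenue"]
    out["P/R"] = raw["net_profit"] / raw["operating_revenue"]
    out["L/A"] = raw["total_liability"] / raw["total_assets"]
    out["FAP"] = raw["fixed_assets"] / raw["total_assets"]
    return out


# Invented figures for two hypothetical stocks
raw = pd.DataFrame(
    {
        "total_owner_equities": [4.0e10, 1.0e9],
        "market_cap": [500.0, 20.0],
        "total_profit": [1.2e10, 1.0e8],
        "net_profit": [9.0e9, 8.0e7],
        "operating_revenue": [3.0e10, 1.0e9],
        "total_liability": [6.0e10, 5.0e8],
        "total_assets": [1.0e11, 2.0e9],
        "fixed_assets": [2.0e10, 4.0e8],
    },
    index=["AAA", "BBB"],
)
ratios = derived_factors(raw)
print(ratios)
```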

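Factor effectiveness is validated with a sorting method: stocks are ranked by factor value, grouped, and the groups' returns compared.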

In [1]:

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt

Fetch all factor values at the start of a month, e.g. 2018-01-01.

In [2]:

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap', 'circulating_market_cap',
           'L/A', 'FAP',
           'turnover_ratio']

# Fetch factor values at the start of the month
def get_factors(fdate, factors):
    stock_set = get_index_stocks('000001.XSHG', fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        valuation.pe_ratio,
        valuation.pb_ratio,
        valuation.ps_ratio,
        income.basic_eps,
        indicator.roe,
        indicator.roa,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_net_profit_annual,
        indicator.inc_operation_profit_year_on_year,
        indicator.inc_operation_profit_annual,
        income.total_profit/income.operating_revenue,
        income.net_profit/income.operating_revenue,
        income.net_profit,
        income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.market_cap,
        valuation.circulating_market_cap,
        balance.total_liability/balance.total_assets,
        balance.fixed_assets/balance.total_assets,
        valuation.turnover_ratio
        ).filter(
        valuation.code.in_(stock_set),
        valuation.circulating_market_cap
    )
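    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    # Column labels are assigned by position, so the query order must match `factors`.
    # NOTE: the query above lists the B/M expression first while `factors` lists it fifth,
    # so the column labelled PE holds B/M, PB holds PE, PS holds PB, EPS holds PS, and B/M holds EPS.
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-23:]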

fdf = get_factors('2018-01-01', factors)
fdf.head().T

Out[2]:

code 600000.XSHG 600004.XSHG 600006.XSHG 600007.XSHG 600008.XSHG
PE 1.143846e+00 4.827871e-01 6.119287e-01 3.655787e-01 6.300832e-01
PB 6.804400e+00 2.009680e+01 1.093361e+02 2.808410e+01 3.863470e+01
PS 9.538000e-01 2.144100e+00 1.787800e+00 2.736200e+00 3.135900e+00
EPS 2.244700e+00 4.643700e+00 6.311000e-01 6.594300e+00 2.607200e+00
B/M 4.800000e-01 1.900000e-01 -9.700000e-03 1.800000e-01 3.390000e-02
ROE 3.413200e+00 2.754000e+00 -2.960000e-01 2.928800e+00 1.513400e+00
ROA 2.316000e-01 1.990600e+00 -7.116000e-01 1.580100e+00 4.027000e-01
gross_profit_margin NaN 3.798770e+01 1.084940e+01 5.062150e+01 3.024400e+01
inc_net_profit_year_on_year -1.588900e+00 8.844300e+00 -6.544932e+02 6.390600e+00 2.954560e+01
inc_net_profit_annual -7.200000e-03 4.770500e+00 -2.012624e+03 3.357990e+01 -3.995000e-01
inc_operation_profit_year_on_year -1.833300e+00 1.868020e+01 -6.605708e+02 -4.675000e-01 1.377109e+02
inc_operation_profit_annual 2.019000e+00 2.919000e+00 -1.447837e+03 2.506970e+01 -2.233990e+01
GP/R 4.344424e-01 3.085870e-01 -4.038174e-02 3.296453e-01 1.174919e-01
P/R 3.350075e-01 2.308729e-01 -3.273438e-02 2.473615e-01 8.858635e-02
net_profit 1.387400e+10 3.935985e+08 -1.587791e+08 1.823589e+08 1.900669e+08
operating_revenue 4.141400e+10 1.704828e+09 4.850528e+09 7.372163e+08 2.145555e+09
capitalization 2.935208e+06 2.069320e+05 2.000000e+05 1.007282e+05 4.820614e+05
circulating_cap 2.810376e+06 2.069320e+05 2.000000e+05 1.007282e+05 4.820614e+05
market_cap 3.695427e+03 3.041901e+02 1.170000e+02 1.726482e+02 2.477796e+02
circulating_market_cap 3.538264e+03 3.041901e+02 1.170000e+02 1.726482e+02 2.477796e+02
L/A 9.302917e-01 2.719476e-01 6.862103e-01 4.571281e-01 6.699646e-01
FAP 4.168150e-03 3.381332e-01 1.754063e-01 1.815366e-01 1.011792e-01
turnover_ratio 5.820000e-02 4.095000e-01 5.574000e-01 7.120000e-02 3.734000e-01

Sort the stocks by each factor's value (circulating market cap is used as the example).

In [3]:

score = fdf['circulating_market_cap'].sort_values()  # Series.order() was removed from later pandas releases
score.head()

Out[3]:

code
603580.XSHG    5.0777
603991.XSHG    5.2659
603330.XSHG    5.3535
603041.XSHG    5.6300
603269.XSHG    5.7038
Name: circulating_market_cap, dtype: float64

Number of stocks

In [4]:

len(score)

Out[4]:

1352

Split the stock pool into five equal groups by circulating market cap.

In [5]:

startdate = '2018-01-01'
enddate = '2018-02-01'
nextdate = '2018-03-01'
df = {}
circulating_market_cap = fdf['circulating_market_cap']
# Integer division (//) keeps the slice bounds integral under Python 3 as well
n = len(score)
port1 = list(score.index)[: n//5]
port2 = list(score.index)[n//5: 2*n//5]
port3 = list(score.index)[2*n//5: -2*n//5]
port4 = list(score.index)[-2*n//5: -n//5]
port5 = list(score.index)[-n//5:]
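The five-way split can also be written without hand-computed slice bounds. The following is a minimal sketch, assuming a pandas Series of factor values indexed by stock code. The helper name `split_into_groups` and the sample data are made up for illustration, and `numpy.array_split` is swapped in for the manual slicing used in the notebook, so group sizes may differ by one stock from the cell above.

```python
import numpy as np
import pandas as pd


def split_into_groups(score: pd.Series, n_groups: int = 5) -> list:
    """Sort ascending by factor value and cut into n_groups nearly equal lists."""
    ordered = score.sort_values().index.to_numpy()
    # array_split tolerates lengths that are not a multiple of n_groups
    return [list(chunk) for chunk in np.array_split(ordered, n_groups)]


# Invented factor values for eleven hypothetical stocks
score = pd.Series(
    [50, 10, 40, 20, 30, 90, 60, 80, 70, 100, 5],
    index=list("ABCDEFGHIJK"),
)
groups = split_into_groups(score)
for i, g in enumerate(groups, start=1):
    print("port%d" % i, g)
```

Every stock lands in exactly one group, which avoids dropping names at the group boundaries.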

Compute each portfolio's monthly return weighted by circulating market cap (e.g. the returns for 2018-01 and 2018-02).

In [6]:

def calculate_port_monthly_return(port, startdate, enddate, nextdate, circulating_market_cap):
    # Daily closes for the two windows; row 0 is the first trading day of each window
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily', ['close'])
    # Return from startdate to enddate, weighted by circulating market cap
    weighted_m_return = ((close2['close'].iloc[0,:]/close1['close'].iloc[0,:]-1)*
                         circulating_market_cap).sum()/(circulating_market_cap.loc[port].sum())
    return weighted_m_return
calculate_port_monthly_return(port1, '2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])

Out[6]:

-0.09004705495088357
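The weighting step can be checked on a toy example. This is a sketch under stated assumptions, not the notebook's function: `cap_weighted_return` is a hypothetical helper, the prices and caps are invented, and, unlike the cell above, it renormalizes the weights over stocks that have a valid return, so a suspended stock does not dilute the result.

```python
import pandas as pd


def cap_weighted_return(p_start: pd.Series, p_end: pd.Series, cap: pd.Series) -> float:
    """Market-cap-weighted return over stocks with both prices and a cap."""
    ret = p_end / p_start - 1
    valid = ret.dropna().index.intersection(cap.dropna().index)
    weights = cap.loc[valid] / cap.loc[valid].sum()
    return float((ret.loc[valid] * weights).sum())


# Invented data: stock C has no end price (e.g. suspended), D is not in the portfolio
p_start = pd.Series({"A": 10.0, "B": 20.0, "C": 5.0})
p_end = pd.Series({"A": 11.0, "B": 19.0, "C": float("nan")})
cap = pd.Series({"A": 100.0, "B": 300.0, "C": 50.0, "D": 70.0})

r = cap_weighted_return(p_start, p_end, cap)
print(r)  # 0.25 * 10% + 0.75 * (-5%)
```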

Compute the benchmark's monthly return.

In [7]:

def calculate_benchmark_monthly_return(startdate, enddate, nextdate):
    # Index closes; row 0 is the first trading day of each window
    close1 = get_price(['000001.XSHG'],startdate,enddate,'daily',['close'])['close']
    close2 = get_price(['000001.XSHG'],enddate, nextdate, 'daily',['close'])['close']
    benchmark_return = (close2.iloc[0,:]/close1.iloc[0,:]-1).sum()
    return benchmark_return
calculate_benchmark_monthly_return('2018-01-01','2018-02-01','2018-03-01')

Out[7]:

0.029462448444448563

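Returns of the five portfolios over one month at the start of 2018

The results show that, before any factor portfolio is built, the first four groups underperform the market.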

In [8]:

benchmark_return = calculate_benchmark_monthly_return('2018-01-01', '2018-02-01', '2018-03-01')
df['port1'] = calculate_port_monthly_return(port1,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port2'] = calculate_port_monthly_return(port2,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port3'] = calculate_port_monthly_return(port3,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port4'] = calculate_port_monthly_return(port4,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port5'] = calculate_port_monthly_return(port5,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
print(Series(df))
print('benchmark_return %s' % benchmark_return)
port1   -0.090047
port2   -0.088405
port3   -0.075064
port4   -0.060624
port5    0.068629
dtype: float64
benchmark_return 0.0294624484444

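Build the factor portfolios and compute each portfolio's monthly return

Period: 2011-2017. Compute the monthly returns of groups 1-5 and the benchmark, giving an 84×6 panel.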

In [9]:

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap', 'circulating_market_cap',
           'L/A', 'FAP',
           'turnover_ratio']
# The research module fetches fundamentals as of the day before the research date by default,
# so we build our own monthly date series.
year = ['2011','2012','2013','2014','2015','2016','2017']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}

for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    # print('time %s' % startdate)
    fdf = get_factors(startdate,factors)
    CMV = fdf['circulating_market_cap']
    # NOTE: CMV (this month's caps) is not used below; the weights passed are the global
    # circulating_market_cap fetched for 2018-01-01 in In [5].
    # 6 rows (5 portfolios plus the benchmark) by 23 factors
    df = DataFrame(np.zeros(6*23).reshape(6,23),index = ['port1','port2','port3','port4','port5','benchmark'],columns = factors)
    for fac in factors:
        score = fdf[fac].sort_values()
        # The +1 offsets leave one stock out at each group boundary
        port1 = list(score.index)[: len(score)//5]
        port2 = list(score.index)[ len(score)//5+1: 2*len(score)//5]
        port3 = list(score.index)[ 2*len(score)//5+1: -2*len(score)//5]
        port4 = list(score.index)[ -2*len(score)//5+1: -len(score)//5]
        port5 = list(score.index)[ -len(score)//5+1: ]
        df.loc['port1',fac] = calculate_port_monthly_return(port1,startdate,enddate,nextdate,circulating_market_cap)
        df.loc['port2',fac] = calculate_port_monthly_return(port2,startdate,enddate,nextdate,circulating_market_cap)
        df.loc['port3',fac] = calculate_port_monthly_return(port3,startdate,enddate,nextdate,circulating_market_cap)
        df.loc['port4',fac] = calculate_port_monthly_return(port4,startdate,enddate,nextdate,circulating_market_cap)
        df.loc['port5',fac] = calculate_port_monthly_return(port5,startdate,enddate,nextdate,circulating_market_cap)
        df.loc['benchmark',fac] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
    # print('factor %s' % fac)
    result[i+1]=df
# pd.Panel exists only in pandas releases before 0.25
monthly_return = pd.Panel(result)
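The month boundaries built above with string lists and `try/except` can also be generated directly. A minimal sketch using `pandas.date_range` with the month-start frequency; the helper name `month_windows` is an assumption for illustration. It is meant to yield the same (startdate, enddate, nextdate) triples as the loop above.

```python
import pandas as pd


def month_windows(first_month: str, n_months: int) -> list:
    """Return (startdate, enddate, nextdate) strings for n_months consecutive months."""
    # Two extra month starts cover enddate and nextdate of the final month
    starts = pd.date_range(first_month, periods=n_months + 2, freq="MS")
    days = [d.strftime("%Y-%m-%d") for d in starts]
    return [(days[i], days[i + 1], days[i + 2]) for i in range(n_months)]


windows = month_windows("2011-01-01", 84)
print(windows[0])
print(windows[-1])
```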

Monthly returns of the five portfolios for a single factor (e.g. the price-to-earnings ratio, PE)

In [11]:

monthly_return[:,:,'PE']

Out[11]:

  1 2 3 4 5 6 7 8 9 10 ... 75 76 77 78 79 80 81 82 83 84
port1 -0.063961 0.057468 -0.003538 0.011939 -0.000767 0.028005 0.048595 -0.003958 -0.109566 0.062509 ... 0.021345 -0.006460 -0.001198 0.049791 0.009338 0.049369 0.058072 0.069637 -0.033160 0.056772
port2 -0.065009 0.076102 -0.027128 -0.018031 -0.066994 0.031146 0.028017 -0.046184 -0.120076 0.034576 ... 0.037914 -0.048666 -0.053362 0.072887 0.040365 0.036833 0.069451 0.003253 -0.022327 0.018349
port3 -0.056932 0.079801 -0.017569 -0.027592 -0.073196 0.034040 -0.025730 -0.054367 -0.129013 0.045660 ... 0.017931 -0.045419 -0.053020 0.054920 0.028066 0.018909 0.027487 0.002008 -0.047184 0.006735
port4 -0.021293 0.046165 -0.005278 -0.011301 -0.069544 0.019637 -0.019397 -0.080514 -0.107611 0.081045 ... 0.004030 -0.021088 -0.005480 0.057372 0.065631 0.025304 -0.005043 0.059699 0.016978 0.023916
port5 0.013760 0.024953 0.050458 0.006419 -0.054836 0.007156 -0.035373 -0.041296 -0.052494 0.068615 ... 0.011837 -0.021979 0.048663 0.021466 0.074965 0.014275 -0.003084 -0.003351 0.003836 0.009916
benchmark -0.018820 0.042859 0.016612 -0.011870 -0.064326 0.005755 -0.020142 -0.054642 -0.082649 0.053409 ... 0.007198 -0.038710 -0.013070 0.030068 0.030266 0.022620 0.002156 0.006382 -0.023056 0.009257

6 rows × 84 columns
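On pandas versions without `Panel`, the same slice can be taken from the plain dict of monthly DataFrames. A sketch under that assumption: `factor_slice` is a hypothetical helper, and the `result` dict here is filled with placeholder numbers rather than real returns.

```python
import pandas as pd

rows = ["port1", "port2", "port3", "port4", "port5", "benchmark"]
factors = ["PE", "PB"]

# Placeholder stand-in for the notebook's `result`: month number -> DataFrame
result = {m: pd.DataFrame(m * 0.01, index=rows, columns=factors) for m in range(1, 4)}


def factor_slice(result: dict, fac: str) -> pd.DataFrame:
    """Rows = portfolios, columns = month numbers, like monthly_return[:, :, fac]."""
    return pd.DataFrame({month: df[fac] for month, df in result.items()})


pe = factor_slice(result, "PE")
print(pe)
```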

Cumulative returns

In [12]:

(monthly_return[:,:,'PE'].T+1).cumprod().tail()

Out[12]:

  port1 port2 port3 port4 port5 benchmark
80 2.173926 1.652334 1.708928 1.980452 2.433185 1.180349
81 2.300171 1.767090 1.755901 1.970465 2.425681 1.182893
82 2.460347 1.772839 1.759427 2.088099 2.417553 1.190442
83 2.378763 1.733257 1.676409 2.123552 2.426825 1.162996
84 2.513809 1.765060 1.687700 2.174338 2.450891 1.173762
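The cumulative figures are compounded monthly returns, and an annualized rate follows from the final value and the number of months. A minimal sketch with hypothetical helper names; the three-month series is invented, while the benchmark total (1.173762 after 84 months, i.e. 7 years) is taken from the table above.

```python
import pandas as pd


def cumulative_growth(monthly: pd.Series) -> pd.Series:
    """Value of 1 unit invested, compounded month by month."""
    return (1 + monthly).cumprod()


def annualized_return(total_return: float, n_months: int) -> float:
    """Compound annual rate implied by a total return earned over n_months."""
    return (1 + total_return) ** (12.0 / n_months) - 1


growth = cumulative_growth(pd.Series([0.10, -0.05, 0.02]))
print(growth.iloc[-1])  # 1.10 * 0.95 * 1.02

bench_annual = annualized_return(1.173762 - 1, 84)
print(round(bench_annual, 4))
```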

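Quantitative metrics for factor testing

Once the model is built, compute for the n portfolios the annualized compound return, the excess return, and, across market conditions, the probability that the high-return portfolio beats the benchmark and the low-return portfolio trails it.

Quantitative criteria for effectiveness:

(1) The annualized compound returns of portfolios 1 to n should follow a clear ordering, i.e. a portfolio's rank by factor value should correlate strongly with its return. Let Xi be the annualized return of portfolio i; then the absolute correlation between Xi and i must satisfy Abs(Corr(Xi,i)) > MinCorr, where MinCorr is a chosen minimum-correlation threshold.

(2) Let AR1 and ARn be the excess returns of the two extreme portfolios 1 and n, and let MinARtop and MinARbottom be the excess-return thresholds.

if AR1 > ARn # smaller factor value, larger return

then require AR1 > MinARtop > 0 and ARn < MinARbottom < 0

if AR1 < ARn # smaller factor value, smaller return

then require ARn > MinARtop > 0 and AR1 < MinARbottom < 0

These conditions ensure that, of the portfolios with the largest and smallest factor values, one clearly beats the market and the other clearly trails it.

(3) Under any market conditions, the two extreme portfolios 1 and n beat or trail the market with high probability.

Together, these three conditions pick out factors that showed good stock-selection ability over the past period.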
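The three criteria can be computed from a table of monthly returns. The following is a sketch under stated assumptions: `factor_metrics` is a hypothetical helper, the input has months as rows and portfolios plus a `benchmark` column, the returns are invented, and annualization uses the actual number of months.

```python
import numpy as np
import pandas as pd


def factor_metrics(monthly: pd.DataFrame) -> dict:
    """Rank correlation, extreme-portfolio excess returns, win/loss probabilities."""
    ports = [c for c in monthly.columns if c != "benchmark"]
    years = len(monthly) / 12.0
    annual = (1 + monthly).prod() ** (1.0 / years) - 1
    # Criterion 1: correlation between portfolio number and annualized return
    rank = np.arange(1, len(ports) + 1)
    corr = float(np.corrcoef(rank, annual[ports].to_numpy())[0, 1])
    # Criterion 2: excess return of the two extreme portfolios
    excess = annual[ports] - annual["benchmark"]
    first, last = ports[0], ports[-1]
    winner, loser = (first, last) if annual[first] > annual[last] else (last, first)
    # Criterion 3: share of months the winner beats / the loser trails the benchmark
    win_prob = float((monthly[winner] > monthly["benchmark"]).mean())
    loss_prob = float((monthly[loser] < monthly["benchmark"]).mean())
    return {
        "corr": corr, "winner": winner, "loser": loser,
        "excess_winner": float(excess[winner]), "excess_loser": float(excess[loser]),
        "win_prob": win_prob, "loss_prob": loss_prob,
    }


# Invented monthly returns for one year
monthly = pd.DataFrame({
    "port1": [-0.01] * 6 + [0.02] * 6,
    "port2": [0.006] * 12,
    "port3": [0.010] * 12,
    "port4": [0.014] * 12,
    "port5": [0.03] * 9 + [0.00] * 3,
    "benchmark": [0.010] * 12,
})
m = factor_metrics(monthly)
print(m)
```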

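Because many factors were chosen at the outset, the three criteria are applied more strictly, using the following standards:

(1) Record the factor correlation; it passes if > 0.7 or < -0.7.

(2) Record the excess returns of the winner and loser portfolios.

(3) Record the winner portfolio's probability of beating the benchmark (passes if > 0.6) and the loser portfolio's probability of trailing it (passes if > 0.4).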
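Applying the stated thresholds is then a simple filter. A minimal sketch, with `passes_screen` as a hypothetical helper; the three sample rows are copied from Out[13] in this notebook, and the loss-probability threshold is left out here, following the notebook's own remark that it is not considered for now.

```python
def passes_screen(corr: float, win_prob: float,
                  min_corr: float = 0.7, min_win: float = 0.6) -> bool:
    """True if |correlation| and the winner's win probability clear the thresholds."""
    return abs(corr) > min_corr and win_prob > min_win


# (correlation, winner win probability) copied from Out[13]
rows = {
    "P/R": (0.9215462, 0.642857142857),
    "PE": (0.1328435, 0.607142857143),
    "market_cap": (-0.8262643, 0.738095238095),
}
selected = [name for name, (c, w) in rows.items() if passes_screen(c, w)]
print(selected)
```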

In [13]:

total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:,:,fac]
    total_return[fac] = (monthly+1).T.cumprod().iloc[-1,:]-1
    # NOTE: the exponent 1/6 annualizes the 84-month sample as if it spanned 6 years
    annual_return[fac] = (total_return[fac]+1)**(1./6)-1
    excess_return[fac] = annual_return[fac] - annual_return[fac].iloc[-1]
    # Test factor effectiveness
    # 1. Correlation between annualized return and portfolio rank
    effect_test[fac][1] = annual_return[fac].iloc[0:5].corr(Series([1,2,3,4,5],index = annual_return[fac].iloc[0:5].index))
    # 2. Probability that the high-return portfolio beats the benchmark
    # Smaller factor value, smaller return: port1 is the loser, port5 the winner
    if total_return[fac].iloc[0] < total_return[fac].iloc[-2]:
        loss_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))

        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]

        # Excess returns, in percent
        effect_test[fac][2] = [excess_return[fac].iloc[-2]*100,excess_return[fac].iloc[0]*100]

    # Smaller factor value, larger return: port1 is the winner, port5 the loser
    else:
        loss_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))

        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]

        # Excess returns, in percent
        effect_test[fac][2] = [excess_return[fac].iloc[0]*100,excess_return[fac].iloc[-2]*100]

# Many factors were chosen, so the test standards are set somewhat strictly
# effect_test[1]: factor correlation; passes if > 0.7 or < -0.7
# effect_test[2]: [winner excess return, loser excess return]
# effect_test[3]: [winner win probability, loser loss probability]; passes if [> 0.6, > 0.4]
#                 (in practice the loss probability is not considered for now)
DataFrame(effect_test).T

Out[13]:

  1 2 3
B/M 0.6281959 [15.1984852636, 8.76175660448] [0.690476190476, 0.404761904762]
EPS 0.2488584 [14.2720133294, 12.9632231367] [0.678571428571, 0.357142857143]
FAP -0.5671644 [13.4503120268, 9.44267504971] [0.619047619048, 0.380952380952]
GP/R 0.8064658 [13.7519085368, 9.10242336036] [0.619047619048, 0.357142857143]
L/A -0.5898578 [16.5046555213, 12.1611504111] [0.702380952381, 0.416666666667]
P/R 0.9215462 [13.980265264, 9.09336493425] [0.642857142857, 0.380952380952]
PB -0.8818369 [13.9012096024, 6.71073706755] [0.619047619048, 0.428571428571]
PE 0.1328435 [13.9001078939, 13.4085302139] [0.607142857143, 0.369047619048]
PS -0.5030761 [14.1865783133, 9.18250270639] [0.607142857143, 0.392857142857]
ROA 0.5423133 [19.3405425743, 9.77751849214] [0.75, 0.380952380952]
ROE 0.6386198 [17.9776162079, 9.73910681099] [0.654761904762, 0.404761904762]
capitalization -0.7644211 [22.4171821446, 9.86517390072] [0.583333333333, 0.404761904762]
circulating_cap -0.7761155 [19.8132954476, 9.86514645415] [0.571428571429, 0.369047619048]
circulating_market_cap -0.8791725 [38.1580067747, 10.3384004828] [0.714285714286, 0.369047619048]
gross_profit_margin 0.7770139 [15.5893122733, 9.22929383936] [0.642857142857, 0.452380952381]
inc_net_profit_annual 0.6899743 [14.9827068239, 9.99043264863] [0.678571428571, 0.392857142857]
inc_net_profit_year_on_year 0.8082138 [13.825611634, 3.32909642528] [0.630952380952, 0.416666666667]
inc_operation_profit_annual 0.5963116 [13.1949471333, 9.79858245467] [0.654761904762, 0.404761904762]
inc_operation_profit_year_on_year 0.8663793 [14.0478401847, 3.17046201915] [0.654761904762, 0.404761904762]
market_cap -0.8262643 [44.3574164544, 10.5284689923] [0.738095238095, 0.369047619048]
net_profit 0.04857344 [12.1195026493, 8.12374126557] [0.642857142857, 0.380952380952]
operating_revenue -0.7751005 [23.9766654178, 11.219895262] [0.630952380952, 0.345238095238]
turnover_ratio -0.6218568 [10.175151521, 4.22831336907] [0.619047619048, 0.511904761905]

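Effective factors

The factors meeting all three conditions above are:

(1) Value factors: book-to-market ratio (B/M)

(2) Growth factors: net profit margin (P/R), gross profit margin (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), year-on-year operating profit growth (inc_operation_profit_year_on_year)

(3) Scale factors: operating revenue (operating_revenue), total share capital (capitalization), circulating share capital (circulating_cap), total market cap (market_cap), circulating market cap (circulating_market_cap), debt-to-asset ratio (L/A)

Total returns of the effective factors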

In [14]:

effective_factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin', 
                     'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
DataFrame(total_return).loc[:,effective_factors].T

Out[14]:

  port1 port2 port3 port4 port5 benchmark
B/M 0.918228 1.480658 1.142045 1.148155 1.686498 0.173762
L/A 1.870086 1.526532 0.843702 1.124105 1.297099 0.173762
P/R 0.952724 1.060859 1.183619 1.649951 1.524196 0.173762
capitalization 2.837346 1.656063 1.372731 1.964715 1.035016 0.173762
circulating_cap 2.382449 1.692737 1.170379 1.747633 1.035013 0.173762
circulating_market_cap 6.812751 2.619596 1.248171 1.063917 1.086887 0.173762
gross_profit_margin 0.967012 1.086652 0.899555 1.183325 1.740373 0.173762
inc_net_profit_year_on_year 0.421356 0.994127 1.055084 2.446268 1.504189 0.173762
inc_operation_profit_year_on_year 0.408645 0.801897 1.381442 2.183790 1.532979 0.173762
market_cap 9.116529 1.863749 1.864567 0.896007 1.108029 0.173762
operating_revenue 3.133399 1.325240 1.267816 1.006326 1.186449 0.173762

Annualized returns of the effective factors

In [15]:

DataFrame(annual_return).loc[:,effective_factors].T

Out[15]:

  port1 port2 port3 port4 port5 benchmark
B/M 0.114680 0.163486 0.135372 0.135911 0.179047 0.027062
L/A 0.192109 0.167045 0.107342 0.133781 0.148674 0.027062
P/R 0.117996 0.128084 0.139015 0.176358 0.166865 0.027062
capitalization 0.251234 0.176810 0.154892 0.198571 0.125714 0.027062
circulating_cap 0.225195 0.179503 0.137861 0.183477 0.125714 0.027062
circulating_market_cap 0.408642 0.239110 0.144559 0.128363 0.130446 0.027062
gross_profit_margin 0.119355 0.130425 0.112864 0.138990 0.182955 0.027062
inc_net_profit_year_on_year 0.060353 0.121912 0.127556 0.229018 0.165318 0.027062
inc_operation_profit_year_on_year 0.058767 0.103117 0.155598 0.212897 0.167540 0.027062
market_cap 0.470636 0.191669 0.191726 0.112517 0.132347 0.027062
operating_revenue 0.266829 0.151007 0.146220 0.123053 0.139261 0.027062

Time-series plots of the six return series (five portfolios plus the benchmark) for each factor:

In [16]:

def draw_return_picture(df, fac):
    # Cumulative growth of 1 unit for each portfolio and the benchmark
    cum = (df.T+1).cumprod()
    plt.figure(figsize =(10,4))
    for col in cum.columns:
        plt.plot(cum[col], label = col)
    plt.xlabel('return of factor %s'%fac)
    plt.legend(loc=0)
for fac in effective_factors:
    draw_return_picture(monthly_return[:,:,fac], fac)

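Removing redundant factors

Some factors share similar underlying logic, so the portfolios they select are highly correlated in both holdings and returns. Such factors should be pruned, keeping within each group of similar factors the one with the best return and the strongest discrimination. The author did not complete this step; the intended method is:

(1) Score the n portfolios of each factor, with a higher return earning a higher score. Then assign each portfolio's score to every stock it holds that month.

if AR1 > ARn # smaller factor value, larger return

then portfolio i scores (n-i+1)

if AR1 < ARn # smaller factor value, smaller return

then portfolio i scores i

(2) For each month, compute the correlation matrix of the stocks' scores across factors. This gives the month-t score correlation matrix Score_Corr_{t,u,v}, where u and v are factor indices.

(3) Average the correlation matrices over the sample period: with m months in the sample, sum the matrices and multiply by 1/m.

(4) Set a score-correlation threshold MinScoreCorr and keep only factors whose correlation with the other factors is low.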
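Steps (1) to (4) can be sketched as follows. This is one possible reading of the method, not the author's implementation: all function names are hypothetical, the data are invented, stocks are scored by quantile group of their factor value, and "keep only low-correlation factors" is interpreted as a greedy pass in a caller-supplied priority order (e.g. best return first).

```python
import numpy as np
import pandas as pd


def group_scores(values: pd.Series, n_groups: int = 5, small_is_better: bool = False) -> pd.Series:
    """Step 1: score each stock 1..n by the quantile group of its factor value."""
    ranks = values.rank(method="first")
    group = np.ceil(ranks * n_groups / len(values)).astype(int)  # group 1 = smallest values
    return (n_groups - group + 1) if small_is_better else group


def score_frame(fdf: pd.DataFrame, small_is_better: dict) -> pd.DataFrame:
    return pd.DataFrame({f: group_scores(fdf[f], small_is_better=small_is_better[f]) for f in fdf})


def mean_score_corr(monthly_scores: list) -> pd.DataFrame:
    """Steps 2-3: per-month factor-score correlation matrices, averaged over months."""
    return sum(s.corr() for s in monthly_scores) / len(monthly_scores)


def drop_redundant(avg_corr: pd.DataFrame, priority: list, threshold: float) -> list:
    """Step 4: keep a factor only if |corr| with every kept factor is <= threshold."""
    kept = []
    for fac in priority:
        if all(abs(avg_corr.loc[fac, k]) <= threshold for k in kept):
            kept.append(fac)
    return kept


# Invented factor values for ten stocks over two months
cap = np.arange(1.0, 11.0)
turnover = {1: [3, 9, 1, 7, 5, 10, 2, 8, 4, 6], 2: [6, 4, 8, 2, 10, 5, 7, 1, 9, 3]}
direction = {"market_cap": True, "circulating_market_cap": True, "turnover_ratio": True}
months = []
for t in (1, 2):
    fdf = pd.DataFrame({"market_cap": cap, "circulating_market_cap": cap * 0.9,
                        "turnover_ratio": turnover[t]})
    months.append(score_frame(fdf, direction))

avg = mean_score_corr(months)
kept = drop_redundant(avg, ["circulating_market_cap", "market_cap", "turnover_ratio"], 0.8)
print(kept)
```

In this toy data the two market-cap factors rank stocks identically, so only the first of them survives.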

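Building the model and selecting stocks

Using the chosen effective factors, compute each stock's factor scores at the start of every month and combine them into an average score with a given set of weights. If a factor has no value for a stock that month, take the weighted average over the remaining factors. Stocks are then ranked by this weighted average score, and the top-ranked ones are traded.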
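The weighted average with missing factor values can be sketched like this. It is an illustration under assumptions, not the notebook's scoring code: `composite_score` is a hypothetical helper, the data are invented, and percentile ranks are used so that factors with different numbers of valid stocks stay comparable.

```python
import pandas as pd


def composite_score(fdf: pd.DataFrame, ascending: dict, weights: dict = None) -> pd.Series:
    """Weighted mean of per-factor percentile ranks, skipping missing factor values.

    ascending[f] = True means a larger factor value earns a higher score.
    """
    facs = list(ascending)
    w = pd.Series(1.0, index=facs) if weights is None else pd.Series(weights).reindex(facs)
    ranks = pd.DataFrame({f: fdf[f].rank(ascending=ascending[f], pct=True) for f in facs})
    weighted_sum = ranks.mul(w, axis=1).sum(axis=1)  # NaN ranks are skipped
    # Only the weights of factors that actually have a value count in the denominator
    weight_present = ranks.notna().astype(float).mul(w, axis=1).sum(axis=1)
    return (weighted_sum / weight_present).sort_values(ascending=False)


# Invented data: stock C has no value for the "size" factor
fdf = pd.DataFrame(
    {"growth": [1.0, 2.0, 3.0, 4.0], "size": [10.0, 20.0, float("nan"), 40.0]},
    index=["A", "B", "C", "D"],
)
score = composite_score(fdf, {"growth": True, "size": False})
print(score)
```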

The code below sums the factor scores with equal weights and picks the highest-scoring stocks to trade.

In [17]:

def score_stock(fdate):
    # B/M, L/A, P/R, capitalization, circulating_cap, circulating_market_cap, market_cap, operating_revenue:
    # for these eight factors a smaller value means a higher return and a higher score, so they are ranked
    # in descending order; for gross_profit_margin, inc_net_profit_year_on_year and
    # inc_operation_profit_year_on_year a larger value means a higher return, so they are ranked in ascending order
    effective_factors = {'inc_net_profit_year_on_year':True,'gross_profit_margin':True,'inc_operation_profit_year_on_year':True,
                         'B/M':False,'L/A':False,'P/R':False, 'capitalization':False, 'circulating_cap':False,
                        'circulating_market_cap':False, 'market_cap':False, 'operating_revenue':False}
    fdf = get_factors(fdate)
    score = {}
    for fac,value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending = value,method = 'first')
    # Equal-weight sum of the factor ranks, highest total first
    total = DataFrame(score).T.sum().sort_values(ascending = False)
    print(total.head(5))
    score_stock = list(total.index)
    return score_stock,fdf['circulating_market_cap']
# Redefines get_factors for the 11 effective factors only; the query order matches `factors`
def get_factors(fdate):
    factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin', 
                     'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
    stock_set = get_index_stocks('000001.XSHG',fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        balance.total_liability/balance.total_assets,
        income.net_profit/income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.circulating_market_cap,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_operation_profit_year_on_year,
        valuation.market_cap,
        income.operating_revenue
        ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q,date = fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-11:]
[score_result,circulating_market_cap] = score_stock('2017-01-01')
code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64

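Monthly returns of the six portfolios and the benchmark over seven years

Compute the monthly returns of port1-port5, the Top20 portfolio and the benchmark over 7×12=84 months, and store all the data in a panel.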

In [18]:

year = ['2011','2012','2013','2014','2015','2016','2017']

month = ['01','02','03','04','05','06','07','08','09','10','11','12']
factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin', 
          'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
result = {}

for i in range(7*12):

    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    print('time %s' % startdate)
    # Score stocks on all 11 factors combined, then split them into portfolios
    df = DataFrame(np.zeros(7),index = ['Top20','port1','port2','port3','port4','port5','benchmark'])
    [score,circulating_market_cap] = score_stock(startdate)
    port0 = score[:20]
    port1 = score[: len(score)//5]
    port2 = score[ len(score)//5+1: 2*len(score)//5]
    port3 = score[ 2*len(score)//5+1: -2*len(score)//5]
    port4 = score[ -2*len(score)//5+1: -len(score)//5]
    port5 = score[ -len(score)//5+1: ]
    print(len(score))

    df.loc['Top20'] = calculate_port_monthly_return(port0,startdate,enddate,nextdate,circulating_market_cap)
    df.loc['port1'] = calculate_port_monthly_return(port1,startdate,enddate,nextdate,circulating_market_cap)
    df.loc['port2'] = calculate_port_monthly_return(port2,startdate,enddate,nextdate,circulating_market_cap)
    df.loc['port3'] = calculate_port_monthly_return(port3,startdate,enddate,nextdate,circulating_market_cap)
    df.loc['port4'] = calculate_port_monthly_return(port4,startdate,enddate,nextdate,circulating_market_cap)
    df.loc['port5'] = calculate_port_monthly_return(port5,startdate,enddate,nextdate,circulating_market_cap)
    df.loc['benchmark'] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
    result[i+1]=df
    
time 2011-01-01
code
600671.XSHG    8250
600506.XSHG    8065
600365.XSHG    8040
600634.XSHG    7864
600647.XSHG    7843
dtype: float64
867
time 2011-02-01
code
600671.XSHG    8275
600365.XSHG    8059
600506.XSHG    8055
600634.XSHG    7874
600647.XSHG    7855
dtype: float64
867
time 2011-03-01
code
600671.XSHG    8266
600506.XSHG    8034
600365.XSHG    7951
600634.XSHG    7852
600647.XSHG    7842
dtype: float64
866
time 2011-04-01
code
600671.XSHG    8285
600365.XSHG    7943
600634.XSHG    7902
600617.XSHG    7852
600077.XSHG    7834
dtype: float64
874
time 2011-05-01
code
600671.XSHG    8522
600340.XSHG    8239
600365.XSHG    8209
600562.XSHG    8103
600613.XSHG    8097
dtype: float64
885
time 2011-06-01
code
600671.XSHG    8506
600365.XSHG    8221
600149.XSHG    8120
600562.XSHG    8104
600613.XSHG    8104
dtype: float64
885
time 2011-07-01
code
600671.XSHG    8518
600365.XSHG    8240
600149.XSHG    8140
600613.XSHG    8111
600562.XSHG    8098
dtype: float64
885
time 2011-08-01
code
600671.XSHG    8534
600149.XSHG    8126
600613.XSHG    8116
600562.XSHG    8076
600520.XSHG    7937
dtype: float64
886
time 2011-09-01
code
600634.XSHG    8410
600562.XSHG    8198
600671.XSHG    8059
600476.XSHG    7986
600077.XSHG    7970
dtype: float64
901
time 2011-10-01
code
600634.XSHG    8416
600562.XSHG    8113
600671.XSHG    8071
600476.XSHG    8037
600077.XSHG    7963
dtype: float64
902
time 2011-11-01
code
600671.XSHG    8693
600705.XSHG    8048
600421.XSHG    8030
600476.XSHG    8030
600576.XSHG    8006
dtype: float64
913
time 2011-12-01
code
600671.XSHG    8707
600576.XSHG    8080
600705.XSHG    8064
600476.XSHG    8043
600571.XSHG    7970
dtype: float64
913
time 2012-01-01
code
600671.XSHG    8688
600576.XSHG    8088
600705.XSHG    8074
600476.XSHG    8044
600421.XSHG    7984
dtype: float64
913
time 2012-02-01
code
600671.XSHG    8695
600136.XSHG    8190
600576.XSHG    8103
600705.XSHG    8086
600476.XSHG    8068
dtype: float64
913
time 2012-03-01
code
600671.XSHG    8702
600136.XSHG    8178
600576.XSHG    8088
600476.XSHG    8047
600571.XSHG    7994
dtype: float64
912
time 2012-04-01
code
600671.XSHG    8748
600365.XSHG    8250
600576.XSHG    8223
600136.XSHG    8201
600733.XSHG    8149
dtype: float64
914
time 2012-05-01
code
600671.XSHG    8792
600593.XSHG    8544
600562.XSHG    8469
600513.XSHG    8430
600576.XSHG    8395
dtype: float64
920
time 2012-06-01
code
600634.XSHG    8708
600593.XSHG    8620
600513.XSHG    8496
600562.XSHG    8481
600455.XSHG    8228
dtype: float64
922
time 2012-07-01
code
600634.XSHG    8705
600593.XSHG    8637
600562.XSHG    8493
600513.XSHG    8400
600571.XSHG    8239
dtype: float64
922
time 2012-08-01
code
600634.XSHG    8707
600593.XSHG    8636
600562.XSHG    8496
600513.XSHG    8409
600571.XSHG    8249
dtype: float64
922
time 2012-09-01
code
600136.XSHG    9255
600485.XSHG    8874
600733.XSHG    8834
600749.XSHG    8725
600520.XSHG    8476
dtype: float64
933
time 2012-10-01
code
600136.XSHG    9251
600485.XSHG    8875
600733.XSHG    8824
600749.XSHG    8732
600758.XSHG    8475
dtype: float64
933
time 2012-11-01
code
600634.XSHG    9496
600733.XSHG    8811
600365.XSHG    8663
600758.XSHG    8474
600647.XSHG    8473
dtype: float64
940
time 2012-12-01
code
600634.XSHG    9494
600733.XSHG    8859
600365.XSHG    8682
600647.XSHG    8520
600758.XSHG    8480
dtype: float64
940
time 2013-01-01
code
600634.XSHG    9494
600733.XSHG    8849
600365.XSHG    8678
600647.XSHG    8525
600758.XSHG    8484
dtype: float64
940
time 2013-02-01
code
600634.XSHG    9480
600733.XSHG    8821
600647.XSHG    8538
600758.XSHG    8493
600980.XSHG    8458
dtype: float64
940
time 2013-03-01
code
600634.XSHG    9482
600733.XSHG    8832
600647.XSHG    8548
600758.XSHG    8504
600599.XSHG    8498
dtype: float64
942
time 2013-04-01
code
600634.XSHG    9396
600613.XSHG    8620
600985.XSHG    8602
600599.XSHG    8492
600647.XSHG    8442
dtype: float64
942
time 2013-05-01
code
600634.XSHG    9449
600136.XSHG    8910
600980.XSHG    8731
600985.XSHG    8607
600599.XSHG    8545
dtype: float64
942
time 2013-06-01
code
600485.XSHG    9022
600136.XSHG    8892
600980.XSHG    8726
600576.XSHG    8345
600706.XSHG    8332
dtype: float64
941
time 2013-07-01
code
600485.XSHG    9032
600136.XSHG    8902
600980.XSHG    8712
600706.XSHG    8331
600576.XSHG    8318
dtype: float64
941
time 2013-08-01
code
600485.XSHG    9037
600980.XSHG    8705
600576.XSHG    8343
600706.XSHG    8313
600379.XSHG    8302
dtype: float64
941
time 2013-09-01
code
600365.XSHG    8997
600485.XSHG    8938
600980.XSHG    8832
600615.XSHG    8649
600593.XSHG    8545
dtype: float64
941
time 2013-10-01
code
600365.XSHG    8983
600485.XSHG    8922
600980.XSHG    8826
600615.XSHG    8655
600234.XSHG    8566
dtype: float64
941
time 2013-11-01
code
600733.XSHG    8684
600485.XSHG    8457
600758.XSHG    8422
600099.XSHG    8401
600520.XSHG    8390
dtype: float64
941
time 2013-12-01
code
600733.XSHG    8723
600758.XSHG    8423
600520.XSHG    8402
600099.XSHG    8397
600146.XSHG    8356
dtype: float64
941
time 2014-01-01
code
600733.XSHG    8666
600485.XSHG    8421
600758.XSHG    8417
600520.XSHG    8400
600099.XSHG    8391
dtype: float64
941
time 2014-02-01
code
600733.XSHG    8702
600758.XSHG    8421
600146.XSHG    8411
600520.XSHG    8403
600099.XSHG    8393
dtype: float64
941
time 2014-03-01
code
600733.XSHG    8683
600485.XSHG    8460
600758.XSHG    8424
600520.XSHG    8422
600146.XSHG    8392
dtype: float64
941
time 2014-04-01
code
600146.XSHG    8422
600781.XSHG    8411
600506.XSHG    8409
600576.XSHG    8357
600485.XSHG    8354
dtype: float64
944
time 2014-05-01
code
600539.XSHG    9141
600980.XSHG    9020
600753.XSHG    8852
600593.XSHG    8846
600355.XSHG    8760
dtype: float64
948
time 2014-06-01
code
600539.XSHG    9140
600980.XSHG    9039
600753.XSHG    8873
600593.XSHG    8854
600355.XSHG    8765
dtype: float64
948
time 2014-07-01
code
600539.XSHG    9115
600980.XSHG    9006
600753.XSHG    8899
600593.XSHG    8853
600355.XSHG    8729
dtype: float64
947
time 2014-08-01
code
600539.XSHG    9151
600980.XSHG    8984
600593.XSHG    8846
600576.XSHG    8844
600753.XSHG    8838
dtype: float64
947
time 2014-09-01
code
600365.XSHG    8977
600099.XSHG    8765
600355.XSHG    8750
600847.XSHG    8742
600539.XSHG    8677
dtype: float64
951
time 2014-10-01
code
600365.XSHG    8988
600355.XSHG    8806
600099.XSHG    8776
600847.XSHG    8773
600476.XSHG    8696
dtype: float64
951
time 2014-11-01
code
600599.XSHG    9072
600696.XSHG    8995
600419.XSHG    8905
600136.XSHG    8883
600539.XSHG    8838
dtype: float64
968
time 2014-12-01
code
600696.XSHG    9009
600599.XSHG    8950
600419.XSHG    8910
600136.XSHG    8875
600539.XSHG    8836
dtype: float64
969
time 2015-01-01
code
600696.XSHG    9094
600599.XSHG    9039
600136.XSHG    8901
600419.XSHG    8895
600539.XSHG    8755
dtype: float64
969
time 2015-02-01
code
600696.XSHG    9076
600599.XSHG    8999
600419.XSHG    8902
600136.XSHG    8895
600539.XSHG    8756
dtype: float64
969
time 2015-03-01
code
600696.XSHG    9078
600599.XSHG    9007
600419.XSHG    8906
600539.XSHG    8785
600892.XSHG    8737
dtype: float64
969
time 2015-04-01
code
600696.XSHG    9142
600099.XSHG    8952
603601.XSHG    8946
600539.XSHG    8857
600599.XSHG    8817
dtype: float64
982
time 2015-05-01
code
603869.XSHG    9587
603088.XSHG    9461
600455.XSHG    9348
603898.XSHG    9339
603988.XSHG    9335
dtype: float64
1020
time 2015-06-01
code
603869.XSHG    9577
603088.XSHG    9544
603988.XSHG    9415
600455.XSHG    9412
600365.XSHG    9389
dtype: float64
1030
time 2015-07-01
code
603869.XSHG    9757
603088.XSHG    9632
603988.XSHG    9517
600455.XSHG    9494
603636.XSHG    9465
dtype: float64
1039
time 2015-08-01
code
603869.XSHG    9701
603988.XSHG    9515
600365.XSHG    9356
603010.XSHG    9319
600136.XSHG    9305
dtype: float64
1041
time 2015-09-01
code
600506.XSHG    9835
603099.XSHG    9546
600520.XSHG    9501
600593.XSHG    9441
600136.XSHG    9397
dtype: float64
1060
time 2015-10-01
code
600506.XSHG    9834
603099.XSHG    9563
600520.XSHG    9541
600593.XSHG    9476
600365.XSHG    9389
dtype: float64
1060
time 2015-11-01
code
603918.XSHG    9637
600980.XSHG    9520
600599.XSHG    9420
603601.XSHG    9391
600371.XSHG    9374
dtype: float64
1060
time 2015-12-01
code
600980.XSHG    9522
600753.XSHG    9475
603918.XSHG    9472
603010.XSHG    9364
600599.XSHG    9322
dtype: float64
1060
time 2016-01-01
code
603918.XSHG    9641
600980.XSHG    9549
600753.XSHG    9509
600599.XSHG    9438
603601.XSHG    9389
dtype: float64
1066
time 2016-02-01
code
603918.XSHG    9725
603778.XSHG    9652
600599.XSHG    9615
600980.XSHG    9538
603085.XSHG    9419
dtype: float64
1071
time 2016-03-01
code
603918.XSHG    9743
603778.XSHG    9706
600599.XSHG    9683
600980.XSHG    9576
600419.XSHG    9429
dtype: float64
1073
time 2016-04-01
code
600599.XSHG    9913
600419.XSHG    9801
603778.XSHG    9739
600080.XSHG    9710
603918.XSHG    9669
dtype: float64
1078
time 2016-05-01
code
603601.XSHG    9916
603918.XSHG    9907
600137.XSHG    9836
600733.XSHG    9693
603023.XSHG    9673
dtype: float64
1080
time 2016-06-01
code
600137.XSHG    9964
600733.XSHG    9869
603601.XSHG    9766
600506.XSHG    9756
603023.XSHG    9724
dtype: float64
1088
time 2016-07-01
code
600137.XSHG    10035
600733.XSHG     9957
600506.XSHG     9864
603601.XSHG     9716
603066.XSHG     9699
dtype: float64
1096
time 2016-08-01
code
600137.XSHG    10049
603322.XSHG     9969
603601.XSHG     9892
600506.XSHG     9862
600733.XSHG     9801
dtype: float64
1100
time 2016-09-01
code
600455.XSHG    10155
600980.XSHG     9933
603088.XSHG     9885
603027.XSHG     9881
603838.XSHG     9849
dtype: float64
1114
time 2016-10-01
code
600455.XSHG    10177
600980.XSHG    10053
603027.XSHG     9976
603088.XSHG     9970
603779.XSHG     9969
dtype: float64
1123
time 2016-11-01
code
603859.XSHG    10604
600817.XSHG    10441
603779.XSHG    10403
603189.XSHG    10400
600385.XSHG    10387
dtype: float64
1130
time 2016-12-01
code
603859.XSHG    10599
600817.XSHG    10443
603189.XSHG    10410
603779.XSHG    10400
600385.XSHG    10391
dtype: float64
1130
time 2017-01-01
code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64
1130
time 2017-02-01
code
603189.XSHG    10618
603859.XSHG    10489
600817.XSHG    10474
600385.XSHG    10409
603779.XSHG    10399
dtype: float64
1131
time 2017-03-01
code
603189.XSHG    10638
600817.XSHG    10488
603859.XSHG    10467
603779.XSHG    10438
600385.XSHG    10420
dtype: float64
1131
time 2017-04-01
code
603189.XSHG    10792
603779.XSHG    10609
600385.XSHG    10587
603022.XSHG    10441
603088.XSHG    10438
dtype: float64
1152
time 2017-05-01
code
603088.XSHG    11346
603903.XSHG    11275
603960.XSHG    11187
603040.XSHG    11168
603319.XSHG    11143
dtype: float64
1240
time 2017-06-01
code
603088.XSHG    11410
603040.XSHG    11337
603903.XSHG    11331
603960.XSHG    11255
603966.XSHG    11254
dtype: float64
1245
time 2017-07-01
code
603088.XSHG    11429
603903.XSHG    11410
603040.XSHG    11369
603966.XSHG    11275
603960.XSHG    11264
dtype: float64
1246
time 2017-08-01
code
603903.XSHG    11545
603088.XSHG    11454
603040.XSHG    11379
603960.XSHG    11310
603966.XSHG    11286
dtype: float64
1248
time 2017-09-01
code
603040.XSHG    11983
600455.XSHG    11890
603326.XSHG    11672
603429.XSHG    11576
603229.XSHG    11490
dtype: float64
1309
time 2017-10-01
code
603040.XSHG    12019
600455.XSHG    11897
603326.XSHG    11673
600506.XSHG    11525
603229.XSHG    11497
dtype: float64
1309
time 2017-11-01
code
603960.XSHG    12511
603232.XSHG    12503
603859.XSHG    12377
603383.XSHG    12297
603500.XSHG    12238
dtype: float64
1352
time 2017-12-01
code
603232.XSHG    12533
603960.XSHG    12437
603859.XSHG    12353
603500.XSHG    12288
603040.XSHG    12275
dtype: float64
1352

In [19]:

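# Note: pd.Panel was deprecated in pandas 0.20 and removed in 0.25,
# so this cell only runs on an older pandas version.
df = pd.Panel(result)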
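On newer pandas versions, where `pd.Panel` no longer exists, a similar container can be built with `pd.concat`. The following is only a minimal sketch. It assumes `result` is a dict mapping each month to a DataFrame whose rows are portfolios and whose first column is the return; that layout is inferred from how `df` is indexed later, and the sample values and the names `panel_like` and `top20` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for `result`: {month: DataFrame(index=portfolio, columns=metrics)}.
# The real `result` is built earlier in the notebook; these numbers are made up.
result = {
    '2017-11-01': pd.DataFrame({'return': [0.03, 0.01]}, index=['Top20', 'benchmark']),
    '2017-12-01': pd.DataFrame({'return': [-0.01, 0.02]}, index=['Top20', 'benchmark']),
}

# Stack the monthly frames into one DataFrame with a (date, portfolio) MultiIndex.
panel_like = pd.concat(result, names=['date', 'portfolio'])

# Rough equivalent of df.ix[:, 'Top20', 0]: the first column for one portfolio, across all months.
top20 = panel_like.xs('Top20', level='portfolio').iloc[:, 0]
print(top20)
```

The MultiIndex form keeps every month in a single table, so selecting one portfolio across all months is a single `xs` call.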

Plot the monthly excess returns of the six portfolios

In [20]:

import matplotlib

# Render minus signs correctly when a non-default font is in use
matplotlib.rcParams['axes.unicode_minus'] = False
index = ['Top20', 'port1', 'port2', 'port3', 'port4', 'port5']

def draw_backtest_picture(ind):
    # Monthly excess return of portfolio `ind` over the benchmark
    plt.figure(figsize=(10, 4))
    plt.plot(df.ix[:, ind, 0] - df.ix[:, 'benchmark', 0], label='excess return: %s' % ind)
    plt.xlabel('backtest excess return of portfolio %s' % ind)
    plt.legend(loc=0)
    plt.grid()

for ind in index:
    draw_backtest_picture(ind)
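The excess-return calculation behind these plots can also be written without `Panel` or `.ix`. This is a sketch under assumptions rather than the original author's method: it assumes the monthly returns are already in a wide DataFrame with one column per portfolio plus a `benchmark` column. The sample numbers and the helper names `excess_returns` and `plot_excess` are hypothetical.

```python
import pandas as pd

# Hypothetical monthly returns: one row per month, one column per portfolio.
returns = pd.DataFrame(
    {
        'Top20': [0.04, -0.01, 0.03],
        'port1': [0.02, 0.00, 0.01],
        'benchmark': [0.01, -0.02, 0.02],
    },
    index=pd.to_datetime(['2017-10-01', '2017-11-01', '2017-12-01']),
)

def excess_returns(returns, benchmark='benchmark'):
    """Subtract the benchmark column from every other column, row by row."""
    return returns.drop(columns=benchmark).sub(returns[benchmark], axis=0)

def plot_excess(excess):
    """Draw one figure per portfolio, mirroring draw_backtest_picture."""
    # Imported lazily so the calculation above can be used without a plotting backend.
    import matplotlib.pyplot as plt
    for name in excess.columns:
        fig, ax = plt.subplots(figsize=(10, 4))
        ax.plot(excess.index, excess[name], label='excess return: %s' % name)
        ax.set_xlabel('backtest excess return of portfolio %s' % name)
        ax.legend(loc=0)
        ax.grid(True)

excess = excess_returns(returns)
print(excess)
```

Calling `plot_excess(excess)` in a notebook would then draw one figure per portfolio, as the loop above does.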