1. 程式人生 > >量化策略多因子選股之抓取不同時段的歷史價格,及資料庫操作

量化策略多因子選股之抓取不同時段的歷史價格,及資料庫操作

#獲取20170831,20180323的滬深三百成分及中證500的成分股當日收盤價

# coding=utf-8
import pandas as pd
import tushare as ts
import sys
from sqlalchemy import create_engine
import numpy as np

reload(sys)
sys.setdefaultencoding("utf-8")
hs300=ts.get_hs300s()

code=[]
price=[]

hs300_code= hs300['code']
for stock in hs300_code:
    try:
        df=ts.get_k_data(stock,ktype
='M',start='2017-08-01', end='2018-03-24') #print df df1= df.iloc[0,6] df2= df.iloc[0,2] #print df1 #print df2 code.append(df1) price.append(df2) except: pass print code print len(code) print price print len(price) price_201709=pd.DataFrame({'price'
:price},index=code ) print price_201709 engine = create_engine('mysql://root:********@127.0.0.1/stock_report?charset=utf8') price_201709.to_sql('price_201709',engine,if_exists='append')

然後在MYSQL進行表連線,將需要的財務資料(2017年第二季度)、歷史股價連線

計算市值 (總市值和流通市值)


計算PE


CREATE TABLE final

SELECT p.index, b.name, p.price AS 20170901price, b.`outstanding`, b.`totals`

, esp, p.price * outstanding AS marketvalue, totals * p.price AS totalvalue
, p.price / esp AS pe, industry, g1.roe
, g1.`net_profits`, g1.`profits_yoy`, g.`net_profit_ratio`, g.`gross_profit_rate`, g.business_income
, `mbrg`, `nprg`, p2.price AS 20180323price
FROM `price_201709` p, `get_report_basic_nochange` b, `get_report2_201706` g, `get_report_201706` g1, `get_report3_201706` g3, `price_20180328` p2
WHERE p.index = b.code
AND g.code = p.index
AND g1.code = p.index
AND g3.code = p.index
AND p2.index = p.index

瀏覽結果:


接下來就是對部分財務資料進行取對數,檢查資料的缺失值。。。