Scraping historical stock OHLC data from Yahoo

Yahoo · Published 2018-11-29 17:23:55

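OHLC stands for open, high, low, and close, a data convention used by overseas financial sites. Compared with domestic Chinese sites, Yahoo is a more reliable source of data for stocks and exchange-traded funds, and its JSON data structure makes extraction easier and more accurate.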

Yahoo API

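pandas used to provide direct interfaces to Yahoo, Google, and other data sources through pandas.io.data, which is covered in detail in 《Python金融大資料分析》 (Python for Finance). That interface has since been removed. Some third-party APIs can be found online, but they have gone unmaintained for a long time and no longer work.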

If the data volume is small and you do not need to fetch it often, consider extracting the data directly from the Yahoo web page.

API endpoint

https://finance.yahoo.com/quote/510300.SS/history?period1=1511924498&period2=1543460498&interval=1d&filter=history&frequency=1d

510300.SS: ticker code

1511924498: start date as a Unix timestamp (seconds)

1543460498: end date as a Unix timestamp (seconds)

1d: frequency, daily

Preprocessing

Convert the dates to timestamps:

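import datetime as dt
# start_date and end_date start out as "YYYY-MM-DD" strings, read as local time
start_date = int(dt.datetime.strptime(start_date, "%Y-%m-%d").timestamp())
end_date = int(dt.datetime.strptime(end_date, "%Y-%m-%d").timestamp())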
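
One thing to watch in the conversion above: `timestamp()` on a naive datetime uses the machine's local time zone, so the same date string can give different values on different machines. Below is a minimal sketch that pins the conversion to UTC. The helper name `to_unix_timestamp` and the choice of UTC are assumptions made for illustration; they are not part of the original code.

```python
import datetime as dt


def to_unix_timestamp(date_str: str) -> int:
    """Convert a "YYYY-MM-DD" string to Unix seconds at 00:00 UTC (hypothetical helper)."""
    parsed = dt.datetime.strptime(date_str, "%Y-%m-%d")
    # Attach UTC explicitly so the result does not depend on the local time zone.
    return int(parsed.replace(tzinfo=dt.timezone.utc).timestamp())


start_ts = to_unix_timestamp("2018-1-1")
end_ts = to_unix_timestamp("2018-11-27")
```

Whether midnight UTC or midnight local time is the better boundary depends on the exchange you are querying, so treat the time zone as a parameter to decide rather than a fixed rule.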

Ticker code conversion

tick_suffix_dict = {'SH':'SS',
'SZ':'SZ',
'HK':'HK'}
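
The mapping above only translates the exchange suffix. A short sketch of how it might be applied to a full code such as 510300.SH follows; the function name `to_yahoo_ticker` and its error handling are hypothetical, and only the dictionary comes from the post.

```python
# Same mapping as in the post: local exchange suffix -> Yahoo suffix.
tick_suffix_dict = {"SH": "SS", "SZ": "SZ", "HK": "HK"}


def to_yahoo_ticker(code: str) -> str:
    """Turn e.g. '510300.SH' into '510300.SS' (hypothetical helper)."""
    symbol, _, suffix = code.rpartition(".")
    key = suffix.upper()
    # rpartition leaves symbol empty when the code contains no dot at all.
    if not symbol or key not in tick_suffix_dict:
        raise ValueError(f"unrecognised ticker code: {code!r}")
    return f"{symbol}.{tick_suffix_dict[key]}"


yahoo_code = to_yahoo_ticker("510300.SH")
```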

Handling the frequency

freq_dict = {"D":"1d", # daily
"w":"1wk", # weekly
"m":"1mo"} # monthly

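The frequency keys in the post mix upper and lower case ("D" versus "w" and "m"), which makes a lookup easy to get wrong. A small sketch of a case-insensitive lookup is shown here; `to_yahoo_interval` is a hypothetical name, and normalising the keys to upper case is an assumption about what the caller wants.

```python
# Frequency codes from the post, normalised to upper-case keys.
freq_dict = {"D": "1d", "W": "1wk", "M": "1mo"}


def to_yahoo_interval(freq: str) -> str:
    """Map 'D', 'w', 'm', ... to Yahoo's interval string (hypothetical helper)."""
    key = freq.strip().upper()
    if key not in freq_dict:
        raise ValueError(f"unsupported frequency: {freq!r}")
    return freq_dict[key]


frequency = to_yahoo_interval("m")
```
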
url = "https://finance.yahoo.com/quote/{}/history?period1={}&period2={}&interval={}&filter=history&frequency={}".\
format(tickCode,start_date,end_date,frequency,frequency)
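
As an alternative to formatting the query string by hand, the standard library can assemble it. The sketch below rebuilds the same URL with `urllib.parse.urlencode`; the function name `build_history_url` and the constant `BASE_URL` are assumptions for illustration, while the parameter names are the ones shown in the URL above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://finance.yahoo.com/quote/{ticker}/history"


def build_history_url(ticker: str, period1: int, period2: int, interval: str) -> str:
    """Assemble the history-page URL from its parts (hypothetical helper)."""
    query = urlencode({
        "period1": period1,      # start date, Unix seconds
        "period2": period2,      # end date, Unix seconds
        "interval": interval,    # "1d", "1wk" or "1mo"
        "filter": "history",
        "frequency": interval,   # the page repeats the interval here
    })
    # quote() escapes characters that are not valid in a URL path segment.
    return BASE_URL.format(ticker=quote(ticker)) + "?" + query


url = build_history_url("510300.SS", 1511924498, 1543460498, "1d")
```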

Fetching the JSON data

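import json
import requests
from bs4 import BeautifulSoup

# Fetch the history page and parse its HTML
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

# The JSON sits on the sixth line of the third-from-last <script> tag;
# [16:-1] strips the leading 16 characters and the trailing character around it
script_json = json.loads(soup.find_all('script')[-3].text.split("\n")[5][16:-1])
prices_json = script_json['context']['dispatcher']['stores']['HistoricalPriceStore']['prices']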
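
The fixed positions used above (third-from-last script, sixth line, 16 characters) break as soon as the page layout shifts. A less position-dependent sketch is to search for the assignment that carries the JSON and let the JSON decoder find where the object ends. The marker name `root.App.main` is an assumption: it is not stated in the post, although it is 16 characters long with its ` = ` and so matches the slice. The sample HTML is made up so the example runs without network access, and Yahoo's current page may no longer embed the data this way.

```python
import json

MARKER = "root.App.main"  # assumed name of the embedded assignment


def extract_embedded_json(html: str, marker: str = MARKER) -> dict:
    """Return the JSON object assigned to `marker` inside the page source."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found; the page layout may have changed")
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError("no JSON object found after the marker")
    # raw_decode parses one JSON value starting at `brace` and ignores what follows.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Made-up page fragment that imitates the assumed structure.
sample_html = (
    "<html><body><script>\n"
    "(function (root) {\n"
    'root.App.main = {"context": {"dispatcher": {"stores": {"HistoricalPriceStore": '
    '{"prices": [{"date": 1543282200, "open": 3.2, "close": 3.25}]}}}}};\n'
    "}(this));\n"
    "</script></body></html>"
)

data = extract_embedded_json(sample_html)
prices_json = data["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]["prices"]
```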

Output as a DataFrame

import pandas as pd

prices = pd.DataFrame(prices_json)
prices['date'] = prices['date'].apply(lambda x: dt.date.fromtimestamp(x))
prices.set_index('date', inplace=True)
prices.sort_index(inplace=True)
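
To show what this step produces, here is a self-contained sketch that runs on two made-up price records. The field names (open, high, low, close, volume) and all values are illustrative assumptions, and `prices_to_frame` is a hypothetical name. Unlike `dt.date.fromtimestamp`, which depends on the local time zone, this version interprets the timestamps as UTC.

```python
import pandas as pd

# Made-up records in the assumed shape of the 'prices' list, newest first.
sample_prices = [
    {"date": 1543282200, "open": 3.20, "high": 3.26, "low": 3.19, "close": 3.25, "volume": 1000},
    {"date": 1543195800, "open": 3.18, "high": 3.22, "low": 3.17, "close": 3.20, "volume": 1200},
]


def prices_to_frame(prices_json: list) -> pd.DataFrame:
    """Build a date-indexed, ascending DataFrame from the price records."""
    frame = pd.DataFrame(prices_json)
    # Read the epoch seconds as UTC, then keep only the calendar date.
    frame["date"] = pd.to_datetime(frame["date"], unit="s", utc=True).dt.date
    return frame.set_index("date").sort_index()


prices = prices_to_frame(sample_prices)
```

Returning a new frame instead of modifying one in place keeps the helper easy to reuse on several tickers.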

Example result

ohlc_hist('510300.SH','2018-1-1','2018-11-27','m')
