Scheduled crawl, on every trading day, of quote data for all Shanghai/Shenzhen Stock Exchange stocks, stored in a database
阿新 • Published: 2018-07-06
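I. The project consists of the following four steps:
- Configure the database
- Write the crawler script
- Configure a scheduled job in Jenkins
- Check the collected results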
II. Detailed steps
1. Configure the database
The CREATE TABLE statement, using a subset of the fields as an example:
CREATE TABLE `stockmarket` (
  `date` varchar(12) NOT NULL DEFAULT '' COMMENT 'trading date',
  `stockCode` varchar(100) NOT NULL DEFAULT '' COMMENT 'stock code',
  `stockName` varchar(100) DEFAULT NULL COMMENT 'stock name',
  `close` decimal(19,2) DEFAULT NULL COMMENT 'closing price',
  `high` decimal(19,2) DEFAULT NULL COMMENT 'high',
  `low` decimal(19,2) DEFAULT NULL COMMENT 'low',
  `amplitudeRatio` decimal(19,2) DEFAULT NULL COMMENT 'amplitude',
  `turnoverRatio` decimal(19,2) DEFAULT NULL COMMENT 'turnover rate',
  `preClose` decimal(19,2) DEFAULT NULL COMMENT 'previous close',
  `open` decimal(19,2) DEFAULT NULL COMMENT 'opening price',
  PRIMARY KEY (`date`,`stockCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
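Because the primary key is (date, stockCode), running the job a second time on the same day makes every insert fail with a duplicate-key error, which the script simply skips, so data captured earlier that day (possibly during trading hours) is never refreshed. One option, not used in the original script, is MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The sketch below only builds that statement; build_upsert_sql, COLUMNS and KEY_COLUMNS are hypothetical names. It uses the VALUES(col) form, which MySQL 8.0.20 and later still accept but mark as deprecated.

```python
# Hypothetical helper (not part of the original project): builds an "upsert"
# for the stockmarket table so a re-run overwrites rows instead of failing.

COLUMNS = ["date", "stockCode", "stockName", "close", "high", "low",
           "amplitudeRatio", "turnoverRatio", "preClose", "open"]
KEY_COLUMNS = {"date", "stockCode"}


def build_upsert_sql(table, columns, key_columns):
    """Return an INSERT ... ON DUPLICATE KEY UPDATE statement with %s placeholders."""
    # Backquote identifiers: date, close and open are also MySQL keywords.
    cols = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # On a primary-key collision, overwrite every non-key column with the new value.
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)"
                        for c in columns if c not in key_columns)
    return (f"INSERT INTO `{table}` ({cols}) VALUES ({placeholders}) "
            f"ON DUPLICATE KEY UPDATE {updates}")


if __name__ == "__main__":
    print(build_upsert_sql("stockmarket", COLUMNS, KEY_COLUMNS))
```

With PyMySQL the resulting string would be passed to cur.execute(sql, row), where row is a 10-item sequence in COLUMNS order.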
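Put the database settings in a JSON file (the script below reads databases_conf.json); the script uses this configuration to connect to the database:
{
  "stockMarket": {
    "host": "localhost",
    "port": 3326,
    "user": "root",
    "password": "password",
    "database": "stockMarket",
    "charset": "utf8"
  }
}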
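The connect_dataBase module that reads this file is not shown in the post. A hypothetical equivalent, assuming PyMySQL (which the script imports) and the file layout above, could look like this; parse_db_conf, load_db_conf and connect are illustrative names, not the author's API.

```python
import json

# Keys expected in each connection profile of the config file shown above.
REQUIRED_KEYS = ("host", "port", "user", "password", "database", "charset")


def parse_db_conf(text, name="stockMarket"):
    """Return one named connection profile from the JSON config text."""
    profile = json.loads(text)[name]
    # Report all missing keys at once with a readable message.
    missing = [k for k in REQUIRED_KEYS if k not in profile]
    if missing:
        raise KeyError(f"profile {name!r} is missing: {', '.join(missing)}")
    return profile


def load_db_conf(path, name="stockMarket"):
    """Read the config file (e.g. databases_conf.json) and return the named profile."""
    with open(path, encoding="utf-8") as f:
        return parse_db_conf(f.read(), name)


def connect(profile):
    """Open a MySQL connection with PyMySQL and return (connection, cursor)."""
    import pymysql  # imported here so the config helpers work without the driver installed
    conn = pymysql.connect(host=profile["host"], port=int(profile["port"]),
                           user=profile["user"], password=profile["password"],
                           database=profile["database"], charset=profile["charset"])
    return conn, conn.cursor()
```

Usage would be conn, cur = connect(load_db_conf("databases_conf.json")).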
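2. Write the script
Python libraries used (plus the author's own helper modules getSoup and connect_dataBase, which are not shown here):
import re, json, time, pymysql, requests
The code: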
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author : Torre Yang  Edit with Python3.6
# @Time   : 2018/6/28 10:50
# Scheduled crawl of the daily quote data of every stock.
import re
import json
import time

import pymysql
import requests

import getSoup           # author's helper: fetches a page and returns a BeautifulSoup object
import connect_dataBase  # author's helper: reads the JSON config and connects to MySQL

# DB connection (assumes connect_db returns a PyMySQL connection and cursor)
connectDB = connect_dataBase.ConnectDatabase()
get_conf = connectDB.get_conf('databases_conf.json')
db = get_conf["stockMarket"]
conn, cur = connectDB.connect_db(db["host"], db["user"], db["password"],
                                 db["database"], db["port"])

# Step 1: get the codes of all Shanghai/Shenzhen stocks from East Money and store them in a list
url = 'http://quote.eastmoney.com/stocklist.html#'
soup = getSoup.getSoup(url)
uls = soup.select('div#quotesearch li')
# Extract every stock code (e.g. sh600000) with a regular expression
re1 = re.compile(r'href="http://quote.eastmoney.com/(.+?).html"')
stockCodes = re1.findall(str(uls))

# Step 2: put each stock code into the quote-query URL and store the results
# Parameterized query: the driver quotes the values, so a name containing quotes cannot break the SQL
sql = ('insert into stockmarket(date,stockCode,stockName,close,high,low,'
       'amplitudeRatio,turnoverRatio,preClose,open) '
       'values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)')
fields = ['stockCode', 'stockName', 'close', 'high', 'low',
          'amplitudeRatio', 'turnoverRatio', 'preClose', 'open']
# Today's date (the trading day)
date = time.strftime("%Y-%m-%d", time.localtime())
for stockCode in stockCodes:
    url = ('https://gupiao.baidu.com/api/rails/stockbasicbatch?from=pc&os_ver=1'
           '&cuid=xxx&vv=100&format=json&stock_code=' + stockCode)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    try:
        JsonDatas = json.loads(response.content)
    except ValueError:
        print('Empty or unparsable response, skipping', stockCode)
        continue
    for data in JsonDatas.get('data') or []:
        values = [date] + [data.get(f) for f in fields]
        # Skip records with missing values
        if None in values:
            print('Skipping this record (missing values)')
            continue
        try:
            cur.execute(sql, values)
            conn.commit()
        except pymysql.MySQLError:
            conn.rollback()
            print('Database error, skipping this record')
print('Data collection finished')
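The two per-item steps of the script, extracting stock codes from the East Money list page and turning an API record into insert parameters, can be isolated into small functions that are easy to test offline. This is a sketch: the regex is a slightly tightened version of the script's (dots escaped, codes limited to word characters), duplicate codes are dropped, and extract_stock_codes and record_to_row are hypothetical names. The record fields are the ones the script reads from the quote API's data list.

```python
import re

# Tightened version of the script's pattern: dots escaped, code limited to word characters.
CODE_RE = re.compile(r'href="http://quote\.eastmoney\.com/(\w+)\.html"')

# Field names the script reads from each record in the quote API's "data" list.
FIELDS = ["stockCode", "stockName", "close", "high", "low",
          "amplitudeRatio", "turnoverRatio", "preClose", "open"]


def extract_stock_codes(html):
    """Return the stock codes linked from the list page, in page order, without duplicates."""
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(CODE_RE.findall(html)))


def record_to_row(record, date):
    """Build the insert parameters (date first), or None if any field is missing or null."""
    values = [record.get(f) for f in FIELDS]
    if any(v is None for v in values):
        return None  # same rule as the script: skip incomplete records
    return (date, *values)
```

For example, extract_stock_codes(str(uls)) could replace the findall call in step 1, and any non-None row fits a parameterized insert with ten %s placeholders.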
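3. Configure Jenkins
Set up the remote SSH host and configure the scheduled job. (Tip: collect the data in the evening, or at least after the market closes, because quote data keeps changing during trading hours.)
Jenkins > Configure System > SSH remote hosts (my setup is a CentOS 7 virtual machine with JDK, Python 3, MySQL, Tomcat and other common services already installed).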
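Jenkins' Build periodically trigger takes a cron-style schedule, for example H 18 * * 1-5 to run once between 18:00 and 18:59, Monday to Friday. A weekday schedule still fires on exchange holidays, so the script can also check the date itself. The sketch below only knows about weekends; the holidays set is a hypothetical input that you would have to fill from the exchanges' published calendars, and is_trading_day is an illustrative name.

```python
import datetime


def is_trading_day(day, holidays=frozenset()):
    """True on Monday to Friday unless the date is in the caller-supplied holiday set."""
    # weekday(): Monday == 0 ... Sunday == 6
    return day.weekday() < 5 and day not in holidays


# Hypothetical usage at the top of the crawl script (HOLIDAYS is an assumed set of dates):
#     if not is_trading_day(datetime.date.today(), HOLIDAYS):
#         print("Not a trading day, nothing to collect")
#         raise SystemExit(0)
```

Exiting with status 0 keeps the Jenkins build green on non-trading days instead of reporting a failure.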
4. Check the results
Source code: https://github.com/Testworm/stockMarket.git