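Python Web Scraper: Scraping Stock Information

I recently opened a brokerage account, so I wrote a scraper that collects information on stocks whose codes start with 300 or 600, as a starting point for screening stocks.

The script only collects the data; it does not sort or analyze it.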

Code link

Libraries used

import requests
from bs4 import BeautifulSoup
import traceback
import re

Fetching the page source

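def getHTMLText(url, code="utf-8"):
    # Fetch a page and return its text decoded with the given encoding;
    # return an empty string if anything goes wrong.
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except Exception:
        return ""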

Picking the stocks whose codes start with 300 or 600 out of the full list

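def getStockList(lst, stockURL):
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            # Matches "sh" or "sz" followed by a six-digit code beginning with 3 or 6
            lst.append(re.findall(r"[s][hz][36]\d{5}", href)[0])
        except (KeyError, IndexError):
            # The link has no href, or the href contains no matching code
            continue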
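To see what the pattern in getStockList actually picks up, here is a minimal sketch run against a few made-up href values (the links are hypothetical and only resemble what a stock list page might contain). Note that `[36]\d{5}` accepts any six-digit code beginning with 3 or 6, so a code such as 601398 also passes. The `STRICT` pattern is an assumption of mine, one possible way to keep only the 300 and 600 prefixes; it is not part of the original code.

```python
import re

# Pattern used in getStockList: "sh" or "sz", then 3 or 6, then five digits.
LOOSE = re.compile(r"[s][hz][36]\d{5}")
# Hypothetical stricter variant: only codes that begin with 300 or 600.
STRICT = re.compile(r"s[hz](?:300|600)\d{3}")

# Made-up hrefs, for illustration only.
hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz300001.html",
    "http://quote.eastmoney.com/sh601398.html",
    "http://quote.eastmoney.com/sz000001.html",
    "http://quote.eastmoney.com/center/",
]

def extract_codes(pattern, links):
    """Return the first match of pattern in each link, skipping links without one."""
    codes = []
    for link in links:
        found = pattern.findall(link)
        if found:
            codes.append(found[0])
    return codes

loose_codes = extract_codes(LOOSE, hrefs)
strict_codes = extract_codes(STRICT, hrefs)
print(loose_codes)   # ['sh600000', 'sz300001', 'sh601398']
print(strict_codes)  # ['sh600000', 'sz300001']
```

Checking for an empty match list before indexing avoids relying on an exception to skip links that contain no stock code.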

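Getting the detailed stock information

Here I scrape each company's total market capitalization, net assets, net profit, price-to-earnings ratio (P/E), price-to-book ratio (P/B), gross margin, net margin, and ROE, and save them to a file.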

def getStockInfo(lst, stockURL, fpath):
    # Column titles: name, total market cap, net assets, net profit,
    # P/E, P/B, gross margin, net margin, ROE
    Listtitle = ['名稱','總市值','淨資產','淨利潤','市盈率','市淨率','毛利率','淨利率','ROE']
    with open(fpath, 'w', encoding='utf-8') as f:
        for title in Listtitle:
            # chr(12288) is the full-width space, used as the fill character
            # so that columns of Chinese text line up
            f.write("{0:{1}<10}\t".format(title, chr(12288)))
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url, "GB2312")
        try:
            if html == "":
                continue
            List = []
            soup = BeautifulSoup(html, 'html.parser')
            # The financial indicators sit in the first tbody of the "cwzb" div
            table = soup.find('div', attrs={'class': 'cwzb'}).find_all('tbody')[0]
            name = table.find_all('b')[0]
            List.append(name.text)
            keyList = table.find_all('td')[1:9]
            for td in keyList:
                List.append(td.text)
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write('\n')
                for item in List:
                    f.write('{0:{1}<10}\t'.format(item, chr(12288)))
        except Exception:
            # Skip pages that do not contain the expected table
            continue
        finally:
            # Update the progress display whether or not the stock was saved
            count = count + 1
            print("\rCurrent progress: {:.2f}%".format(count*100/len(lst)), end="")
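The `chr(12288)` passed to `format` in getStockInfo is the full-width (ideographic) space. A minimal sketch of how it can serve as the fill character of a format spec, so that cells of Chinese text are padded with spaces of matching width, might look like the following. The helper names `pad_cell` and `format_row` and the sample values are hypothetical, introduced only for illustration.

```python
FULLWIDTH_SPACE = chr(12288)  # U+3000, the ideographic (full-width) space

def pad_cell(text, width=10):
    """Left-align text in a field of `width` characters, filled with full-width spaces."""
    # Nested replacement fields supply the fill character and the width.
    return "{0:{1}<{2}}".format(text, FULLWIDTH_SPACE, width)

def format_row(cells, width=10):
    """Join padded cells with tabs, similar to the rows written to the output file."""
    return "\t".join(pad_cell(c, width) for c in cells)

# Hypothetical values, for illustration only.
row = format_row(["名稱", "ROE"], width=6)
print(repr(row))
```

Padding with an ordinary ASCII space tends to misalign columns that contain Chinese characters, because those characters are rendered wider than the space used to fill the field.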

Calling the main function

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://quote.eastmoney.com/'
    output_file = './Stock.txt'
    slist=[]
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

if __name__ == "__main__":
    main()