
Scraping Annual Reports for All A-Share Listed Companies from NetEase Finance

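First, find the stock codes of all A-share listed companies by taking the code (a six-digit number) of every stock in the Eastmoney stock list. Each entry in the list is a link like this: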

<a target="_blank" href="http://quote.eastmoney.com/sh500001.html">基金金泰(500001)</a>

Extract the information we need from the page, store it in a dictionary, and write it to the file "stock_name.txt".
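The code for this extraction step is not shown, so the following is only a minimal sketch of how it might look, assuming the list page contains links in the format of the sample anchor above. The names `parse_stock_links` and `save_stocks`, the regular expression, and the optional `prefixes` filter are illustrative assumptions, not the author's code.

```python
import json
import re

# Assumed link format, taken from the sample anchor above:
# <a ... href="http://quote.eastmoney.com/sh500001.html">NAME(500001)</a>
LINK_RE = re.compile(
    r'href="http://quote\.eastmoney\.com/(?:sh|sz)(\d{6})\.html"[^>]*>([^<(]+)\('
)


def parse_stock_links(html, prefixes=None):
    """Return {code: name} for every stock link found in html.

    prefixes is a hypothetical optional filter: a tuple of code
    prefixes to keep (None keeps every code).
    """
    stocks = {}
    for code, name in LINK_RE.findall(html):
        if prefixes is None or code.startswith(prefixes):
            stocks[code] = name.strip()
    return stocks


def save_stocks(stocks, filename="stock_name.txt"):
    """Write the dictionary as JSON so it can be read back with json.loads."""
    # The default ensure_ascii=True keeps the file pure ASCII, so it can be
    # read back regardless of the platform's default text encoding.
    with open(filename, "w") as f:
        json.dump(stocks, f)


sample = ('<a target="_blank" '
          'href="http://quote.eastmoney.com/sh500001.html">基金金泰(500001)</a>')
print(parse_stock_links(sample))  # {'500001': '基金金泰'}
```

The sample entry is a fund rather than a company, so some filtering on the code is needed to keep only A-shares. Passing a tuple of prefixes is one way to do that, but which prefixes to use is left to the reader.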

# -*- coding: utf-8 -*-
"""
Created on Tue Oct  9 00:03:46 2018

@author: South
"""

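import json
import os
import time

import requests


def get_file(url, filename):
    '''Download url and save the response body to filename.'''
    try:
        # The request is inside the try block so a network error
        # does not abort the whole run
        r = requests.get(url, timeout=30)
        with open(filename, 'wb') as file:
            file.write(r.content)
    except (requests.RequestException, OSError):
        # Report the file that failed and move on to the next one
        print(filename)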

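def check_file(filename):
    '''Return False if the file is missing or looks like an anti-scraping page.'''
    if not os.path.exists(filename):
        return False
    # errors='ignore' so a CSV in a non-default encoding does not raise
    # while reading the first line
    with open(filename, 'r', errors='ignore') as f:
        line = f.readline()
    # A first line containing 'Doc' (presumably an HTML doctype) means
    # the site returned a page instead of the CSV data
    return 'Doc' not in line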

def check_item(num):
    '''Check that all three statements for one stock were downloaded completely.'''
    zcfzb = './data/zcfzb/' + num + '.csv'
    lrb = './data/lrb/' + num + '.csv'
    xjllb = './data/xjllb/' + num + '.csv'
    # Every one of the three files must pass the check
    return check_file(zcfzb) and check_file(lrb) and check_file(xjllb)

with open('stock_name.txt', 'r') as f:
    stockdict = json.loads(f.read())

# Make sure the output directories exist before downloading
for folder in ('zcfzb', 'lrb', 'xjllb'):
    os.makedirs('./data/' + folder, exist_ok=True)

count = 0
for num, v in stockdict.items():
    count = count + 1
    if count % 100 == 0:
        print(int(count * 100 / len(stockdict)), '% completed downloading')
    # Paths where the files are stored
    # (zcfzb: balance sheet, lrb: income statement, xjllb: cash flow statement)
    zcfzb = './data/zcfzb/' + num + '.csv'
    lrb = './data/lrb/' + num + '.csv'
    xjllb = './data/xjllb/' + num + '.csv'
    # Download URLs
    zcfzb_url = "http://quotes.money.163.com/service/zcfzb_" + num + ".html?type=year"
    lrb_url = "http://quotes.money.163.com/service/lrb_" + num + ".html?type=year"
    xjllb_url = "http://quotes.money.163.com/service/xjllb_" + num + ".html?type=year"
    get_file(zcfzb_url, zcfzb)
    get_file(lrb_url, lrb)
    get_file(xjllb_url, xjllb)
    # time.sleep(1)
    if not check_item(num):
        print("Blocked by anti-scraping, sleeping for 5s")
        time.sleep(5)
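Note that the loop above pauses after a blocked request but then moves on to the next stock, so the affected files stay incomplete. One possible way to handle this is to retry the same stock a few times. The following is a rough sketch under that assumption; `download_with_retry` and all of its parameters are hypothetical names and are not part of the original script.

```python
import time


def download_with_retry(download, is_complete, max_attempts=3,
                        wait_seconds=5, sleep=time.sleep):
    """Call download() until is_complete() is true or attempts run out.

    Returns True on success and False if every attempt failed.
    The sleep parameter exists only so the wait can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        download()
        if is_complete():
            return True
        if attempt < max_attempts:
            # Wait before trying again, like the pause in the original loop
            sleep(wait_seconds)
    return False
```

Inside the loop, `download` could be a small function that calls `get_file` for the three URLs and `is_complete` could be `lambda: check_item(num)`. Stocks for which the function returns False can be collected in a list and downloaded again later.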

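With the stock codes in hand, the statements can be downloaded from NetEase Finance. Take Kweichow Moutai as an example; its stock code is 600519.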
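To make the example concrete, the short sketch below builds the three download URLs for code 600519 using the same URL pattern as the download script. The helper name `report_urls` and the constants `BASE_URL` and `TABLES` are illustrative assumptions.

```python
# URL pattern as used in the download script; type=year requests annual data
BASE_URL = "http://quotes.money.163.com/service/{table}_{code}.html?type=year"

# Table identifiers that appear in the download URLs
TABLES = {
    "zcfzb": "balance sheet",
    "lrb": "income statement",
    "xjllb": "cash flow statement",
}


def report_urls(code):
    """Return {table: url} for the three annual statements of one stock."""
    return {table: BASE_URL.format(table=table, code=code) for table in TABLES}


for table, url in report_urls("600519").items():
    print(TABLES[table], url)
```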

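In the end, this yields the three statements (balance sheet, income statement, and cash flow statement) for 3,654 A-share listed companies.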