
11. Scraping listed-company data from the Qichacha mobile app


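Scraping data from the Qichacha mobile app:

1. Install the app on the phone, connect the phone to the computer over USB, and use Fiddler to monitor the phone's requests so the traffic can be analyzed and captured.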

[Screenshots: the mobile app screen alongside the corresponding Fiddler session]

2. Analyze the captured URLs

[Screenshot: trial scrape of the information on the current page]

3. Work out the parameters required by the dynamic-loading requests and the deeper-level URLs
https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey=&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D&timestamp=1541741269760&from=h5&pageIndex=1&platform=other
https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey=&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D&timestamp=1541741269760&from=h5&pageIndex=2&platform=other
https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey=&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D&timestamp=1541741269760&from=h5&pageIndex=3&platform=other 
https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey=&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D&timestamp=1541741269760&from=h5&pageIndex=4&platform=other 
https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey=&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D&timestamp=1541741269760&from=h5&pageIndex=5&platform=other

The URL requested on each scroll load clearly changes in a regular pattern:
pageIndex=1,2,3,4,5
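
One way to double-check that observation is to parse two of the captured URLs and compare their query parameters. The following is a minimal sketch using only the standard library; the helper name `diff_query_params` is made up for illustration, and the two URLs are the first two captured above.

```python
from urllib.parse import urlsplit, parse_qs


def diff_query_params(url_a, url_b):
    """Return {name: (values_a, values_b)} for query parameters that differ."""
    qa = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    qb = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    names = sorted(set(qa) | set(qb))
    return {n: (qa.get(n), qb.get(n)) for n in names if qa.get(n) != qb.get(n)}


# Part shared by the captured URLs (everything before pageIndex).
COMMON = (
    "https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey="
    "&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date"
    "&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D"
    "&timestamp=1541741269760&from=h5"
)
page1 = COMMON + "&pageIndex=1&platform=other"
page2 = COMMON + "&pageIndex=2&platform=other"

print(diff_query_params(page1, page2))
# → {'pageIndex': (['1'], ['2'])}
```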

On the phone, each scroll loads 20 records and increments pageIndex by 1; all other parameters stay the same.
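
The paging rule can be sketched as a small helper that takes one captured URL and swaps in a new pageIndex while leaving every other parameter untouched. This is only an illustration: `url_for_page` and `pages_needed` are hypothetical names, and whether the captured `sign` and `token` stay valid for other pages is something only the server decides.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

PAGE_SIZE = 20  # records per load, as observed in the app

CAPTURED = (
    "https://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey="
    "&sign=bbdb1ed793cb244e4bfb4b9b120984ce383940b0&sortField=date"
    "&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D"
    "&timestamp=1541741269760&from=h5&pageIndex=1&platform=other"
)


def url_for_page(captured_url, page_index):
    """Rebuild the captured URL with pageIndex replaced; other params unchanged."""
    parts = urlsplit(captured_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [(k, str(page_index) if k == "pageIndex" else v) for k, v in params]
    return urlunsplit(parts._replace(query=urlencode(params)))


def pages_needed(total_records, page_size=PAGE_SIZE):
    """Number of pages required to cover total_records (ceiling division)."""
    return -(-total_records // page_size)


for page in range(1, 4):
    print(url_for_page(CAPTURED, page))
```

With 20 records per page, `pages_needed(3572)` gives 179, so a loop over 250 pages asks for more pages than the 3572 records reported here would fill.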

However, the endpoint stopped returning data after 3572 records, and without a sleep between requests the traffic gets flagged as abnormal.
import random
import time

import requests

# Fields printed for every record, in the original order.
PRINT_FIELDS = [
    "KeyNo", "Desc", "ShowDate", "ID", "CategoryName", "StockType",
    "StockMarket", "ListingMarket", "Title", "Status", "StockName",
    "ImageUrl", "StockNumber", "CompanyName", "ListingDate",
]
# Fields written to the output file (ID is printed but not saved).
FILE_FIELDS = [name for name in PRINT_FIELDS if name != "ID"]


def main():
    headers = {
        # Copy the request headers shown in Fiddler's upper-right pane.
        "Host": "appv3.qichacha.net",
        "Connection": "keep-alive",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "Accept": "application/json,text/javascript,*/*;q=0.01",
        "Origin": "https://share.qichacha.com",
        "User-Agent": "Mozilla/5.0 (Linux; Android 7.1.2; MI 5X Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/64.0.3282.137 Mobile Safari/537.36",
        "Referer": "https://share.qichacha.com/pro/app_11.6.0/enterprise-library/search-ipo/index.html",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,en-US;q=0.9",
        "X-Requested-With": "com.android.icredit",
    }
    for i in range(1, 251):
        url = "http://appv3.qichacha.net/app/v1/other/IPOCompanySearch?searchKey=&sign=c1db45756855fb049b8b8f43b699db2148f9c048&sortField=date&isSortAsc=false&token=NmM2ZjA3M2Q5ZGU4NDAwM2JmNGQwYWFlMTM1YmVlYzg%3D&timestamp=1541739365501&from=h5&pageIndex={}&platform=other".format(i)
        # Sleep 1-2 seconds so the requests are not flagged as abnormal.
        time.sleep(random.randint(1, 2))
        # The response body is JSON.
        res = requests.get(url=url, headers=headers).json()
        # Assumption: a missing or empty list means the server has stopped
        # returning data, so the loop ends instead of raising an error.
        results = (res.get('result') or {}).get('Result')
        if not results:
            break
        # Each page holds up to 20 records.
        for result in results:
            for name in PRINT_FIELDS:
                print(result[name])
            print('*' * 100)
            # Open the file in append mode so every record is added at the end.
            with open('text', 'a', encoding='utf-8') as f:
                # str() guards against non-string values such as numbers.
                f.write('\n'.join(str(result[name]) for name in FILE_FIELDS))
                f.write('\n' + '=' * 50 + '\n')


if __name__ == "__main__":
    main()
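Result of the run:
53580 / 15 = 3572 records, where 15 is presumably the number of lines the script writes per record (14 fields plus one separator line). That is all the data that could be retrieved.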
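
The same count can be cross-checked by counting separator lines rather than dividing the total line count, which stays correct even if a field happens to contain a line break. This is a rough sketch that assumes the output format produced by the script above (a line of 50 `=` characters after each record); `count_records` is a hypothetical helper name, and the sample data is synthetic.

```python
SEPARATOR = "=" * 50  # line written after every record by the script


def count_records(lines):
    """Count records by counting separator lines in an iterable of lines."""
    return sum(1 for line in lines if line.rstrip("\n") == SEPARATOR)


# Synthetic sample: three records of 14 field lines plus one separator each.
sample = []
for record in range(3):
    sample += ["field{}".format(i) for i in range(14)] + [SEPARATOR]

print(len(sample) // 15, count_records(sample))

# For the real output file:
# with open('text', encoding='utf-8') as f:
#     print(count_records(f))
```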

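There are 3572 listed-company records. The detail pages could not be scraped: when a detail URL is opened, the app does not expose an API response and Fiddler captures no packets, so that data is out of reach.

No other data was returned either, so this approach had to be abandoned in favor of looking for another method.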