Scraping a JSON API with Python

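With guidance from an experienced developer, I wrote my first crawler. It scrapes data from a website's API and saves it as an Excel file. There isn't much code; the point is to get familiar with how a crawler works.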

Writing crawlers in Python is a real pleasure.

#!/usr/bin/env python
# coding=utf-8
# Python 2 script: scrape album metadata from a JSON API and save it to an Excel file.
import json
import sys

import requests
import xlwt

# requests sends the HTTP requests; xlwt creates and writes .xls files;
# sys is used here only to change the default string encoding.
ses = requests.session()

# Python 2 removes sys.setdefaultencoding at startup, so sys has to be
# reloaded before the function can be called.
reload(sys)
sys.setdefaultencoding('utf8')


def http_get(url):
    # Fetch the URL and parse the response body as JSON.
    return json.loads(ses.get(url, stream=True).content)


workbook = xlwt.Workbook(encoding='utf-8')
worksheet = workbook.add_sheet('Worksheet')
row = 0

# range(0, 1) yields only a = 0, so exactly one page is requested and the loop
# is optional here; widen the range to request more pages.
for a in range(0, 1):
    # http://180.153.255.6/mobile/discovery/v2/category/metadata/albums/ts-1515757942203?calcDimension=hot&categoryId=0&device=android&pageId=1&pageSize=100&version=6.3.60
    url = "http://180.153.255.6/mobile/discovery/v2/category/metadata/albums/ts-1515757942203?calcDimension=hot&categoryId=0&device=android&pageId=" + str(a) + "&pageSize=100&version=6.3.60"
    json_data = http_get(url)

    if json_data["list"] == []:
        break  # stop as soon as a page comes back empty

    for album in json_data["list"]:
        albumId = album["albumId"]
        sku_url = "http://180.153.255.6/mobile/v1/album/ts-1515829937763?ac=WIFI&albumId=" + str(albumId) + "&device=android&isAsc=true&pageId=1&pageSize=1&pre_page=2&source=5&supportWebp=true"
        json_sku_data = http_get(sku_url)
        if json_sku_data["data"] != "":
            try:
                user_id = json_sku_data["data"]["user"]["uid"]
            except (KeyError, TypeError):
                user_id = ""
            print albumId, user_id, album["title"]

        # One spreadsheet row per album: id, title, nickname, intro.
        row_values = [albumId, album["title"], album["nickname"], album["intro"]]
        for p, value in enumerate(row_values):
            worksheet.write(row, p, label=value)
        row = row + 1

# row starts at 0 and grows by one per record, so it already equals the record count.
print row
workbook.save('Excel_Workbook.xls')
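The page loop above can also be tried out without touching the network. What follows is only a minimal sketch in Python 3, not the script itself: `album_rows`, `fake_fetch`, and the sample records are hypothetical names and data made up for illustration, and the real API may return more fields than the four used here.

```python
# Hypothetical sketch: collect spreadsheet rows from paged JSON payloads.
FIELDS = ["albumId", "title", "nickname", "intro"]  # the four columns written above


def album_rows(fetch_page, max_pages=1):
    """Return one [albumId, title, nickname, intro] row per album.

    fetch_page is any callable that takes a page number and returns the
    parsed JSON payload (a dict with a "list" key).
    """
    rows = []
    for page_id in range(max_pages):
        albums = fetch_page(page_id).get("list") or []
        if not albums:
            break  # an empty page ends the crawl, as in the original loop
        for album in albums:
            # Missing fields become empty strings instead of raising KeyError.
            rows.append([album.get(field, "") for field in FIELDS])
    return rows


def fake_fetch(page_id):
    """Stand-in for an HTTP call; the records are invented sample data."""
    pages = {
        0: {"list": [
            {"albumId": 1, "title": "A", "nickname": "n1", "intro": "i1"},
            {"albumId": 2, "title": "B"},
        ]},
    }
    return pages.get(page_id, {"list": []})


rows = album_rows(fake_fetch, max_pages=5)
```

Passing the fetch function in as an argument keeps the row-building logic separate from the HTTP request, so the same function could be handed a real `requests`-based fetcher later.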

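Note that reload(sys) reloads the sys module. Python 2 removes sys.setdefaultencoding at interpreter startup, so calling it without reloading sys first raises an error.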
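For comparison, here is a small sketch of the same decoding step on Python 3, where the workaround is not needed: `reload` is no longer a builtin and `sys.setdefaultencoding` does not exist there. The JSON body below is invented sample data, not a response from the real API.

```python
import json

# Bytes, as an HTTP response body would arrive (invented sample data).
raw = '{"title": "有聲書", "albumId": 42}'.encode("utf-8")

# json.loads accepts UTF-8 encoded bytes directly on Python 3.6+.
record = json.loads(raw)

# Strings come back as Unicode str objects, so no default-encoding change is required.
title = record["title"]
```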

And that's it: a simple crawler whose main job is to scrape data from a JSON API.