Python Crawler: Scraping Data from a Mobile App

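Abstract: Most apps return data in JSON format, or a blob of encrypted data. This post uses the 超級課程表 (Super Class Schedule) app as an example and scrapes the topics posted by its users.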

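1. Capturing the app's packets

For the detailed method, see this post: http://my.oschina.net/jhao104/blog/605963

The capture gives the login URL of 超級課程表: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action

Form data: [image]

The form contains the username and password, both already encrypted, plus a device-information field. It can be POSTed to the server exactly as captured.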
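
To see which fields the captured form carries, the body can be split with the standard library. This is a minimal sketch in Python 3 (the article's own code targets Python 2); `login_body` is the form string from the capture, and the name `fields` is introduced here only for illustration.

```python
from urllib.parse import parse_qsl

# The captured login form body, copied verbatim from the packet capture.
login_body = (
    "phoneBrand=Meizu&platform=1&deviceCode=868033014919494"
    "&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16"
    "&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket"
    "&phoneModel=M040&versionNumber=7.2.1&"
)

# parse_qsl splits on "&" and "=", and skips the empty field
# left over by the trailing "&".
fields = dict(parse_qsl(login_body))

for name, value in fields.items():
    print(name, "=", value)

# The body length is what the Content-Length header must carry.
print(len(login_body))
```

The body is 207 bytes long, which is the value used for the `Content-Length` header in the login request.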

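The request headers must also be sent. At first I left them out and the server answered with a login error, so the captured header information has to be included.

[image]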
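
As a rough illustration of attaching headers to a POST, here is a Python 3 sketch using `urllib.request`. The helper name `build_post` and the shortened form body are made up for this example, and it is an untested assumption that the server accepts a request carrying only these two headers. Nothing is sent over the network; the code only builds the request object.

```python
import urllib.request

LOGIN_URL = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"

# Headers copied from the captured request. Content-Length and Host are
# left out because urllib.request fills them in when the request is sent.
HEADERS = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
}


def build_post(url, body, headers=HEADERS):
    """Return a POST request object; nothing is sent until it is opened."""
    # Passing data (as bytes) is what turns the request into a POST.
    return urllib.request.Request(url, data=body.encode("ascii"), headers=headers)


# Hypothetical, shortened body used only to exercise the helper.
req = build_post(LOGIN_URL, "platform=1&phoneModel=M040&")
print(req.get_method(), req.full_url)
```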


2. Logging in

Login code:

# Python 2 (urllib2, cookielib)
import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}
# Form body exactly as captured; account and password are already encrypted
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
# The cookie jar keeps the session cookies so later requests stay logged in
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult

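On a successful login, the server returns a JSON string containing the account information.

[image]

This matches the data returned during packet capture, which confirms that the login succeeded.

[image]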
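
Because the request advertises `Accept-Encoding: gzip`, the server may answer with a compressed body. The following Python 3 sketch shows one way to handle both cases before checking the login result. The helper names `parse_response` and `login_succeeded` and the sample payload (including its `nickName` field) are hypothetical; the only detail taken from the article is that a successful response carries account information under `data`.

```python
import gzip
import json


def parse_response(raw):
    """Decode a response body (bytes) into a dict, un-gzipping it if needed."""
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def login_succeeded(payload):
    """Treat a non-empty "data" field as a successful login."""
    return bool(payload.get("data"))


# Hypothetical response, for illustration only.
sample = gzip.compress(
    json.dumps({"data": {"nickName": "demo"}, "status": 1}).encode("utf-8")
)
print(login_succeeded(parse_response(sample)))
```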


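3. Scraping the data

The topic URL and its POST parameters are obtained the same way, by capturing packets.

The approach is the same as simulating a login to a website. For details see: http://my.oschina.net/jhao104/blog/547311

The final code follows. It fetches the first page of topics and then keeps loading more, the way pull-to-refresh does in the app, so topic content can be loaded without limit.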

#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""
  Scrape topics posted in the 超級課程表 app
"""
import urllib2
from cookielib import CookieJar
import json


# Read the JSON data
def fetch_data(json_data):
    """Return the timestamp for the next page and the list of topics."""
    data = json_data['data']
    timestampLong = data['timestampLong']
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        topicDict = {}
        # Skip messages that have no text content
        if each.get('content', False):
            topicDict['content'] = each['content']
            topicDict['schoolName'] = each['schoolName']
            topicDict['messageId'] = each['messageId']
            topicDict['gender'] = each['studentBO']['gender']
            topicDict['time'] = each['issueTime']
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


# Load more topics
def load(timestamp, headers, url):
    # A loop is used rather than recursion so that long runs do not hit
    # Python's recursion limit. Uses the module-level opener created at login.
    while True:
        loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
        # 159 bytes when the timestamp has 13 digits
        headers['Content-Length'] = str(len(loadData))
        req = urllib2.Request(url, loadData, headers)
        loadResult = opener.open(req).read()
        loadJson = json.loads(loadResult)
        if loadJson.get('status', False) == 1:
            print 'load successful!'
            # The returned timestamp is the cursor for the next page
            timestamp, topicList = fetch_data(loadJson)
        else:
            print 'load fail'
            print loadResult
            return False

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
    }

# --- Login ---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
# A successful login response carries the account information under 'data'
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:
    print 'login successful!'
else:
    print 'login fail'
    print loginResult

# --- Fetch topics ---
# timestamp=0 requests the first page
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = str(len(topicData))  # 147
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get('status', False)
print topicJson
if topicStatus == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    # Continue from the timestamp returned with the first page
    load(timestamp, headers, topicUrl)

Result:

[image]
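
The paging scheme in the script, where each response returns a `timestampLong` that is sent back as the `timestamp` of the next request, can also be written as a generator. This is a Python 3 sketch under stated assumptions: `iter_topics`, `fetch_page`, `max_pages`, and the fake two-page feed are all hypothetical names and data introduced for illustration, and `fetch_page` stands in for the real HTTP request so the example runs without network access.

```python
def iter_topics(fetch_page, timestamp=0, max_pages=3):
    """Yield topic dicts page by page, following the timestampLong cursor.

    fetch_page(timestamp) must return the parsed JSON dict for one page;
    it is a placeholder for the HTTP request.
    """
    for _ in range(max_pages):
        page = fetch_page(timestamp)
        if page.get("status") != 1:
            break  # the server reported a failure, or there is nothing more
        data = page["data"]
        for message in data["messageBOs"]:
            if message.get("content"):  # skip messages without text
                yield message
        # The returned timestamp is the cursor for the next request.
        timestamp = data["timestampLong"]


# Fake two-page feed so the sketch runs offline.
_pages = {
    0: {"status": 1, "data": {"timestampLong": 100,
                              "messageBOs": [{"content": "a"}, {"content": ""}]}},
    100: {"status": 1, "data": {"timestampLong": 50,
                                "messageBOs": [{"content": "b"}]}},
}


def fake_fetch(timestamp):
    return _pages.get(timestamp, {"status": 0})


topics = [m["content"] for m in iter_topics(fake_fetch)]
print(topics)
```

Capping the number of pages with `max_pages` is a design choice of this sketch; it keeps a crawl bounded while still following the same cursor logic.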

When reposting, please credit the source: http://my.oschina.net/jhao104/blog/606922
