Crawler: Fetching Messages from the WeChat Official Accounts Platform
阿新 • Published: 2019-01-31
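I was helping a friend scrape user comments from the WeChat Official Accounts Platform.
This post covers only the core part: how to get at the comment data.
Looking at the HTML, I found no tags for the comment section. It seemed to be generated dynamically by JavaScript, but none of the AJAX requests I inspected returned the data either.
A search finally turned it up: the data is written in plain sight in the page's JavaScript: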
<script type="text/javascript"> wx.cgiData = { total_count : 91, latest_msg_id : '200325222', count : "20"*1 || 20, day : "7", frommsgid : "", can_search_msg : "1", offset : "", action : "", keyword : "", list : ({"msg_item":[{"id":200322761,"type":1,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854675,"content":"記得幫我查一下是不是這個電話!","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322760,"type":2,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854664,"source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322759,"type":1,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854659,"content":"勐璇,我看到那人了!","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322344,"type":2,"fakeid":"1994400010","nick_name":"ABC的CBA","date_time":1398839849,"source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200321209,"type":1,"fakeid":"1591078101","nick_name":"倚(紡織服裝)","date_time":1398788906,"content":"\/::<","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200321206,"type":2,"fakeid":"1591078101","nick_name":"倚(紡織服裝)","date_time":1398788859,"source":"","msg_status":4,"has_reply":1,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},
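Pulling the list : ( ... ) literal out of the downloaded page can be sketched roughly as below. This is a minimal sketch, not the code used for the original scrape: the names CgiDataExtractor and extractListJson are hypothetical, and it assumes the page source is already held in a string and that the literal is a brace-balanced JSON object wrapped in parentheses, as in the dump above. It walks forward to the matching closing brace instead of relying on a regex alone, so braces inside message text do not end the match early.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class CgiDataExtractor {

    // Matches the start of the "list : (" assignment seen in the page script.
    private static final Pattern LIST_START = Pattern.compile("\\blist\\s*:\\s*\\(");

    /** Returns the JSON text following "list : (", or null if it cannot be found. */
    static String extractListJson(String html) {
        Matcher m = LIST_START.matcher(html);
        if (!m.find()) {
            return null;
        }
        int start = m.end();
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                 // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth == 0) {
                    // Closing brace of the outermost object.
                    return html.substring(start, i + 1).trim();
                }
            }
        }
        return null; // unbalanced or truncated input
    }
}
```

The returned string can then be passed to a JSON parser.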
The data is in JSON format. The raw text is hard to read, so I formatted it in Eclipse; the message list looks roughly like this:
{
  "msg_item" : [
    {
      "id" : 200322761,
      "type" : 1,
      "fakeid" : "593656935",
      "nick_name" : "Suang 1",
      "date_time" : 1398854675,
      "content" : "記得幫我查一下是不是這個電話!",
      "source" : "",
      "msg_status" : 4,
      "has_reply" : 0,
      "refuse_reason" : "",
      "multi_item" : [],
      "to_uin" : 3071594631,
      "send_stat" : { "total" : 0, "succ" : 0, "fail" : 0 }
    },
    {
      "id" : 200322760,
      "type" : 2,
      "fakeid" : "593656935",
      "nick_name" : "Suang 1",
      "date_time" : 1398854664,
      "source" : "",
      "msg_status" : 4,
      "has_reply" : 0,
      "refuse_reason" : "",
      "multi_item" : [],
      "to_uin" : 3071594631,
      "send_stat" : { "total" : 0, "succ" : 0, "fail" : 0 }
    }
  ]
}
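The above shows the objects in the list that msg_item maps to in the JSON string.
As you can see, msg_item is an array, and each comment is one object in it. How do we generate the corresponding Java classes?
There is an online tool for this: http://jsongen.byingtondesign.com/
It generates the corresponding Java classes from a JSON string: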
Class 1
import java.util.List;

public class MessageList {
    private List<Message> msg_item;

    public List<Message> getMsg_item() {
        return msg_item;
    }

    public void setMsg_item(List<Message> msgItem) {
        msg_item = msgItem;
    }
}
Class 2. Some fields were not needed, so I removed them.
public class Message {
    private String content;
    private long date_time;
    private String fakeid;
    private int has_reply;
    private long id;
    private int msg_status;
    private String nick_name;
    private String refuse_reason;
    private String source;
    private long to_uin;
    private int type;
    // getters and setters omitted
}
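The date_time values in the sample (for example 1398854675) look like Unix timestamps in seconds. That is an assumption read off the data, not something the page states. Under that assumption, a small helper can turn the field into a readable time. The class name MessageTime and the choice of the Asia/Shanghai zone are illustrative only.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; assumes date_time is Unix epoch seconds.
final class MessageTime {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats an epoch-seconds value in the given time zone. */
    static String format(long epochSeconds, ZoneId zone) {
        return FORMAT.format(Instant.ofEpochSecond(epochSeconds).atZone(zone));
    }

    public static void main(String[] args) {
        // date_time of the first message in the sample above
        System.out.println(format(1398854675L, ZoneId.of("Asia/Shanghai")));
        // prints 2014-04-30 18:44:35
    }
}
```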
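Now for a quick test. I use Google's Gson to parse the JSON string into Java objects.
import com.google.gson.Gson;

// jsonstr is the JSON string containing msg_item
MessageList msgList = new Gson().fromJson(jsonstr, MessageList.class);
System.out.println(msgList.getMsg_item().size());
Parsing succeeded. All of the objects are now in msgList.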
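One thing to watch when walking the parsed list: in the sample, some entries (those with type 2) have no content key, and Gson leaves a missing field at its default, so content will be null for them. A minimal sketch of collecting only the entries that carry text might look like this. Msg is a hypothetical stand-in for the Message class, reduced to two fields so the example runs on its own without Gson.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; Msg stands in for the Message class above.
final class TextMessageFilter {

    static final class Msg {
        final String nickName;
        final String content; // null when the JSON entry had no "content" key

        Msg(String nickName, String content) {
            this.nickName = nickName;
            this.content = content;
        }
    }

    /** Returns "nick: text" lines for entries that actually contain text. */
    static List<String> textLines(List<Msg> items) {
        List<String> lines = new ArrayList<>();
        for (Msg m : items) {
            if (m.content != null && !m.content.isEmpty()) {
                lines.add(m.nickName + ": " + m.content);
            }
        }
        return lines;
    }
}
```

With the real classes, the same loop would run over msgList.getMsg_item() and use the omitted getters.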