Building a web crawler in Java to scrape NetEase playlist information
阿新 • Published: 2018-11-20
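I had long been curious about web crawlers and found them rather mysterious. A friend of mine builds crawlers for a living, so when I recently had some free time I learned from him and tried writing a small program.
The first step is to obtain an HttpClient object and an HttpResponse object; these two are used to send the request and to receive the data.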
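import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

CloseableHttpClient httpClient = null;
CloseableHttpResponse httpResponse = null;
// All three timeouts are in milliseconds.
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(10000)            // establishing the connection
        .setSocketTimeout(10000)             // waiting for response data
        .setConnectionRequestTimeout(10000)  // obtaining a connection from the pool
        .build();
httpClient = HttpClients.createDefault();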
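Next, configure the request that fetches the site's data. Settings such as the URL, the request headers, and a proxy are wrapped in an HttpGet object.

import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;

HttpGet httpGet = new HttpGet("http://music.163.com/discover/toplist?id=3778678");
httpGet.setConfig(requestConfig);
// Headers that make the request look like it comes from a browser.
httpGet.setHeader("Host", "music.163.com");
httpGet.setHeader("Referer", "http://music.163.com/");
httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36");
// Send the request; the body is the whole HTML page, not just song names.
httpResponse = httpClient.execute(httpGet);
String musicName = EntityUtils.toString(httpResponse.getEntity(), "UTF-8");
logger.info(musicName);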
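A note on the "UTF-8" argument in the EntityUtils.toString call above: as far as I understand HttpClient 4.x, it is only a default, and a charset declared in the response's Content-Type header takes precedence. The sketch below illustrates that fallback rule with the JDK alone. It is not the library's implementation; the class BodyDecoder and its methods are hypothetical names made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BodyDecoder {
    /** Picks the charset named in a Content-Type value, else the fallback. */
    static Charset pickCharset(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    /** Decodes raw body bytes using the charset chosen by pickCharset. */
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, pickCharset(contentType, fallback));
    }

    public static void main(String[] args) {
        byte[] body = "toplist".getBytes(StandardCharsets.UTF_8);
        // No charset in the header, so the UTF-8 fallback is used.
        System.out.println(decode(body, "text/html", StandardCharsets.UTF_8));
    }
}
```

If the logged page ever shows garbled Chinese characters, comparing the Content-Type header against the bytes actually returned is a reasonable first check.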
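Execute the request, then use org.apache.http.util.EntityUtils to convert the response body to a String; here it is written to the log file. The captured data is shown below: the song names are there, but so is the rest of the page's markup.
Later steps will need to parse this data, since all we actually want is the chart information.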
areastyle="display:none;">[{
  "copyrightId": 14026, "mvid": 0, "transNames": null, "status": 0, "ftype": 0,
  "privilege": { "st": 0, "flag": 0, "subp": 1, "fl": 320000, "fee": 0, "dl": 320000, "cp": 1, "cs": false, "toast": false, "maxbr": 999000, "id": 515803379, "pl": 320000, "sp": 7, "payed": 0 },
  "djid": 0,
  "album": { "id": 36681200, "name": "別", "picUrl": "http://p1.music.126.net/NUUQurj2vr85-ugkwORjWQ==/109951163052989882.jpg", "tns": [], "pic_str": "109951163052989882", "pic": 109951163052989882 },
  "artists": [{ "id": 5781, "name": "薛之謙", "tns": [], "alias": [] }],
  "no": 0, "alias": [], "score": 100.0, "commentThreadId": "R_SO_4_515803379", "fee": 0,
  "name": "別", "id": 515803379, "type": 0, "duration": 215664 },
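As a rough idea of what the parsing step could look like, here is a minimal sketch that reads a JSON array shaped like the sample above and prints each track's "name" together with the "name" of every entry in its "artists" list. Those two field names come from the captured data; everything else is an assumption for illustration. The class ToplistParser and the method describeTracks are hypothetical, the hand-written JSON reader exists only to keep the example dependency-free (a real project would more likely use a JSON library), and isolating the array from the surrounding HTML is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Tiny JSON reader: objects become Map, arrays List, numbers Double. */
class ToplistParser {
    private final String s;
    private int pos = 0;

    private ToplistParser(String s) { this.s = s; }

    /** Returns one "track - artist/artist" line per element of the JSON array. */
    static List<String> describeTracks(String jsonArray) {
        List<String> lines = new ArrayList<>();
        for (Object item : (List<?>) new ToplistParser(jsonArray).value()) {
            Map<?, ?> track = (Map<?, ?>) item;
            List<String> artists = new ArrayList<>();
            for (Object artist : (List<?>) track.get("artists")) {
                artists.add((String) ((Map<?, ?>) artist).get("name"));
            }
            lines.add(track.get("name") + " - " + String.join("/", artists));
        }
        return lines;
    }

    private Object value() {
        char c = peek();
        if (c == '{') return object();
        if (c == '[') return array();
        if (c == '"') return string();
        if (s.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (s.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (s.startsWith("null", pos)) { pos += 4; return null; }
        int start = pos;
        while (pos < s.length() && "+-.eE0123456789".indexOf(s.charAt(pos)) >= 0) pos++;
        // Large ids lose precision as Double; they are not used here.
        return Double.valueOf(s.substring(start, pos));
    }

    private Map<String, Object> object() {
        Map<String, Object> map = new LinkedHashMap<>();
        expect('{');
        if (peek() == '}') { pos++; return map; }
        do {
            String key = string();
            expect(':');
            map.put(key, value());
        } while (consumeComma());
        expect('}');
        return map;
    }

    private List<Object> array() {
        List<Object> list = new ArrayList<>();
        expect('[');
        if (peek() == ']') { pos++; return list; }
        do { list.add(value()); } while (consumeComma());
        expect(']');
        return list;
    }

    private String string() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (s.charAt(pos) != '"') {
            char c = s.charAt(pos++);
            if (c != '\\') { sb.append(c); continue; }
            char esc = s.charAt(pos++);
            int i = "nrtbf".indexOf(esc);
            if (i >= 0) {
                sb.append("\n\r\t\b\f".charAt(i));
            } else if (esc == 'u') {
                sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16));
                pos += 4;
            } else {
                sb.append(esc); // \" \\ and \/ map to themselves
            }
        }
        pos++; // closing quote
        return sb.toString();
    }

    private boolean consumeComma() { if (peek() != ',') return false; pos++; return true; }

    /** Skips whitespace and returns the next character without consuming it. */
    private char peek() {
        while (Character.isWhitespace(s.charAt(pos))) pos++;
        return s.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        pos++;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in values; the real input is the array from the page.
        String sample = "[{\"name\": \"Song\", \"artists\": [{\"name\": \"Artist\"}]}]";
        System.out.println(describeTracks(sample));
    }
}
```

The reader does almost no validation and throws on malformed or truncated input, so the captured text has to be cut down to a complete array (opening "[" through the matching "]") before it is passed in.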