
Writing a Crawler in Java: Scraping NetEase Music Playlist Information

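I had long been curious about web crawlers and found them rather mysterious. A friend of mine builds crawlers for a living, so when I had some free time recently I learned the basics from him and tried writing a small program. The first step is to obtain an HttpClient object and an HttpResponse object (CloseableHttpClient and CloseableHttpResponse from Apache HttpClient); these two are used to send the request and receive the data.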
	// Timeouts in milliseconds: connect, socket read, and leasing a connection from the pool.
	RequestConfig requestConfig = RequestConfig.custom().setConnectTimeout(10000).setSocketTimeout(10000)
			.setConnectionRequestTimeout(10000).build();
	CloseableHttpClient httpClient = HttpClients.createDefault();
	CloseableHttpResponse httpResponse = null;
Next, configure the request that fetches the data from the site.
	HttpGet httpGet = new HttpGet("http://music.163.com/discover/toplist?id=3778678");
	httpGet.setConfig(requestConfig);

	httpGet.setHeader("Host", "music.163.com");
	httpGet.setHeader("Referer", "http://music.163.com/");
	httpGet.setHeader("User-Agent",
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36");
The code above packs the URL, the request headers (including the User-Agent), and so on into the HttpGet object.
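	httpResponse = httpClient.execute(httpGet);
	// Despite its name, musicName holds the whole page body, decoded as UTF-8.
	String musicName = EntityUtils.toString(httpResponse.getEntity(), "UTF-8");
	logger.info(musicName);
	httpResponse.close(); // release the connection once the body has been read
	httpClient.close();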
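Execute the request and use org.apache.http.util.EntityUtils to convert the response body into a String, which is written to the log file here. Below is the data that was captured: the song titles and other details are there, along with the rest of the page's markup. Later steps need to parse this data, since only the chart information is wanted.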
<textarea style="display:none;">[{
	"copyrightId": 14026,
	"mvid": 0,
	"transNames": null,
	"status": 0,
	"ftype": 0,
	"privilege": {
		"st": 0,
		"flag": 0,
		"subp": 1,
		"fl": 320000,
		"fee": 0,
		"dl": 320000,
		"cp": 1,
		"cs": false,
		"toast": false,
		"maxbr": 999000,
		"id": 515803379,
		"pl": 320000,
		"sp": 7,
		"payed": 0
	},
	"djid": 0,
	"album": {
		"id": 36681200,
		"name": "別",
		"picUrl": "http://p1.music.126.net/NUUQurj2vr85-ugkwORjWQ==/109951163052989882.jpg",
		"tns": [],
		"pic_str": "109951163052989882",
		"pic": 109951163052989882
	},
	"artists": [{
		"id": 5781,
		"name": "薛之謙",
		"tns": [],
		"alias": []
	}],
	"no": 0,
	"alias": [],
	"score": 100.0,
	"commentThreadId": "R_SO_4_515803379",
	"fee": 0,
	"name": "別",
	"id": 515803379,
	"type": 0,
	"duration": 215664
},
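The parsing step mentioned above could be sketched roughly as follows, using only the JDK. This is a minimal sketch, not the author's implementation: the class ChartParser and its methods extractPayload and songNames are hypothetical names, and it assumes the JSON array sits inside the page's first textarea element, which is what the logged excerpt appears to show. It walks the text by hand rather than using a JSON library, so escape sequences and HTML entities are left undecoded; a real JSON parser would be the sturdier choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of the original program.
final class ChartParser {

    /** Returns the text inside the first <textarea> element, or null if there is none. */
    public static String extractPayload(String html) {
        int open = html.indexOf("<textarea");
        if (open < 0) return null;
        int start = html.indexOf('>', open);
        int end = (start < 0) ? -1 : html.indexOf("</textarea>", start);
        if (end < 0) return null;
        return html.substring(start + 1, end).trim();
    }

    /**
     * Collects the value of every "name" key that belongs directly to a song object,
     * i.e. at nesting depth 2 of [ { ... }, { ... } ]. Names nested deeper
     * (album, artists) are skipped. Escape sequences are kept as-is.
     */
    public static List<String> songNames(String json) {
        List<String> names = new ArrayList<>();
        int depth = 0;
        boolean pendingName = false; // true right after a depth-2 "name" key
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
            } else if (c == ',') {
                pendingName = false;
            } else if (c == '"') {
                // Find the closing quote, stepping over escaped characters.
                int close = i + 1;
                while (close < json.length() && json.charAt(close) != '"') {
                    close += (json.charAt(close) == '\\') ? 2 : 1;
                }
                String text = json.substring(i + 1, Math.min(close, json.length()));
                // A string followed by ':' is a key; otherwise it is a value.
                int next = close + 1;
                while (next < json.length() && Character.isWhitespace(json.charAt(next))) next++;
                boolean isKey = next < json.length() && json.charAt(next) == ':';
                if (isKey) {
                    pendingName = depth == 2 && text.equals("name");
                } else if (pendingName && depth == 2) {
                    names.add(text);
                    pendingName = false;
                }
                i = close; // resume scanning after the closing quote
            }
            i++;
        }
        return names;
    }
}
```

With the string logged earlier, ChartParser.songNames(ChartParser.extractPayload(musicName)) would collect the top-level name field of each record and skip the album and artist names nested below it.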