
Java crawler: cracking a JS-encrypted cookie

I. Preface:

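When scraping data, proxies and CAPTCHA recognition are unavoidable problems. This post summarizes the JS-encrypted cookie problem I ran into while scraping free proxy IP lists.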

II. Problem:

For ordinary static pages, parsing with jsoup is the usual approach.


But fetching this site directly with jsoup fails with an error:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=521, URL=http://www.kuaidaili.com/ops/proxylist/1
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:679)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:628)
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:260)
	at org.jsoup.helper.HttpConnection.get(HttpConnection.java:249)

III. Analysis and solution:

A browser, however, displays the site without trouble. Let's open one and watch the requests, using Chrome as the example.



You can clearly see that the first request fails: its HTTP status code is 521, not 200.

The second request returns 200, and it carries one more cookie than the first: _ydclearance=a3fd46bd1a232b52d7313218-72dc-4427-aa33-5690668af31d-1506323606

That is the root of the problem.

Let's reproduce this in code and print out the details:

CloseableHttpClient httpClient = HttpClients.createDefault();

// Register a custom CookieSpec (MyCookieSpec is the author's own class, not shown here)
Registry<CookieSpecProvider> cookieSpecProviderRegistry = RegistryBuilder.<CookieSpecProvider>create()
        .register("myCookieSpec", ctx -> new MyCookieSpec()).build();
String url = baseUrl + i; // baseUrl and the page index i come from the surrounding loop (not shown)
HttpGet get = new HttpGet(url);
HttpClientContext context = HttpClientContext.create();
context.setCookieSpecRegistry(cookieSpecProviderRegistry);
get.setConfig(RequestConfig.custom().setCookieSpec("myCookieSpec").build());
WebRequest request = null;
WebClient wc = null;
try {
    // 1. Get the Set-Cookie returned together with the 521 status
    CloseableHttpResponse response = httpClient.execute(get, context);
    // Response status
    System.out.println("status:" + response.getStatusLine());
    System.out.println(">>>>>>headers:");
    HeaderIterator iterator = response.headerIterator();
    while (iterator.hasNext()) {
        System.out.println("\t" + iterator.next());
    }
    System.out.println(">>>>>>cookies:");
    // context.getCookieStore().getCookies().forEach(System.out::println);
    String cookie = getCookie(context); // getCookie is the author's helper (not shown)
    System.out.println("cookie=" + cookie);
    response.close();

Output log:
status:HTTP/1.1 521 
>>>>>>headers:
	Date: Mon, 25 Sep 2017 07:10:25 GMT
	Content-Type: text/html
	Connection: keep-alive
	Set-Cookie: yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980; Expires=1506330625; Path=/; HttpOnly
	Cache-Control: no-cache, no-store
	Server: WAF/2.4-12.1
>>>>>>cookies:
cookie=yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980;Expires=Mon Sep 25 17:10:25 CST 2017;Path=/
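Damn, they really have tricks up their sleeve. Fittingly for a proxy-listing site, its anti-crawler measures are especially elaborate.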
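One detail in the log: `Expires=1506330625` in the `Set-Cookie` header is a bare Unix timestamp rather than the usual HTTP date format. Converted, it is 2017-09-25 09:10:25 GMT, which is the `17:10:25 CST` printed above and two hours after the response `Date`. A non-standard `Expires` like this is presumably why the code registers a custom `MyCookieSpec`, though its source isn't shown. The following is a minimal stdlib-only sketch for extracting the name/value pair and the epoch-seconds expiry from such a header. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper for Set-Cookie headers whose Expires is epoch seconds. */
final class YdSetCookie {
    /** Returns the leading "name=value" pair, dropping all attributes. */
    static String nameValue(String setCookie) {
        int semi = setCookie.indexOf(';');
        return (semi < 0 ? setCookie : setCookie.substring(0, semi)).trim();
    }

    /** Returns Expires as an Instant only if it is a plain count of seconds since the epoch. */
    static Optional<Instant> epochExpires(String setCookie) {
        for (String attr : setCookie.split(";")) {
            String[] kv = attr.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equalsIgnoreCase("Expires")
                    && kv[1].trim().matches("\\d+")) {
                return Optional.of(Instant.ofEpochSecond(Long.parseLong(kv[1].trim())));
            }
        }
        return Optional.empty(); // missing, or a regular HTTP-date Expires
    }
}
```

For the header above, `epochExpires` yields `2017-09-25T09:10:25Z`. For a standard value such as `expires=Mon, 25-Sep-17 09:10:45 GMT`, it returns empty and leaves that case to normal date parsing.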

With this cookie, request the target URL again and look at what comes back:

    HttpGet secGet = new HttpGet(url);
    secGet.setHeader("Cookie", cookie);
    // For testing only: compare the results
    CloseableHttpResponse secResponse = httpClient.execute(secGet, context);
    System.out.println("secstatus:" + secResponse.getStatusLine());
    String content = EntityUtils.toString(secResponse.getEntity());
    System.out.println(content);
    secResponse.close();
I expected the normal data this time, but I was being naive. It returned a piece of JavaScript, and an obfuscated one at that:
<html><body><script language="javascript"> 
window.onload=setTimeout("dv(43)", 200); function dv(VC) {var qo, mo="", no="", oo = [0x43,0xe5,0xb0,0x27,0x71,0x6f,0xe9,0x58,0xd8,0x21,0x55,0x56,0xd0,0x4f,0xcd,0x91,0x1c,0x9e,0x09,0xe7,0x80,0x6f,0x8d,0xf3,0x60,0x73,0xe9,0x66,0xd4,0x47,0x1e,0x76,0xec,0x69,0xe3,0xbc,0x27,0x02,0x70,0xe0,0xf2,0x65,0xd1,0xac,0x19,0xf3,0x6c,0xe4,0x57,0x3c,0xc3,0xa8,0x13,0xe1,0xb4,0x37,0x0e,0xf2,0x5f,0x32,0x43,0x0e,0x88,0xfe,0xd7,0x99,0x68,0xe0,0xdb,0xa6,0xd2,0xab,0x80,0x57,0x52,0xd6,0xa3,0x7c,0x5d,0xd5,0x29,0x28,0xa0,0x75,0x48,0x3d,0x18,0x13,0xdd,0x4a,0x21,0xf1,0xbe,0x89,0x70,0x72,0xe0,0x4b,0x16,0xee,0xa7,0x74,0x41,0x40,0x13,0xb3,0x7e,0x53,0x24,0xfa,0x12,0xec,0xbd,0x8e,0x5b,0x37,0x02,0xec,0xdd,0x4c,0xf8,0x59,0xad,0x34,0x8c,0x93,0xfd,0x58,0x33,0x6d,0xc6,0x45,0xc5,0xc2,0xb7,0xac,0x81,0x50,0x4b,0x61,0x98,0x03,0x57,0x52,0x25,0x8d,0x60,0x51,0x26,0x09,0xa2,0x8b,0x5e,0x33,0x1c,0xaa,0x77,0x42,0x37,0x65,0x35,0x73,0x7f,0x66,0x5b,0xae,0x1b,0x99,0x18,0x8a,0x18,0x9a,0x1b,0xf5,0xf6,0x66,0xf0,0x3b,0xad,0x34,0x50,0xbc,0x2f,0xb1,0x2e,0xd9,0x60,0x5d,0xd7,0x56,0x5f,0xd9,0xc4,0xb5,0x0a,0xf9,0x70,0xbc,0x3d,0x1c,0xae,0xad,0xa0,0x87,0x7c,0x47,0x95,0x1c,0x9c,0x05,0x45,0xc7,0x16,0x17,0x83,0x33,0xb1,0x2c,0x76,0xf4,0x2e,0x9c,0x19,0x65,0x66,0x3d,0xb9,0x3c,0xb2,0x25,0xbd,0x0a,0x90,0x0f,0x8f,0xce,0xa9,0x16,0x98,0x0f,0xc5,0x14,0x8e,0xfc,0x79,0xe1,0x2e,0x2f,0x39,0x51,0x42,0x94,0x3b];qo = "qo=251; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>2)|((oo[qo]<<6)&0xff))-180)&0xff;} while(--qo>=2);"; eval(qo);qo = 250; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 250) break; oo[qo] = ((((((oo[qo] + 162) & 0xff) + 32) & 0xff) << 1) & 0xff) | (((((oo[qo] + 162) & 0xff) + 32) & 0xff) >> 7); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 5) po += String.fromCharCode(oo[qo] ^ VC);eval("qo=eval;qo(po);");} </script> </body></html>

All right, I admit my JS isn't good enough to follow the details.

But we have the browser's web console, so we can check what this JS produces:

document.cookie='_ydclearance=efad3fbba88e7738d15cc25b-5203-428b-b273-5d6459ee5246-1506330645; expires=Mon, 25-Sep-17 09:10:45 GMT; domain=.kuaidaili.com; path=/'; window.document.location=document.URL


This is the JS-encrypted cookie, which is exactly the data we want.

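Roughly, the JS above decodes a byte array into a script that sets a new cookie and then reloads the page. The final statement, eval("qo=eval;qo(po);"), performs the assignment by evaluating that generated script (po). This only has an effect in a browser.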

In other words, if we can obtain this new cookie, jsoup becomes usable again for fetching the data.
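To see what the obfuscated script actually computes, its loops can be ported directly to Java. This is a hypothetical sketch and not the author's approach, since the author executes the script with HtmlUnit further down. It takes the `oo` array and the per-response constants as parameters because they differ between responses: compare `dv(43)` here with `kt(180)` in the later log. The loop bounds `oo.length - 2` and `oo.length - 3` match both samples in this post, but they are an assumption about the template in general.

```java
/**
 * Hypothetical pure-Java port of the obfuscated dv(VC)/kt(OD) decoder.
 * Constants vary per response, so they are parameters; the loop bounds
 * (oo.length - 2 / oo.length - 3) are an assumption that fits both samples here.
 */
final class YdDecoder {
    static String decode(int[] src, int key, int shr, int sub,
                         int add1, int add2, int rotl, int skipMod) {
        int[] oo = src.clone(); // work on a copy; the JS mutates its array in place
        int n = oo.length;
        // Step 1 (the eval'ed string): negate, rotate right by shr bits, subtract sub.
        for (int q = n - 2; q >= 2; q--) {
            int v = (-oo[q]) & 0xff;
            oo[q] = (((v >> shr) | ((v << (8 - shr)) & 0xff)) - sub) & 0xff;
        }
        // Step 2: subtract each byte's left neighbour, walking from the end.
        for (int q = n - 3; q >= 3; q--) {
            oo[q] = (oo[q] - oo[q - 1]) & 0xff;
        }
        // Step 3: add two offsets (mod 256), then rotate left by rotl bits.
        for (int q = 1; q <= n - 3; q++) {
            int v = (((oo[q] + add1) & 0xff) + add2) & 0xff;
            oo[q] = ((v << rotl) & 0xff) | (v >> (8 - rotl));
        }
        // Step 4: skip every index divisible by skipMod, XOR the rest with the key.
        StringBuilder po = new StringBuilder();
        for (int q = 1; q < n - 1; q++) {
            if (q % skipMod != 0) {
                po.append((char) (oo[q] ^ key));
            }
        }
        return po.toString();
    }
}
```

With the `dv(43)` array above, `decode(oo, 43, 2, 180, 162, 32, 1, 5)` should reproduce the `document.cookie=...` string from the console. For `kt(180)`, the arguments would be `180, 5, 211, 44, 55, 2, 6`. Hard-coding the template like this breaks as soon as the site changes its script, which is a good reason to execute the JS instead.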

At this point, the site's anti-crawler strategy is clear:

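1. The first response has status 521 and sets yd_cookie.

2. Requesting again with yd_cookie returns JS.

3. Executing the JS yields the encrypted _ydclearance cookie, and requests carrying it go through normally.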
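Taken together, the three steps above form a retry loop. On each 521 response, keep the cookies it sets. If a challenge script comes back, run it, and retry until the status is no longer 521. The following is a minimal sketch that is agnostic to the HTTP client. `Fetcher`, `Resp` and `solveJs` are hypothetical stand-ins for HttpClient/jsoup and for whatever executes the JS (HtmlUnit in this post). Sending both cookies on the final request follows what the browser does; whether the server actually requires `yd_cookie` there is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the 521 challenge loop with pluggable HTTP and JS steps. */
final class YdFlow {
    /** Minimal response shape; real clients (HttpClient, jsoup) have their own types. */
    static final class Resp {
        final int status;
        final List<String> setCookies;
        final String body;

        Resp(int status, List<String> setCookies, String body) {
            this.status = status;
            this.setCookies = setCookies;
            this.body = body;
        }
    }

    interface Fetcher {
        Resp get(String url, String cookieHeader);
    }

    /** solveJs turns a challenge page into a cookie string such as "_ydclearance=...; path=/". */
    static String fetch(String url, Fetcher http, UnaryOperator<String> solveJs, int maxTries) {
        Map<String, String> jar = new LinkedHashMap<>(); // keeps insertion order, newest value wins
        for (int i = 0; i < maxTries; i++) {
            Resp r = http.get(url, header(jar));
            if (r.status != 521) {
                return r.body;
            }
            r.setCookies.forEach(sc -> put(jar, sc));      // step 1: e.g. yd_cookie
            if (r.body.contains("<script")) {
                put(jar, solveJs.apply(r.body));           // steps 2-3: e.g. _ydclearance
            }
        }
        throw new IllegalStateException("still 521 after " + maxTries + " tries");
    }

    private static void put(Map<String, String> jar, String cookie) {
        String pair = cookie.split(";", 2)[0].trim();      // drop Expires, Path, ...
        int eq = pair.indexOf('=');
        if (eq > 0) {
            jar.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }

    private static String header(Map<String, String> jar) {
        StringBuilder sb = new StringBuilder();
        jar.forEach((k, v) -> sb.append(sb.length() == 0 ? "" : "; ").append(k).append('=').append(v));
        return sb.toString();
    }
}
```

Keeping the transport behind an interface lets the loop be exercised against a fake server before it is pointed at the real site.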

So the key question is how to execute this JS. HttpClient and jsoup can't do that. Python has PyV8 as a JavaScript engine, but what should Java use?

A quick Google search turns up three popular options for rendering JS: HtmlUnit, Selenium and PhantomJS.

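Selenium: requires an installed browser plus the matching WebDriver, with the WebDriver path configured correctly.
Pros: supports complex features. Cons: slower, not well suited to running on a server; better for debugging small amounts of data.
HtmlUnit: built on top of HttpClient, headless, moderate performance, weaker JS support.

PhantomJS: judging from its description it is fairly mainstream, with acceptable performance. Worth studying in depth (I haven't used it myself yet).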

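Honestly, every one of these engines has a learning curve. The official documentation is incomplete or hard to follow, so you end up figuring things out from the API while debugging.

Put plainly, you have to keep falling into pitfalls before you realize how things actually work.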

***********************************divider*********************************************************

Enough talk. Weekend time is limited, so I tried HtmlUnit, the option with the lowest learning cost.

    URL link = new URL(url);
    request = new WebRequest(link);
    request.setCharset("UTF-8");
    // Set the Referer request header
    request.setAdditionalHeader("Referer", "http://www.kuaidaili.com/ops/proxylist/1/");
    // Set the User-Agent request header
    request.setAdditionalHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/38.0");
    wc = new WebClient();
    wc.getCookieManager().setCookiesEnabled(true); // enable cookie management
    wc.getOptions().setJavaScriptEnabled(true);    // enable JS; essential for pages like this one
    wc.getOptions().setCssEnabled(false);          // disable CSS parsing
    wc.getOptions().setThrowExceptionOnFailingStatusCode(false);
    wc.getOptions().setThrowExceptionOnScriptError(false);
    wc.getOptions().setRedirectEnabled(true);
    wc.getOptions().setTimeout(10000);
    wc.setJavaScriptTimeout(50000);
    // Copy the cookies HttpClient received (yd_cookie) into HtmlUnit's cookie manager
    List<Cookie> cList = context.getCookieStore().getCookies();
    if (cList != null) {
        for (Cookie ck : cList) {
            com.gargoylesoftware.htmlunit.util.Cookie ucookie =
                    new com.gargoylesoftware.htmlunit.util.Cookie("www.kuaidaili.com", ck.getName(), ck.getValue());
            wc.getCookieManager().addCookie(ucookie);
        }
    }
    // Fetch the dynamic redirect page
    HtmlPage hp = wc.getPage(request);
    String body = hp.getBody().asXml();
    // Cut out the JS function
    String function = body.substring(body.indexOf("function"), body.lastIndexOf("}") + 1);
    // Replace the cookie-setting eval with a return, so the function hands back po
    function = function.replace("eval(\"qo=eval;qo(po);\")", "return po");
    System.out.println("function:" + function);
    // Extract the function call and its argument (regex match)
    String regx = "setTimeout\\(\\\"\\D+\\((\\d+)\\)\"";
    Pattern pattern = Pattern.compile(regx);
    Matcher matcher = pattern.matcher(body);
    String call = "";
    String args = "";
    while (matcher.find()) {
        String group = matcher.group();
        args = group.substring(group.lastIndexOf("(") + 1, group.indexOf(")"));
        call = group.substring(group.indexOf("\"") + 1, group.lastIndexOf("\""));
        System.out.println(args + ":" + call);
    }
    // Execute the JS to obtain the encrypted data

The remaining step is to execute the JS to get the data: ScriptResult ckstr = hp.executeJavaScript
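The `substring`/`lastIndexOf` parsing in the HtmlUnit code above can be tightened with regex capture groups. This is a minimal sketch under the same assumptions about the page layout: a single `setTimeout("name(number)")` call, with the function being the last `}`-terminated block. It assumes that the string handed to `HtmlPage.executeJavaScript` is the rewritten function followed by its call, so that evaluating it yields `po`. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds a script whose result is the decoded string po. */
final class YdChallenge {
    // Matches setTimeout("dv(43)" and captures the function name and its argument.
    private static final Pattern CALL =
            Pattern.compile("setTimeout\\(\"(\\w+)\\((\\d+)\\)\"");

    static String toScript(String html) {
        Matcher m = CALL.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no setTimeout(\"name(n)\") call found");
        }
        String name = m.group(1);
        int start = html.indexOf("function " + name + "(");
        int end = html.lastIndexOf('}') + 1; // assumes the function is the last {...} block
        if (start < 0 || end <= start) {
            throw new IllegalArgumentException("function " + name + " not found");
        }
        // Swap the final eval (which would set document.cookie) for a plain return.
        String fn = html.substring(start, end).replace("eval(\"qo=eval;qo(po);\")", "return po");
        // Append the call so that evaluating the script yields the function's return value.
        return fn + "\n" + name + "(" + m.group(2) + ");";
    }
}
```

Appending the call to the function text also matters here. Executing only the function definition returns no value, which is one plausible cause of the `result=null` pitfall listed at the end.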


We pass the resulting encrypted cookie to jsoup, and the data can then be fetched normally.
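The ScriptResult wraps the decoded script as text, as the `result=document.cookie='...'` entries in the log below show. One way to turn that text into a cookie for jsoup is to apply a regex to the `document.cookie='...'` assignment. This is a minimal sketch that assumes the result has already been obtained as a String, for example via `ScriptResult.getJavaScriptResult()` in HtmlUnit 2.x. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the cookie out of the decoded document.cookie assignment. */
final class YdClearance {
    private static final Pattern ASSIGN = Pattern.compile("document\\.cookie='([^']*)'");

    /** Returns e.g. "_ydclearance=...; expires=...; domain=.kuaidaili.com; path=/". */
    static String cookieString(String decodedJs) {
        Matcher m = ASSIGN.matcher(decodedJs);
        if (!m.find()) {
            throw new IllegalArgumentException("no document.cookie assignment found");
        }
        return m.group(1);
    }

    /** Returns only the "name=value" part, suitable for a Cookie request header. */
    static String nameValue(String decodedJs) {
        return cookieString(decodedJs).split(";", 2)[0].trim();
    }
}
```

The pair could then be sent with jsoup, e.g. `Jsoup.connect(url).header("Cookie", ydCookie + "; " + clearance).get()`, where `ydCookie` is the `yd_cookie=...` pair from the first response. Sending both mirrors the browser, though whether the server also checks `yd_cookie` is untested here.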

As a test, I fetched two pages: http://www.kuaidaili.com/ops/proxylist/1 and http://www.kuaidaili.com/ops/proxylist/2.

The log is as follows:

status:HTTP/1.1 521 
>>>>>>headers:
	Date: Mon, 25 Sep 2017 07:10:25 GMT
	Content-Type: text/html
	Connection: keep-alive
	Set-Cookie: yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980; Expires=1506330625; Path=/; HttpOnly
	Cache-Control: no-cache, no-store
	Server: WAF/2.4-12.1
>>>>>>cookies:
cookie=yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980;Expires=Mon Sep 25 17:10:25 CST 2017;Path=/
15:10:27.023 [main] INFO  c.g.htmlunit.WebClient - statusCode=[521] contentType=[text/html]
15:10:27.033 [main] INFO  c.g.htmlunit.WebClient - <html><body><script language="javascript"> window.onload=setTimeout("kt(180)", 200); function kt(OD) {var qo, mo="", no="", oo = [0xe7,0xd1,0x34,0xe1,0x60,0xfd,0x51,0x2f,0xc4,0x2b,0xc2,0x90,0xb5,0x43,0xf0,0x7e,0xfb,0xd9,0x22,0x42,0x32,0x5f,0x5d,0x23,0xe6,0xb4,0x5a,0x38,0xf5,0x2c,0xd6,0x94,0x2a,0xf7,0xd5,0xf5,0x99,0xe9,0x52,0x92,0x68,0x46,0x45,0x4d,0x03,0x53,0x8b,0x61,0xc6,0x2f,0x0d,0x45,0x13,0xe0,0xc7,0x08,0x80,0x80,0xb8,0x9e,0xf2,0x53,0x53,0xa3,0x81,0x21,0xdc,0xb2,0x88,0xc8,0x01,0xc0,0x68,0xd0,0x29,0x61,0xd1,0x71,0x01,0xde,0x1f,0xdc,0x1d,0xbc,0x04,0x6c,0xa4,0xec,0x35,0x3d,0x67,0xcf,0x28,0xf5,0xab,0xe3,0x08,0x78,0x4e,0xed,0x2e,0x8e,0x34,0x7c,0xd4,0x05,0x55,0x9d,0x32,0x8a,0xc2,0x3b,0x2b,0xf2,0x89,0x67,0x6d,0xb3,0x31,0x67,0x78,0x56,0x84,0xa4,0x41,0xee,0x7e,0x14,0xbb,0x63,0xbb,0x1c,0xd4,0x74,0xa1,0x9f,0xc5,0x65,0x25,0x65,0xd5,0x9d,0xc5,0xc5,0xa4,0xbc,0xfc,0x25,0x3d,0x75,0x50,0xa8,0x70,0x5d,0xf9,0x5f,0x37,0x27,0xee,0xd4,0x62,0x20,0xe7,0xa5,0x23,0xd8,0xf8,0x90,0xdd,0x4b,0xa9,0x67,0xe4,0xca,0xdd,0x9b,0x19,0xbe,0x3c,0xd3,0x9f,0x4d,0xfa,0x98,0x88,0x50,0x14,0x5a,0x18,0x7e,0xe3,0x04,0x0a,0xb9,0x89,0x99,0x41,0xaf,0x98,0x16,0xab,0x91,0x3f,0x8d,0x21,0xd8,0xbe,0x4c,0x1a,0x78,0x47,0xe4,0xc2,0x78,0xde,0x76,0xd2,0x78,0x06,0xd3,0x91,0xf7,0xf0,0x6e,0x1c,0xb1,0xd1,0xb7,0x3e,0xcb,0x99,0xf7,0x95,0x73,0x6e,0x04,0x6a,0x02,0x7f,0xb4,0x22,0x87,0x3b];qo = "qo=241; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>5)|((oo[qo]<<3)&0xff))-211)&0xff;} while(--qo>=2);"; eval(qo);qo = 240; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 240) break; oo[qo] = ((((((oo[qo] + 44) & 0xff) + 55) & 0xff) << 2) & 0xff) | (((((oo[qo] + 44) & 0xff) + 55) & 0xff) >> 6); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 6) po += String.fromCharCode(oo[qo] ^ OD);eval("qo=eval;qo(po);");} </script> </body></html>
function:function kt(OD) {var qo, mo="", no="", oo = [0xe7,0xd1,0x34,0xe1,0x60,0xfd,0x51,0x2f,0xc4,0x2b,0xc2,0x90,0xb5,0x43,0xf0,0x7e,0xfb,0xd9,0x22,0x42,0x32,0x5f,0x5d,0x23,0xe6,0xb4,0x5a,0x38,0xf5,0x2c,0xd6,0x94,0x2a,0xf7,0xd5,0xf5,0x99,0xe9,0x52,0x92,0x68,0x46,0x45,0x4d,0x03,0x53,0x8b,0x61,0xc6,0x2f,0x0d,0x45,0x13,0xe0,0xc7,0x08,0x80,0x80,0xb8,0x9e,0xf2,0x53,0x53,0xa3,0x81,0x21,0xdc,0xb2,0x88,0xc8,0x01,0xc0,0x68,0xd0,0x29,0x61,0xd1,0x71,0x01,0xde,0x1f,0xdc,0x1d,0xbc,0x04,0x6c,0xa4,0xec,0x35,0x3d,0x67,0xcf,0x28,0xf5,0xab,0xe3,0x08,0x78,0x4e,0xed,0x2e,0x8e,0x34,0x7c,0xd4,0x05,0x55,0x9d,0x32,0x8a,0xc2,0x3b,0x2b,0xf2,0x89,0x67,0x6d,0xb3,0x31,0x67,0x78,0x56,0x84,0xa4,0x41,0xee,0x7e,0x14,0xbb,0x63,0xbb,0x1c,0xd4,0x74,0xa1,0x9f,0xc5,0x65,0x25,0x65,0xd5,0x9d,0xc5,0xc5,0xa4,0xbc,0xfc,0x25,0x3d,0x75,0x50,0xa8,0x70,0x5d,0xf9,0x5f,0x37,0x27,0xee,0xd4,0x62,0x20,0xe7,0xa5,0x23,0xd8,0xf8,0x90,0xdd,0x4b,0xa9,0x67,0xe4,0xca,0xdd,0x9b,0x19,0xbe,0x3c,0xd3,0x9f,0x4d,0xfa,0x98,0x88,0x50,0x14,0x5a,0x18,0x7e,0xe3,0x04,0x0a,0xb9,0x89,0x99,0x41,0xaf,0x98,0x16,0xab,0x91,0x3f,0x8d,0x21,0xd8,0xbe,0x4c,0x1a,0x78,0x47,0xe4,0xc2,0x78,0xde,0x76,0xd2,0x78,0x06,0xd3,0x91,0xf7,0xf0,0x6e,0x1c,0xb1,0xd1,0xb7,0x3e,0xcb,0x99,0xf7,0x95,0x73,0x6e,0x04,0x6a,0x02,0x7f,0xb4,0x22,0x87,0x3b];qo = "qo=241; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>5)|((oo[qo]<<3)&0xff))-211)&0xff;} while(--qo>=2);"; eval(qo);qo = 240; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 240) break; oo[qo] = ((((((oo[qo] + 44) & 0xff) + 55) & 0xff) << 2) & 0xff) | (((((oo[qo] + 44) & 0xff) + 55) & 0xff) >> 6); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 6) po += String.fromCharCode(oo[qo] ^ OD);return po;}
180:kt(180)
ScriptResult[result=document.cookie='_ydclearance=741fe8b32f4e2cc1692d597e-ff12-4627-a1a1-4200846cb27f-1506330626; expires=Mon, 25-Sep-17 09:10:26 GMT; domain=.kuaidaili.com; path=/'; window.document.location=document.URL page=HtmlPage(http://www.kuaidaili.com/ops/proxylist/1)@1667534569]
_ydclearance=741fe8b32f4e2cc1692d597e-ff12-4627-a1a1-4200846cb27f-1506330626; expires=Mon, 25-Sep-17 09:10:26 GMT; domain=.kuaidaili.com; path=/
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:110.72.151.94port=8123
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:59.62.42.71port=808
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:121.232.147.106port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:182.129.241.103port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:117.158.1.210port=9797
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:182.129.241.48port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:121.232.146.181port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:117.90.5.104port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:122.96.59.99port=80
http://www.kuaidaili.com/ops/proxylist/1.shtml,parsing ip:117.90.4.39port=9000
status:HTTP/1.1 521 
>>>>>>headers:
	Date: Mon, 25 Sep 2017 07:10:45 GMT
	Content-Type: text/html
	Connection: keep-alive
	Set-Cookie: yd_cookie=bec9e152-9fef-4e1d7f68f1a0fe7c3550f7195af4f1450737; Expires=1506330645; Path=/; HttpOnly
	Cache-Control: no-cache, no-store
	Server: WAF/2.4-12.1
>>>>>>cookies:
cookie=yd_cookie=bec9e152-9fef-4e1d7f68f1a0fe7c3550f7195af4f1450737;Expires=Mon Sep 25 17:10:45 CST 2017;Path=/
15:10:46.176 [main] INFO  c.g.htmlunit.WebClient - statusCode=[521] contentType=[text/html]
15:10:46.176 [main] INFO  c.g.htmlunit.WebClient - <html><body><script language="javascript"> window.onload=setTimeout("dv(43)", 200); function dv(VC) {var qo, mo="", no="", oo = [0x43,0xe5,0xb0,0x27,0x71,0x6f,0xe9,0x58,0xd8,0x21,0x55,0x56,0xd0,0x4f,0xcd,0x91,0x1c,0x9e,0x09,0xe7,0x80,0x6f,0x8d,0xf3,0x60,0x73,0xe9,0x66,0xd4,0x47,0x1e,0x76,0xec,0x69,0xe3,0xbc,0x27,0x02,0x70,0xe0,0xf2,0x65,0xd1,0xac,0x19,0xf3,0x6c,0xe4,0x57,0x3c,0xc3,0xa8,0x13,0xe1,0xb4,0x37,0x0e,0xf2,0x5f,0x32,0x43,0x0e,0x88,0xfe,0xd7,0x99,0x68,0xe0,0xdb,0xa6,0xd2,0xab,0x80,0x57,0x52,0xd6,0xa3,0x7c,0x5d,0xd5,0x29,0x28,0xa0,0x75,0x48,0x3d,0x18,0x13,0xdd,0x4a,0x21,0xf1,0xbe,0x89,0x70,0x72,0xe0,0x4b,0x16,0xee,0xa7,0x74,0x41,0x40,0x13,0xb3,0x7e,0x53,0x24,0xfa,0x12,0xec,0xbd,0x8e,0x5b,0x37,0x02,0xec,0xdd,0x4c,0xf8,0x59,0xad,0x34,0x8c,0x93,0xfd,0x58,0x33,0x6d,0xc6,0x45,0xc5,0xc2,0xb7,0xac,0x81,0x50,0x4b,0x61,0x98,0x03,0x57,0x52,0x25,0x8d,0x60,0x51,0x26,0x09,0xa2,0x8b,0x5e,0x33,0x1c,0xaa,0x77,0x42,0x37,0x65,0x35,0x73,0x7f,0x66,0x5b,0xae,0x1b,0x99,0x18,0x8a,0x18,0x9a,0x1b,0xf5,0xf6,0x66,0xf0,0x3b,0xad,0x34,0x50,0xbc,0x2f,0xb1,0x2e,0xd9,0x60,0x5d,0xd7,0x56,0x5f,0xd9,0xc4,0xb5,0x0a,0xf9,0x70,0xbc,0x3d,0x1c,0xae,0xad,0xa0,0x87,0x7c,0x47,0x95,0x1c,0x9c,0x05,0x45,0xc7,0x16,0x17,0x83,0x33,0xb1,0x2c,0x76,0xf4,0x2e,0x9c,0x19,0x65,0x66,0x3d,0xb9,0x3c,0xb2,0x25,0xbd,0x0a,0x90,0x0f,0x8f,0xce,0xa9,0x16,0x98,0x0f,0xc5,0x14,0x8e,0xfc,0x79,0xe1,0x2e,0x2f,0x39,0x51,0x42,0x94,0x3b];qo = "qo=251; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>2)|((oo[qo]<<6)&0xff))-180)&0xff;} while(--qo>=2);"; eval(qo);qo = 250; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 250) break; oo[qo] = ((((((oo[qo] + 162) & 0xff) + 32) & 0xff) << 1) & 0xff) | (((((oo[qo] + 162) & 0xff) + 32) & 0xff) >> 7); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 5) po += String.fromCharCode(oo[qo] ^ VC);eval("qo=eval;qo(po);");} </script> </body></html>
function:function dv(VC) {var qo, mo="", no="", oo = [0x43,0xe5,0xb0,0x27,0x71,0x6f,0xe9,0x58,0xd8,0x21,0x55,0x56,0xd0,0x4f,0xcd,0x91,0x1c,0x9e,0x09,0xe7,0x80,0x6f,0x8d,0xf3,0x60,0x73,0xe9,0x66,0xd4,0x47,0x1e,0x76,0xec,0x69,0xe3,0xbc,0x27,0x02,0x70,0xe0,0xf2,0x65,0xd1,0xac,0x19,0xf3,0x6c,0xe4,0x57,0x3c,0xc3,0xa8,0x13,0xe1,0xb4,0x37,0x0e,0xf2,0x5f,0x32,0x43,0x0e,0x88,0xfe,0xd7,0x99,0x68,0xe0,0xdb,0xa6,0xd2,0xab,0x80,0x57,0x52,0xd6,0xa3,0x7c,0x5d,0xd5,0x29,0x28,0xa0,0x75,0x48,0x3d,0x18,0x13,0xdd,0x4a,0x21,0xf1,0xbe,0x89,0x70,0x72,0xe0,0x4b,0x16,0xee,0xa7,0x74,0x41,0x40,0x13,0xb3,0x7e,0x53,0x24,0xfa,0x12,0xec,0xbd,0x8e,0x5b,0x37,0x02,0xec,0xdd,0x4c,0xf8,0x59,0xad,0x34,0x8c,0x93,0xfd,0x58,0x33,0x6d,0xc6,0x45,0xc5,0xc2,0xb7,0xac,0x81,0x50,0x4b,0x61,0x98,0x03,0x57,0x52,0x25,0x8d,0x60,0x51,0x26,0x09,0xa2,0x8b,0x5e,0x33,0x1c,0xaa,0x77,0x42,0x37,0x65,0x35,0x73,0x7f,0x66,0x5b,0xae,0x1b,0x99,0x18,0x8a,0x18,0x9a,0x1b,0xf5,0xf6,0x66,0xf0,0x3b,0xad,0x34,0x50,0xbc,0x2f,0xb1,0x2e,0xd9,0x60,0x5d,0xd7,0x56,0x5f,0xd9,0xc4,0xb5,0x0a,0xf9,0x70,0xbc,0x3d,0x1c,0xae,0xad,0xa0,0x87,0x7c,0x47,0x95,0x1c,0x9c,0x05,0x45,0xc7,0x16,0x17,0x83,0x33,0xb1,0x2c,0x76,0xf4,0x2e,0x9c,0x19,0x65,0x66,0x3d,0xb9,0x3c,0xb2,0x25,0xbd,0x0a,0x90,0x0f,0x8f,0xce,0xa9,0x16,0x98,0x0f,0xc5,0x14,0x8e,0xfc,0x79,0xe1,0x2e,0x2f,0x39,0x51,0x42,0x94,0x3b];qo = "qo=251; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>2)|((oo[qo]<<6)&0xff))-180)&0xff;} while(--qo>=2);"; eval(qo);qo = 250; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 250) break; oo[qo] = ((((((oo[qo] + 162) & 0xff) + 32) & 0xff) << 1) & 0xff) | (((((oo[qo] + 162) & 0xff) + 32) & 0xff) >> 7); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 5) po += String.fromCharCode(oo[qo] ^ VC);return po;}
43:dv(43)
ScriptResult[result=document.cookie='_ydclearance=efad3fbba88e7738d15cc25b-5203-428b-b273-5d6459ee5246-1506330645; expires=Mon, 25-Sep-17 09:10:45 GMT; domain=.kuaidaili.com; path=/'; window.document.location=document.URL page=HtmlPage(http://www.kuaidaili.com/ops/proxylist/2)@1410367298]
_ydclearance=efad3fbba88e7738d15cc25b-5203-428b-b273-5d6459ee5246-1506330645; expires=Mon, 25-Sep-17 09:10:45 GMT; domain=.kuaidaili.com; path=/
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:117.90.4.39port=9000
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:122.193.14.85port=83
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:118.117.137.188port=9000
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:163.125.66.201port=9797
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:60.178.170.141port=8081
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:120.199.64.163port=8081
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:182.92.207.196port=80
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:111.62.243.64port=80
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:121.232.144.221port=9000
http://www.kuaidaili.com/ops/proxylist/2.shtml,parsing ip:116.214.32.51port=8080
proxy list size=0 

IV. Conclusion:

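The data can be fetched, but the quality of Kuaidaili's free proxies is terrible. Across the two pages of 10 proxies each, not a single one was usable.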

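Economically, harvesting proxy IPs this way isn't worth the effort; this was just weekend debugging at home. If your business really needs proxies, buying them from a reliable provider is the better choice.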

V. Pitfalls hit while debugging:

1. HtmlUnit 2.27, the newest version at the time, has a bug:

java.lang.ClassCastException: java.lang.Integer cannot be cast to net.sourceforge.htmlunit.corejs.javascript.Function
	at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.getEventHandler(EventListenersContainer.java:343)
	at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeEventHandler(EventListenersContainer.java:280)
	at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:309)
	at com.gargoylesoftware.htmlunit.javascript.host.event.EventTarget.fireEvent(EventTarget.java:201)
	at com.gargoylesoftware.htmlunit.html.DomElement$2.run(DomElement.java:1375)
	at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:637)
	at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:518)
	at com.gargoylesoftware.htmlunit.html.DomElement.fireEvent(DomElement.java:1380)
	at com.gargoylesoftware.htmlunit.html.HtmlPage.executeEventHandlersIfNeeded(HtmlPage.java:1208)
	at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:289)
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:529)
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:396)
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:313)
	at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:478)
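getPage throws this error, so debugging can't continue. I didn't have time to dig into the source code, so I gave up on it for now. Use 2.26 or an earlier version.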

2. Missing dependency when using 2.24:

Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Fix: add the dependency
<dependency>
    <groupId>xml-apis</groupId>
    <artifactId>xml-apis</artifactId>
    <version>1.4.01</version>
</dependency>

or try 2.26.

3. ScriptResult[result=null

No result: check how the JS function string is spliced together.

...