java爬蟲,破解JS加密的Cookie
一 序:
因為爬取資料需要,代理跟驗證碼識別屬於不可避免的問題。本文總結了下因為爬取免費代理IP資料遇到的js加密cookie問題。
二 問題:
對於常見的靜態頁面來說,jsoup的解析是比較常見的。
但是這個網站如果直接用jsoup去抓取,會報錯。
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=521, URL=http://www.kuaidaili.com/ops/proxylist/1 at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:679) at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:628) at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:260) at org.jsoup.helper.HttpConnection.get(HttpConnection.java:249)
三 問題分析及解決:
其實瀏覽器是能正常瀏覽的,我們開啟瀏覽器看下過程。以Chrome的為例子
可以明顯看到第一次是報錯的,HTTP狀態碼是521,不是200.
第二次是200,但是第二次的cookie多了一些:_ydclearance=a3fd46bd1a232b52d7313218-72dc-4427-aa33-5690668af31d-1506323606
這就是問題的原因了。
用程式來模擬下,列印下引數:
CloseableHttpClient httpClient = HttpClients.createDefault(); Registry<CookieSpecProvider> cookieSpecProviderRegistry = RegistryBuilder.<CookieSpecProvider>create() .register("myCookieSpec", context -> new MyCookieSpec()).build();//註冊自定義CookieSpec String url = baseUrl + i; HttpGet get = new HttpGet(url); HttpClientContext context = HttpClientContext.create(); context.setCookieSpecRegistry(cookieSpecProviderRegistry); get.setConfig(RequestConfig.custom().setCookieSpec("myCookieSpec").build()); WebRequest request = null; WebClient wc = null; try { //1、獲取521狀態時返回setcookie CloseableHttpResponse response = httpClient.execute(get, context); // 響應狀態 System.out.println("status:" + response.getStatusLine()); System.out.println(">>>>>>headers:"); HeaderIterator iterator = response.headerIterator(); while (iterator.hasNext()) { System.out.println("\t" + iterator.next()); } System.out.println(">>>>>>cookies:"); // context.getCookieStore().getCookies().forEach(System.out::println); String cookie =getCookie(context); System.out.println("cookie="+cookie); response.close();
輸出日誌:
TM真是套路深啊,不愧為是代理爬蟲網站,發爬蟲更有一套。status:HTTP/1.1 521 >>>>>>headers: Date: Mon, 25 Sep 2017 07:10:25 GMT Content-Type: text/html Connection: keep-alive Set-Cookie: yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980; Expires=1506330625; Path=/; HttpOnly Cache-Control: no-cache, no-store Server: WAF/2.4-12.1 >>>>>>cookies: cookie=yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980;Expires=Mon Sep 25 17:10:25 CST 2017;Path=/
拿到這個cookie,再呼叫一次目標URL。看看反饋資料:
HttpGet secGet = new HttpGet(url);
secGet.setHeader("Cookie",cookie);
//測試用,對比獲取結果
CloseableHttpResponse secResponse = httpClient.execute(secGet, context);
System.out.println("secstatus:" + secResponse.getStatusLine());
String content = EntityUtils.toString(secResponse.getEntity());
System.out.println(content);
secResponse.close();
我以為就返回正常結果資料了,結果是我太傻了,返回了一段JS,而且是加密過的
<html><body><script language="javascript">
window.onload=setTimeout("dv(43)", 200); function dv(VC) {var qo, mo="", no="", oo = [0x43,0xe5,0xb0,0x27,0x71,0x6f,0xe9,0x58,0xd8,0x21,0x55,0x56,0xd0,0x4f,0xcd,0x91,0x1c,0x9e,0x09,0xe7,0x80,0x6f,0x8d,0xf3,0x60,0x73,0xe9,0x66,0xd4,0x47,0x1e,0x76,0xec,0x69,0xe3,0xbc,0x27,0x02,0x70,0xe0,0xf2,0x65,0xd1,0xac,0x19,0xf3,0x6c,0xe4,0x57,0x3c,0xc3,0xa8,0x13,0xe1,0xb4,0x37,0x0e,0xf2,0x5f,0x32,0x43,0x0e,0x88,0xfe,0xd7,0x99,0x68,0xe0,0xdb,0xa6,0xd2,0xab,0x80,0x57,0x52,0xd6,0xa3,0x7c,0x5d,0xd5,0x29,0x28,0xa0,0x75,0x48,0x3d,0x18,0x13,0xdd,0x4a,0x21,0xf1,0xbe,0x89,0x70,0x72,0xe0,0x4b,0x16,0xee,0xa7,0x74,0x41,0x40,0x13,0xb3,0x7e,0x53,0x24,0xfa,0x12,0xec,0xbd,0x8e,0x5b,0x37,0x02,0xec,0xdd,0x4c,0xf8,0x59,0xad,0x34,0x8c,0x93,0xfd,0x58,0x33,0x6d,0xc6,0x45,0xc5,0xc2,0xb7,0xac,0x81,0x50,0x4b,0x61,0x98,0x03,0x57,0x52,0x25,0x8d,0x60,0x51,0x26,0x09,0xa2,0x8b,0x5e,0x33,0x1c,0xaa,0x77,0x42,0x37,0x65,0x35,0x73,0x7f,0x66,0x5b,0xae,0x1b,0x99,0x18,0x8a,0x18,0x9a,0x1b,0xf5,0xf6,0x66,0xf0,0x3b,0xad,0x34,0x50,0xbc,0x2f,0xb1,0x2e,0xd9,0x60,0x5d,0xd7,0x56,0x5f,0xd9,0xc4,0xb5,0x0a,0xf9,0x70,0xbc,0x3d,0x1c,0xae,0xad,0xa0,0x87,0x7c,0x47,0x95,0x1c,0x9c,0x05,0x45,0xc7,0x16,0x17,0x83,0x33,0xb1,0x2c,0x76,0xf4,0x2e,0x9c,0x19,0x65,0x66,0x3d,0xb9,0x3c,0xb2,0x25,0xbd,0x0a,0x90,0x0f,0x8f,0xce,0xa9,0x16,0x98,0x0f,0xc5,0x14,0x8e,0xfc,0x79,0xe1,0x2e,0x2f,0x39,0x51,0x42,0x94,0x3b];qo = "qo=251; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>2)|((oo[qo]<<6)&0xff))-180)&0xff;} while(--qo>=2);"; eval(qo);qo = 250; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 250) break; oo[qo] = ((((((oo[qo] + 162) & 0xff) + 32) & 0xff) << 1) & 0xff) | (((((oo[qo] + 162) & 0xff) + 32) & 0xff) >> 7); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 5) po += String.fromCharCode(oo[qo] ^ VC);eval("qo=eval;qo(po);");} </script> </body></html>
好吧,我承認js不行。搞不懂其中的細節。
但是我們有web控制檯,我們可以看看這個js是執行出啥結果
document.cookie='_ydclearance=efad3fbba88e7738d15cc25b-5203-428b-b273-5d6459ee5246-1506330645; expires=Mon, 25-Sep-17 09:10:45 GMT; domain=.kuaidaili.com; path=/'; window.document.location=document.URL
這個資料就是js加密後的cookie,也是我們想要的資料。
上面的js大概意思就是生成新的cookie,頁面重新重新整理一下。eval("qo=eval;qo(po);這句話是賦值的。針對瀏覽器起作用。
也就是說我們如果能拿到這個新cookie,就能是jsoup重新發揮作用,獲取資料。
至此,這個網站的反爬蟲策略已經清楚了:
1. 返回狀態521,帶有ydcookie
2. ydcookie在訪問會返回js.
3.執行js會返回加密後的_ydclearance,帶著加密後的cookie會能正常訪問。
那麼關鍵就是如何執行這個js了。httpclient及jsoup是不滿足業務需求了,Python有PyV8來做JavaScript引擎,那麼Java採用啥好呢。
Google一下,網上常見的HtmlUnit/Selenium/PhantomJs三類流行的js渲染引擎。
Seleninum :需要安裝瀏覽器,結合
相應的WebDriver,正確配置webdriver的路徑引數
優點:支援複雜功能。缺點:效能差一些,不適合作為伺服器,更是小部分資料除錯
HtmlUnit/:基於HTTPclient封裝的,無瀏覽器介面,效能一般。解析js差。PhantomJs
:看介紹比較主流,效能尚可。值得深入學習。(目前我也沒用過)
其實這些引擎估計學習起來都有成本,官網的資料不太全或者自己自己看不太懂,需要自己在除錯過程中結合API去搞明白。
說白了就是要不斷的踩坑,才知道哦原來是這麼回事。
***********************************我是一個分割線*********************************************************
廢話說完了,週末時間不多,找個學習代價小的HTMLUNIT去嘗試。
URL link=new URL(url);
request=new WebRequest(link);
request.setCharset("UTF-8");
request.setAdditionalHeader("Referer", "http://www.kuaidaili.com/ops/proxylist/1/");//設定請求報文頭裡的refer欄位
////設定請求報文頭裡的User-Agent欄位
request.setAdditionalHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/38.0");
wc = new WebClient();
wc.getCookieManager().setCookiesEnabled(true);//開啟cookie管理
wc.getOptions().setJavaScriptEnabled(true);//開啟js解析。對於變態網頁,這個是必須的
wc.getOptions().setCssEnabled(false);//關閉css解析。
wc.getOptions().setThrowExceptionOnFailingStatusCode(false);
wc.getOptions().setThrowExceptionOnScriptError(false);
wc.getOptions().setRedirectEnabled(true);
wc.getOptions().setTimeout(10000);
wc.setJavaScriptTimeout(50000);
//設定cookie
List<Cookie> cList = context.getCookieStore().getCookies();
if(cList != null && !cList.isEmpty())
{
for(int j=0;j<cList.size();j++)
{
Cookie ck = cList.get(j);
//cookie
com.gargoylesoftware.htmlunit.util.Cookie ucookies=new com.gargoylesoftware.htmlunit.util.Cookie("www.kuaidaili.com", ck.getName() , ck.getValue());
wc.getCookieManager().addCookie(ucookies);
}
}
//獲取動態跳轉頁
HtmlPage hp = wc.getPage(request);
String body = hp.getBody().asXml();
//擷取js函式
String function = body.substring(body.indexOf("function"),body.lastIndexOf("}")+1);
//把設定cookie的函式替換成返回
function = function.replace("eval(\"qo=eval;qo(po);\")", "return po");
System.out.println("function:"+function);
//提取js函式引數(正則表示式匹配)
String regx ="setTimeout\\(\\\"\\D+\\((\\d+)\\)\"";
Pattern pattern = Pattern.compile(regx);
Matcher matcher = pattern.matcher(body);
String call = "";
String args ="";
while(matcher.find()){
String group =matcher.group();
args = group.substring(group.lastIndexOf("(")+1,group.indexOf(")"));
call = group.substring(group.indexOf("\"")+1,group.lastIndexOf("\""));
System.out.println(args+":"+call);
}
// 執行js獲取加密後的資料
後面執行js就能獲取資料: ScriptResult ckstr =hp.executeJavaScript
我們用獲取的加密的cookie,傳給jsoup。這就能正常訪問資料了。
以測試為例http://www.kuaidaili.com/ops/proxylist/1,2兩個頁面為例
日誌如下:
status:HTTP/1.1 521
>>>>>>headers:
Date: Mon, 25 Sep 2017 07:10:25 GMT
Content-Type: text/html
Connection: keep-alive
Set-Cookie: yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980; Expires=1506330625; Path=/; HttpOnly
Cache-Control: no-cache, no-store
Server: WAF/2.4-12.1
>>>>>>cookies:
cookie=yd_cookie=fa424be4-70a9-4478226851c4b3f3e8e031e4ed7860052980;Expires=Mon Sep 25 17:10:25 CST 2017;Path=/
15:10:27.023 [main] INFO c.g.htmlunit.WebClient - statusCode=[521] contentType=[text/html]
15:10:27.033 [main] INFO c.g.htmlunit.WebClient - <html><body><script language="javascript"> window.onload=setTimeout("kt(180)", 200); function kt(OD) {var qo, mo="", no="", oo = [0xe7,0xd1,0x34,0xe1,0x60,0xfd,0x51,0x2f,0xc4,0x2b,0xc2,0x90,0xb5,0x43,0xf0,0x7e,0xfb,0xd9,0x22,0x42,0x32,0x5f,0x5d,0x23,0xe6,0xb4,0x5a,0x38,0xf5,0x2c,0xd6,0x94,0x2a,0xf7,0xd5,0xf5,0x99,0xe9,0x52,0x92,0x68,0x46,0x45,0x4d,0x03,0x53,0x8b,0x61,0xc6,0x2f,0x0d,0x45,0x13,0xe0,0xc7,0x08,0x80,0x80,0xb8,0x9e,0xf2,0x53,0x53,0xa3,0x81,0x21,0xdc,0xb2,0x88,0xc8,0x01,0xc0,0x68,0xd0,0x29,0x61,0xd1,0x71,0x01,0xde,0x1f,0xdc,0x1d,0xbc,0x04,0x6c,0xa4,0xec,0x35,0x3d,0x67,0xcf,0x28,0xf5,0xab,0xe3,0x08,0x78,0x4e,0xed,0x2e,0x8e,0x34,0x7c,0xd4,0x05,0x55,0x9d,0x32,0x8a,0xc2,0x3b,0x2b,0xf2,0x89,0x67,0x6d,0xb3,0x31,0x67,0x78,0x56,0x84,0xa4,0x41,0xee,0x7e,0x14,0xbb,0x63,0xbb,0x1c,0xd4,0x74,0xa1,0x9f,0xc5,0x65,0x25,0x65,0xd5,0x9d,0xc5,0xc5,0xa4,0xbc,0xfc,0x25,0x3d,0x75,0x50,0xa8,0x70,0x5d,0xf9,0x5f,0x37,0x27,0xee,0xd4,0x62,0x20,0xe7,0xa5,0x23,0xd8,0xf8,0x90,0xdd,0x4b,0xa9,0x67,0xe4,0xca,0xdd,0x9b,0x19,0xbe,0x3c,0xd3,0x9f,0x4d,0xfa,0x98,0x88,0x50,0x14,0x5a,0x18,0x7e,0xe3,0x04,0x0a,0xb9,0x89,0x99,0x41,0xaf,0x98,0x16,0xab,0x91,0x3f,0x8d,0x21,0xd8,0xbe,0x4c,0x1a,0x78,0x47,0xe4,0xc2,0x78,0xde,0x76,0xd2,0x78,0x06,0xd3,0x91,0xf7,0xf0,0x6e,0x1c,0xb1,0xd1,0xb7,0x3e,0xcb,0x99,0xf7,0x95,0x73,0x6e,0x04,0x6a,0x02,0x7f,0xb4,0x22,0x87,0x3b];qo = "qo=241; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>5)|((oo[qo]<<3)&0xff))-211)&0xff;} while(--qo>=2);"; eval(qo);qo = 240; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 240) break; oo[qo] = ((((((oo[qo] + 44) & 0xff) + 55) & 0xff) << 2) & 0xff) | (((((oo[qo] + 44) & 0xff) + 55) & 0xff) >> 6); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 6) po += String.fromCharCode(oo[qo] ^ OD);eval("qo=eval;qo(po);");} </script> </body></html>
function:function kt(OD) {var qo, mo="", no="", oo = [0xe7,0xd1,0x34,0xe1,0x60,0xfd,0x51,0x2f,0xc4,0x2b,0xc2,0x90,0xb5,0x43,0xf0,0x7e,0xfb,0xd9,0x22,0x42,0x32,0x5f,0x5d,0x23,0xe6,0xb4,0x5a,0x38,0xf5,0x2c,0xd6,0x94,0x2a,0xf7,0xd5,0xf5,0x99,0xe9,0x52,0x92,0x68,0x46,0x45,0x4d,0x03,0x53,0x8b,0x61,0xc6,0x2f,0x0d,0x45,0x13,0xe0,0xc7,0x08,0x80,0x80,0xb8,0x9e,0xf2,0x53,0x53,0xa3,0x81,0x21,0xdc,0xb2,0x88,0xc8,0x01,0xc0,0x68,0xd0,0x29,0x61,0xd1,0x71,0x01,0xde,0x1f,0xdc,0x1d,0xbc,0x04,0x6c,0xa4,0xec,0x35,0x3d,0x67,0xcf,0x28,0xf5,0xab,0xe3,0x08,0x78,0x4e,0xed,0x2e,0x8e,0x34,0x7c,0xd4,0x05,0x55,0x9d,0x32,0x8a,0xc2,0x3b,0x2b,0xf2,0x89,0x67,0x6d,0xb3,0x31,0x67,0x78,0x56,0x84,0xa4,0x41,0xee,0x7e,0x14,0xbb,0x63,0xbb,0x1c,0xd4,0x74,0xa1,0x9f,0xc5,0x65,0x25,0x65,0xd5,0x9d,0xc5,0xc5,0xa4,0xbc,0xfc,0x25,0x3d,0x75,0x50,0xa8,0x70,0x5d,0xf9,0x5f,0x37,0x27,0xee,0xd4,0x62,0x20,0xe7,0xa5,0x23,0xd8,0xf8,0x90,0xdd,0x4b,0xa9,0x67,0xe4,0xca,0xdd,0x9b,0x19,0xbe,0x3c,0xd3,0x9f,0x4d,0xfa,0x98,0x88,0x50,0x14,0x5a,0x18,0x7e,0xe3,0x04,0x0a,0xb9,0x89,0x99,0x41,0xaf,0x98,0x16,0xab,0x91,0x3f,0x8d,0x21,0xd8,0xbe,0x4c,0x1a,0x78,0x47,0xe4,0xc2,0x78,0xde,0x76,0xd2,0x78,0x06,0xd3,0x91,0xf7,0xf0,0x6e,0x1c,0xb1,0xd1,0xb7,0x3e,0xcb,0x99,0xf7,0x95,0x73,0x6e,0x04,0x6a,0x02,0x7f,0xb4,0x22,0x87,0x3b];qo = "qo=241; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>5)|((oo[qo]<<3)&0xff))-211)&0xff;} while(--qo>=2);"; eval(qo);qo = 240; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 240) break; oo[qo] = ((((((oo[qo] + 44) & 0xff) + 55) & 0xff) << 2) & 0xff) | (((((oo[qo] + 44) & 0xff) + 55) & 0xff) >> 6); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 6) po += String.fromCharCode(oo[qo] ^ OD);return po;}
180:kt(180)
ScriptResult[result=document.cookie='_ydclearance=741fe8b32f4e2cc1692d597e-ff12-4627-a1a1-4200846cb27f-1506330626; expires=Mon, 25-Sep-17 09:10:26 GMT; domain=.kuaidaili.com; path=/'; window.document.location=document.URL page=HtmlPage(http://www.kuaidaili.com/ops/proxylist/1)@1667534569]
_ydclearance=741fe8b32f4e2cc1692d597e-ff12-4627-a1a1-4200846cb27f-1506330626; expires=Mon, 25-Sep-17 09:10:26 GMT; domain=.kuaidaili.com; path=/
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:110.72.151.94port=8123
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:59.62.42.71port=808
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:121.232.147.106port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:182.129.241.103port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:117.158.1.210port=9797
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:182.129.241.48port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:121.232.146.181port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:117.90.5.104port=9000
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:122.96.59.99port=80
http://www.kuaidaili.com/ops/proxylist/1.shtml,正在解析ip:117.90.4.39port=9000
status:HTTP/1.1 521
>>>>>>headers:
Date: Mon, 25 Sep 2017 07:10:45 GMT
Content-Type: text/html
Connection: keep-alive
Set-Cookie: yd_cookie=bec9e152-9fef-4e1d7f68f1a0fe7c3550f7195af4f1450737; Expires=1506330645; Path=/; HttpOnly
Cache-Control: no-cache, no-store
Server: WAF/2.4-12.1
>>>>>>cookies:
cookie=yd_cookie=bec9e152-9fef-4e1d7f68f1a0fe7c3550f7195af4f1450737;Expires=Mon Sep 25 17:10:45 CST 2017;Path=/
15:10:46.176 [main] INFO c.g.htmlunit.WebClient - statusCode=[521] contentType=[text/html]
15:10:46.176 [main] INFO c.g.htmlunit.WebClient - <html><body><script language="javascript"> window.onload=setTimeout("dv(43)", 200); function dv(VC) {var qo, mo="", no="", oo = [0x43,0xe5,0xb0,0x27,0x71,0x6f,0xe9,0x58,0xd8,0x21,0x55,0x56,0xd0,0x4f,0xcd,0x91,0x1c,0x9e,0x09,0xe7,0x80,0x6f,0x8d,0xf3,0x60,0x73,0xe9,0x66,0xd4,0x47,0x1e,0x76,0xec,0x69,0xe3,0xbc,0x27,0x02,0x70,0xe0,0xf2,0x65,0xd1,0xac,0x19,0xf3,0x6c,0xe4,0x57,0x3c,0xc3,0xa8,0x13,0xe1,0xb4,0x37,0x0e,0xf2,0x5f,0x32,0x43,0x0e,0x88,0xfe,0xd7,0x99,0x68,0xe0,0xdb,0xa6,0xd2,0xab,0x80,0x57,0x52,0xd6,0xa3,0x7c,0x5d,0xd5,0x29,0x28,0xa0,0x75,0x48,0x3d,0x18,0x13,0xdd,0x4a,0x21,0xf1,0xbe,0x89,0x70,0x72,0xe0,0x4b,0x16,0xee,0xa7,0x74,0x41,0x40,0x13,0xb3,0x7e,0x53,0x24,0xfa,0x12,0xec,0xbd,0x8e,0x5b,0x37,0x02,0xec,0xdd,0x4c,0xf8,0x59,0xad,0x34,0x8c,0x93,0xfd,0x58,0x33,0x6d,0xc6,0x45,0xc5,0xc2,0xb7,0xac,0x81,0x50,0x4b,0x61,0x98,0x03,0x57,0x52,0x25,0x8d,0x60,0x51,0x26,0x09,0xa2,0x8b,0x5e,0x33,0x1c,0xaa,0x77,0x42,0x37,0x65,0x35,0x73,0x7f,0x66,0x5b,0xae,0x1b,0x99,0x18,0x8a,0x18,0x9a,0x1b,0xf5,0xf6,0x66,0xf0,0x3b,0xad,0x34,0x50,0xbc,0x2f,0xb1,0x2e,0xd9,0x60,0x5d,0xd7,0x56,0x5f,0xd9,0xc4,0xb5,0x0a,0xf9,0x70,0xbc,0x3d,0x1c,0xae,0xad,0xa0,0x87,0x7c,0x47,0x95,0x1c,0x9c,0x05,0x45,0xc7,0x16,0x17,0x83,0x33,0xb1,0x2c,0x76,0xf4,0x2e,0x9c,0x19,0x65,0x66,0x3d,0xb9,0x3c,0xb2,0x25,0xbd,0x0a,0x90,0x0f,0x8f,0xce,0xa9,0x16,0x98,0x0f,0xc5,0x14,0x8e,0xfc,0x79,0xe1,0x2e,0x2f,0x39,0x51,0x42,0x94,0x3b];qo = "qo=251; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>2)|((oo[qo]<<6)&0xff))-180)&0xff;} while(--qo>=2);"; eval(qo);qo = 250; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 250) break; oo[qo] = ((((((oo[qo] + 162) & 0xff) + 32) & 0xff) << 1) & 0xff) | (((((oo[qo] + 162) & 0xff) + 32) & 0xff) >> 7); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 5) po += String.fromCharCode(oo[qo] ^ VC);eval("qo=eval;qo(po);");} </script> </body></html>
function:function dv(VC) {var qo, mo="", no="", oo = [0x43,0xe5,0xb0,0x27,0x71,0x6f,0xe9,0x58,0xd8,0x21,0x55,0x56,0xd0,0x4f,0xcd,0x91,0x1c,0x9e,0x09,0xe7,0x80,0x6f,0x8d,0xf3,0x60,0x73,0xe9,0x66,0xd4,0x47,0x1e,0x76,0xec,0x69,0xe3,0xbc,0x27,0x02,0x70,0xe0,0xf2,0x65,0xd1,0xac,0x19,0xf3,0x6c,0xe4,0x57,0x3c,0xc3,0xa8,0x13,0xe1,0xb4,0x37,0x0e,0xf2,0x5f,0x32,0x43,0x0e,0x88,0xfe,0xd7,0x99,0x68,0xe0,0xdb,0xa6,0xd2,0xab,0x80,0x57,0x52,0xd6,0xa3,0x7c,0x5d,0xd5,0x29,0x28,0xa0,0x75,0x48,0x3d,0x18,0x13,0xdd,0x4a,0x21,0xf1,0xbe,0x89,0x70,0x72,0xe0,0x4b,0x16,0xee,0xa7,0x74,0x41,0x40,0x13,0xb3,0x7e,0x53,0x24,0xfa,0x12,0xec,0xbd,0x8e,0x5b,0x37,0x02,0xec,0xdd,0x4c,0xf8,0x59,0xad,0x34,0x8c,0x93,0xfd,0x58,0x33,0x6d,0xc6,0x45,0xc5,0xc2,0xb7,0xac,0x81,0x50,0x4b,0x61,0x98,0x03,0x57,0x52,0x25,0x8d,0x60,0x51,0x26,0x09,0xa2,0x8b,0x5e,0x33,0x1c,0xaa,0x77,0x42,0x37,0x65,0x35,0x73,0x7f,0x66,0x5b,0xae,0x1b,0x99,0x18,0x8a,0x18,0x9a,0x1b,0xf5,0xf6,0x66,0xf0,0x3b,0xad,0x34,0x50,0xbc,0x2f,0xb1,0x2e,0xd9,0x60,0x5d,0xd7,0x56,0x5f,0xd9,0xc4,0xb5,0x0a,0xf9,0x70,0xbc,0x3d,0x1c,0xae,0xad,0xa0,0x87,0x7c,0x47,0x95,0x1c,0x9c,0x05,0x45,0xc7,0x16,0x17,0x83,0x33,0xb1,0x2c,0x76,0xf4,0x2e,0x9c,0x19,0x65,0x66,0x3d,0xb9,0x3c,0xb2,0x25,0xbd,0x0a,0x90,0x0f,0x8f,0xce,0xa9,0x16,0x98,0x0f,0xc5,0x14,0x8e,0xfc,0x79,0xe1,0x2e,0x2f,0x39,0x51,0x42,0x94,0x3b];qo = "qo=251; do{oo[qo]=(-oo[qo])&0xff; oo[qo]=(((oo[qo]>>2)|((oo[qo]<<6)&0xff))-180)&0xff;} while(--qo>=2);"; eval(qo);qo = 250; do { oo[qo] = (oo[qo] - oo[qo - 1]) & 0xff; } while (-- qo >= 3 );qo = 1; for (;;) { if (qo > 250) break; oo[qo] = ((((((oo[qo] + 162) & 0xff) + 32) & 0xff) << 1) & 0xff) | (((((oo[qo] + 162) & 0xff) + 32) & 0xff) >> 7); qo++;}po = ""; for (qo = 1; qo < oo.length - 1; qo++) if (qo % 5) po += String.fromCharCode(oo[qo] ^ VC);return po;}
43:dv(43)
ScriptResult[result=document.cookie='_ydclearance=efad3fbba88e7738d15cc25b-5203-428b-b273-5d6459ee5246-1506330645; expires=Mon, 25-Sep-17 09:10:45 GMT; domain=.kuaidaili.com; path=/'; window.document.location=document.URL page=HtmlPage(http://www.kuaidaili.com/ops/proxylist/2)@1410367298]
_ydclearance=efad3fbba88e7738d15cc25b-5203-428b-b273-5d6459ee5246-1506330645; expires=Mon, 25-Sep-17 09:10:45 GMT; domain=.kuaidaili.com; path=/
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:117.90.4.39port=9000
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:122.193.14.85port=83
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:118.117.137.188port=9000
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:163.125.66.201port=9797
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:60.178.170.141port=8081
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:120.199.64.163port=8081
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:182.92.207.196port=80
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:111.62.243.64port=80
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:121.232.144.221port=9000
http://www.kuaidaili.com/ops/proxylist/2.shtml,正在解析ip:116.214.32.51port=8080
proxy list size=0
三 結論:
能獲取資料,但是快代理的免費資料質量太差了。兩頁每頁10個沒有一個是有效的。
從經濟的角度:獲取代理ip的成本不太值得,這是週末在家除錯的。要是業務需要的話,還是找靠譜的去買比較好。
四 除錯過程中踩得坑:
1 htmlunit 新版本2.27有bug
java.lang.ClassCastException: java.lang.Integer cannot be cast to net.sourceforge.htmlunit.corejs.javascript.Function
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.getEventHandler(EventListenersContainer.java:343)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeEventHandler(EventListenersContainer.java:280)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:309)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventTarget.fireEvent(EventTarget.java:201)
at com.gargoylesoftware.htmlunit.html.DomElement$2.run(DomElement.java:1375)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:637)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:518)
at com.gargoylesoftware.htmlunit.html.DomElement.fireEvent(DomElement.java:1380)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeEventHandlersIfNeeded(HtmlPage.java:1208)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:289)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:529)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:396)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:313)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:478)
getPage報錯,後續的除錯無法進行,短時間來不及去看原始碼所以暫時放棄。用2.26或更以前版本2. 缺包:如果使用2.24
Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
解決辦法:引入<dependency>
<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.4.01</version>
</dependency>
或者嘗試2.26
3.ScriptResult[result=null
沒結果,注js函式拼寫
。。。