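Source Code Analysis of Piwik, the Open-Source Web Analytics System: Tracking Parameters (Part 1)
Piwik, since renamed Matomo, is a well-known open-source web analytics system, similar to Baidu Tongji and Google Analytics. The biggest difference is that its source code is open to read, which suits me well: I have long been curious about how analytics systems work internally, so when I came across this one I tried it right away. Chinese-language material about it is scarce and mostly covers installation; I could not find any article analysing the source. What follows is a first pass that later posts will deepen. I currently work as a front-end developer, so I will analyse the tracking script first and the back-end code afterwards.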
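I. Overview
Piwik (see its official website) is written in PHP, and since I used to be a PHP engineer, reading the code poses no obstacle. The latest version at the time of writing is 3.6. The source is hosted on GitHub; opening the repository shows the content in the screenshot below (only the key part is captured).
Open the js folder. The piwik.js file inside it (boxed in red in the screenshot below) is the script analysed here. It is fairly large, at 7838 lines of code.
First download the whole codebase, configure a virtual directory locally, and then start the installation. During installation you can choose a language, and Simplified Chinese is supported (note the part boxed in red in the screenshot below). The installer runs a series of steps (see the left side of the screenshot below), including checking whether the current environment meets the requirements and creating the database. Just follow the prompts step by step; it is simple.
Once installation finishes, you are redirected to the admin interface (see the screenshot below), which has charts and analysis much like other common analytics systems. I have not looked at the features closely yet, but the interface is quite user-friendly.
The JavaScript snippet embedded in the page is also similar to that of other analytics systems, shown below. It loads the script asynchronously as well; the only difference is that the request address is not disguised as an image URL (note the setTrackerUrl line, which points at piwik.php).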
<script type="text/javascript">
  var _paq = _paq || [];
  /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="//loc.piwik.cn/"; // custom
    _paq.push(['setTrackerUrl', u+'piwik.php']);
    _paq.push(['setSiteId', '1']);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.type='text/javascript'; g.async=true; g.defer=true; g.src='piwik.js'; s.parentNode.insertBefore(g,s);
  })();
</script>
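The `_paq` array in the snippet behaves like a command queue: calls pushed before piwik.js has finished loading are stored as plain arrays and can be replayed once a tracker exists. The following is a minimal sketch of that general pattern, not Piwik's actual implementation; `createTracker`, `replayQueue` and the recorded `calls` list are hypothetical names introduced only for illustration.

```javascript
// Hypothetical stand-in for the real tracker: it only records what was called.
function createTracker() {
  var calls = [];
  return {
    calls: calls,
    setTrackerUrl: function (url) { calls.push('setTrackerUrl:' + url); },
    setSiteId: function (id) { calls.push('setSiteId:' + id); },
    trackPageView: function () { calls.push('trackPageView'); }
  };
}

// Replay every queued command, then make push() execute commands immediately.
function replayQueue(queue, tracker) {
  function apply(command) {
    var method = command[0];
    var args = command.slice(1);
    if (typeof tracker[method] === 'function') {
      tracker[method].apply(tracker, args);
    }
  }
  for (var i = 0; i < queue.length; i++) {
    apply(queue[i]);
  }
  queue.length = 0;
  queue.push = function (command) { apply(command); };
  return queue;
}

// Usage: commands are queued before the tracker script "arrives".
var demoQueue = [];
demoQueue.push(['setTrackerUrl', '//example.invalid/piwik.php']);
demoQueue.push(['setSiteId', '1']);
var demoTracker = createTracker();
replayQueue(demoQueue, demoTracker);
demoQueue.push(['trackPageView']); // executed right away, no longer stored
```

The benefit of this design is that the page can issue tracking calls without waiting for the asynchronous script to load.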
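With this script embedded, refreshing the page produces the request shown in the screenshot below. The request carries a large number of parameters, each of which is explained later in this post.
II. Splitting the script
A script of more than 7000 lines obviously cannot be read line by line. It first has to be split into modules, which can then be analysed one at a time. The script is this large because it contains a great deal of code for compatibility with many browser versions, including even antiques such as IE4, Firefox 1.0 and Netscape. I split the source into 6 parts: json, private, query, content-overlay, tracker and piwik, as boxed in red in the screenshot below. The piwik-all file contains the complete code for comparison. I have uploaded the split code as well.
json.js is the open-source JSON3 library, included for browsers that lack a native JSON object; its code can be studied on its own. private.js contains private variables and functions used globally, such as aliases for system objects and type checks. query.js contains many methods for working with HTML elements, such as setting element attributes or finding elements with a given CSS class. It resembles a miniature jQuery, but with quite a few features of its own. content-overlay.js consists of two parts: one covers content tracking and URL building, the other handles embedded (nested) pages; I have not read it in detail. tracker.js contains only a single Tracker() function, but it is the largest part at over 4700 lines, and the main tracking logic lives here. piwik.js is short and contains initialisation and plugin hooks; I have not yet looked into how the hooks work.
Even split into 6 parts, each part is still sizeable and the parts depend on one another, so it is hard to understand everything in a short time. I picked the points I consider most important and analysed those first.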
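1) Three ways to send data
I previously knew of only two ways to send data. One is Ajax. The other is to create an Image object and set its src attribute, passing the data to the server as URL parameters; this approach is widely applicable and also neatly sidesteps cross-origin restrictions. A performance-metrics collection plugin I wrote earlier sends its data this way too. While reading the source I found a third way: the Navigator object's sendBeacon() method.
MDN says: "This method can be used to asynchronously transfer a small amount of data over HTTP to a web server." Although the method has compatibility issues, I was still impressed. It suits analytics very well. MDN continues: "Analytics code sends data to the web server before the page is closed, but sending it too early may miss the chance to collect data. Ensuring that data is sent while the page is closing has always been difficult, however, because browsers usually ignore asynchronous requests made in unload handlers. With the sendBeacon() method, the browser sends the data to the server asynchronously when it has the opportunity, without delaying the unload of the page or affecting the loading of the next one. This solves all the problems of submitting analytics data: it is reliable, asynchronous, does not affect the loading of the next page, and the code is simpler." Below is the relevant snippet from tracker.js (note the navigatorAlias.sendBeacon call).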
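The Image-based approach described above can be sketched as follows. This is an illustrative sketch only, not code from Piwik: `buildBeaconUrl`, `sendViaImage` and the `stats.example.invalid` endpoint are hypothetical names chosen for the example.

```javascript
// Serialize a flat object into a query string appended to the endpoint.
function buildBeaconUrl(endpoint, params) {
  var pairs = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(String(params[key])));
    }
  }
  var separator = endpoint.indexOf('?') === -1 ? '?' : '&';
  return pairs.length ? endpoint + separator + pairs.join('&') : endpoint;
}

// Fire the request by assigning src on an Image; returns false outside a browser.
function sendViaImage(url) {
  if (typeof Image === 'undefined') {
    return false;
  }
  var img = new Image(1, 1);
  img.src = url; // the GET request starts as soon as src is assigned
  return true;
}

// Usage: the data travels as URL parameters of an image request.
var beaconUrl = buildBeaconUrl('https://stats.example.invalid/collect', { page: '/home', loadMs: 123 });
sendViaImage(beaconUrl);
```

Because the browser treats this as an ordinary image load, no cross-origin configuration is needed on the sending page.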
function sendPostRequestViaSendBeacon(request) {
  var supportsSendBeacon =
    "object" === typeof navigatorAlias &&
    "function" === typeof navigatorAlias.sendBeacon &&
    "function" === typeof Blob;

  if (!supportsSendBeacon) {
    return false;
  }

  var headers = {
    type: "application/x-www-form-urlencoded; charset=UTF-8"
  };
  var success = false;

  try {
    var blob = new Blob([request], headers);
    success = navigatorAlias.sendBeacon(configTrackerUrl, blob);
    // returns true if the user agent is able to successfully queue the data for transfer,
    // Otherwise it returns false and we need to try the regular way
  } catch (e) {
    return false;
  }

  return success;
}
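The comment in the function above notes that when sendBeacon returns false, the data has to be sent "the regular way". A minimal sketch of such a fallback chain is shown below. It is an assumption about how a caller might use the return value, not Piwik's own code; `trySendBeacon`, `sendWithFallback`, the injected `nav` object and the `fallback` callback are all hypothetical.

```javascript
// Try sendBeacon; report false when it is missing, throws, or refuses the data.
function trySendBeacon(nav, url, body) {
  if (!nav || typeof nav.sendBeacon !== 'function') {
    return false;
  }
  try {
    return nav.sendBeacon(url, body) === true;
  } catch (e) {
    return false;
  }
}

// Use the beacon when possible, otherwise hand the payload to the fallback.
function sendWithFallback(nav, url, body, fallback) {
  if (trySendBeacon(nav, url, body)) {
    return 'beacon';
  }
  fallback(url, body);
  return 'fallback';
}

// Usage with a fake navigator that simulates a refused beacon.
var fakeNavigator = { sendBeacon: function () { return false; } };
var fallbackLog = [];
var usedTransport = sendWithFallback(
  fakeNavigator,
  'https://stats.example.invalid/piwik.php',
  'idsite=1&rec=1',
  function (url, body) { fallbackLog.push(url + '?' + body); }
);
```

In a browser, the real `navigator` would be passed as `nav`, and the fallback could be an Ajax or Image request.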
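2) Parameter meanings
The method below (in tracker.js) is dedicated to collecting the tracking data from the page and concatenating it into the query parameters of a URL; those parameters are what is eventually sent to the server.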
/**
 * Returns the URL to call piwik.php,
 * with the standard parameters (plugins, resolution, url, referrer, etc.).
 * Sends the pageview and browser settings with every request in case of race conditions.
 */
function getRequest(request, customData, pluginMethod, currentEcommerceOrderTs) {
  var i,
    now = new Date(),
    nowTs = Math.round(now.getTime() / 1000),
    referralTs,
    referralUrl,
    referralUrlMaxLength = 1024,
    currentReferrerHostName,
    originalReferrerHostName,
    customVariablesCopy = customVariables,
    cookieSessionName = getCookieName("ses"),
    cookieReferrerName = getCookieName("ref"),
    cookieCustomVariablesName = getCookieName("cvar"),
    cookieSessionValue = getCookie(cookieSessionName),
    attributionCookie = loadReferrerAttributionCookie(),
    currentUrl = configCustomUrl || locationHrefAlias,
    campaignNameDetected,
    campaignKeywordDetected;

  if (configCookiesDisabled) {
    deleteCookies();
  }

  if (configDoNotTrack) {
    return "";
  }

  var cookieVisitorIdValues = getValuesFromVisitorIdCookie();
  if (!isDefined(currentEcommerceOrderTs)) {
    currentEcommerceOrderTs = "";
  }

  // send charset if document charset is not utf-8. sometimes encoding
  // of urls will be the same as this and not utf-8, which will cause problems
  // do not send charset if it is utf8 since it's assumed by default in Piwik
  var charSet = documentAlias.characterSet || documentAlias.charset;
  if (!charSet || charSet.toLowerCase() === "utf-8") {
    charSet = null;
  }

  campaignNameDetected = attributionCookie[0];
  campaignKeywordDetected = attributionCookie[1];
  referralTs = attributionCookie[2];
  referralUrl = attributionCookie[3];

  if (!cookieSessionValue) {
    // cookie 'ses' was not found: we consider this the start of a 'session'

    // here we make sure that if 'ses' cookie is deleted few times within the visit
    // and so this code path is triggered many times for one visit,
    // we only increase visitCount once per Visit window (default 30min)
    var visitDuration = configSessionCookieTimeout / 1000;
    if (
      !cookieVisitorIdValues.lastVisitTs ||
      nowTs - cookieVisitorIdValues.lastVisitTs > visitDuration
    ) {
      cookieVisitorIdValues.visitCount++;
      cookieVisitorIdValues.lastVisitTs = cookieVisitorIdValues.currentVisitTs;
    }

    // Detect the campaign information from the current URL
    // Only if campaign wasn't previously set
    // Or if it was set but we must attribute to the most recent one
    // Note: we are working on the currentUrl before purify() since we can parse the campaign parameters in the hash tag
    if (
      !configConversionAttributionFirstReferrer ||
      !campaignNameDetected.length
    ) {
      for (i in configCampaignNameParameters) {
        if (
          Object.prototype.hasOwnProperty.call(configCampaignNameParameters, i)
        ) {
          campaignNameDetected = getUrlParameter(
            currentUrl,
            configCampaignNameParameters[i]
          );
          if (campaignNameDetected.length) {
            break;
          }
        }
      }
      for (i in configCampaignKeywordParameters) {
        if (
          Object.prototype.hasOwnProperty.call(
            configCampaignKeywordParameters,
            i
          )
        ) {
          campaignKeywordDetected = getUrlParameter(
            currentUrl,
            configCampaignKeywordParameters[i]
          );
          if (campaignKeywordDetected.length) {
            break;
          }
        }
      }
    }

    // Store the referrer URL and time in the cookie;
    // referral URL depends on the first or last referrer attribution
    currentReferrerHostName = getHostName(configReferrerUrl);
    originalReferrerHostName = referralUrl.length
      ? getHostName(referralUrl)
      : "";
    if (
      currentReferrerHostName.length && // there is a referrer
      !isSiteHostName(currentReferrerHostName) && // domain is not the current domain
      (!configConversionAttributionFirstReferrer || // attribute to last known referrer
        !originalReferrerHostName.length || // previously empty
        isSiteHostName(originalReferrerHostName)) // previously set but in current domain
    ) {
      referralUrl = configReferrerUrl;
    }

    // Set the referral cookie if we have either a Referrer URL, or detected a Campaign (or both)
    if (referralUrl.length || campaignNameDetected.length) {
      referralTs = nowTs;
      attributionCookie = [
        campaignNameDetected,
        campaignKeywordDetected,
        referralTs,
        purify(referralUrl.slice(0, referralUrlMaxLength))
      ];
      setCookie(
        cookieReferrerName,
        JSON_PIWIK.stringify(attributionCookie),
        configReferralCookieTimeout,
        configCookiePath,
        configCookieDomain
      );
    }
  }

  // build out the rest of the request
  request +=
    "&idsite=" + configTrackerSiteId +
    "&rec=1" +
    "&r=" + String(Math.random()).slice(2, 8) + // keep the string to a minimum
    "&h=" + now.getHours() +
    "&m=" + now.getMinutes() +
    "&s=" + now.getSeconds() +
    "&url=" + encodeWrapper(purify(currentUrl)) +
    (configReferrerUrl.length
      ? "&urlref=" + encodeWrapper(purify(configReferrerUrl))
      : "") +
    (configUserId && configUserId.length
      ? "&uid=" + encodeWrapper(configUserId)
      : "") +
    "&_id=" + cookieVisitorIdValues.uuid +
    "&_idts=" + cookieVisitorIdValues.createTs +
    "&_idvc=" + cookieVisitorIdValues.visitCount +
    "&_idn=" + cookieVisitorIdValues.newVisitor + // currently unused
    (campaignNameDetected.length
      ? "&_rcn=" + encodeWrapper(campaignNameDetected)
      : "") +
    (campaignKeywordDetected.length
      ? "&_rck=" + encodeWrapper(campaignKeywordDetected)
      : "") +
    "&_refts=" + referralTs +
    "&_viewts=" + cookieVisitorIdValues.lastVisitTs +
    (String(cookieVisitorIdValues.lastEcommerceOrderTs).length
      ? "&_ects=" + cookieVisitorIdValues.lastEcommerceOrderTs
      : "") +
    (String(referralUrl).length
      ? "&_ref=" + encodeWrapper(purify(referralUrl.slice(0, referralUrlMaxLength)))
      : "") +
    (charSet ? "&cs=" + encodeWrapper(charSet) : "") +
    "&send_image=0";

  // browser features
  for (i in browserFeatures) {
    if (Object.prototype.hasOwnProperty.call(browserFeatures, i)) {
      request += "&" + i + "=" + browserFeatures[i];
    }
  }

  var customDimensionIdsAlreadyHandled = [];
  if (customData) {
    for (i in customData) {
      if (
        Object.prototype.hasOwnProperty.call(customData, i) &&
        /^dimension\d+$/.test(i)
      ) {
        var index = i.replace("dimension", "");
        customDimensionIdsAlreadyHandled.push(parseInt(index, 10));
        customDimensionIdsAlreadyHandled.push(String(index));
        request += "&" + i + "=" + customData[i];
        delete customData[i];
      }
    }
  }

  if (customData && isObjectEmpty(customData)) {
    customData = null; // we deleted all keys from custom data
  }

  // custom dimensions
  for (i in customDimensions) {
    if (Object.prototype.hasOwnProperty.call(customDimensions, i)) {
      var isNotSetYet =
        -1 === indexOfArray(customDimensionIdsAlreadyHandled, i);
      if (isNotSetYet) {
        request += "&dimension" + i + "=" + customDimensions[i];
      }
    }
  }

  // custom data
  if (customData) {
    request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(customData));
  } else if (configCustomData) {
    request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(configCustomData));
  }

  // Custom Variables, scope "page"
  function appendCustomVariablesToRequest(customVariables, parameterName) {
    var customVariablesStringified = JSON_PIWIK.stringify(customVariables);
    if (customVariablesStringified.length > 2) {
      return (
        "&" + parameterName + "=" + encodeWrapper(customVariablesStringified)
      );
    }
    return "";
  }

  var sortedCustomVarPage = sortObjectByKeys(customVariablesPage);
  var sortedCustomVarEvent = sortObjectByKeys(customVariablesEvent);

  request += appendCustomVariablesToRequest(sortedCustomVarPage, "cvar");
  request += appendCustomVariablesToRequest(sortedCustomVarEvent, "e_cvar");

  // Custom Variables, scope "visit"
  if (customVariables) {
    request += appendCustomVariablesToRequest(customVariables, "_cvar");

    // Don't save deleted custom variables in the cookie
    for (i in customVariablesCopy) {
      if (Object.prototype.hasOwnProperty.call(customVariablesCopy, i)) {
        if (customVariables[i][0] === "" || customVariables[i][1] === "") {
          delete customVariables[i];
        }
      }
    }

    if (configStoreCustomVariablesInCookie) {
      setCookie(
        cookieCustomVariablesName,
        JSON_PIWIK.stringify(customVariables),
        configSessionCookieTimeout,
        configCookiePath,
        configCookieDomain
      );
    }
  }

  // performance tracking
  if (configPerformanceTrackingEnabled) {
    if (configPerformanceGenerationTime) {
      request += "&gt_ms=" + configPerformanceGenerationTime;
    } else if (
      performanceAlias &&
      performanceAlias.timing &&
      performanceAlias.timing.requestStart &&
      performanceAlias.timing.responseEnd
    ) {
      request +=
        "&gt_ms=" +
        (performanceAlias.timing.responseEnd -
          performanceAlias.timing.requestStart);
    }
  }

  if (configIdPageView) {
    request += "&pv_id=" + configIdPageView;
  }

  // update cookies
  cookieVisitorIdValues.lastEcommerceOrderTs =
    isDefined(currentEcommerceOrderTs) && String(currentEcommerceOrderTs).length
      ? currentEcommerceOrderTs
      : cookieVisitorIdValues.lastEcommerceOrderTs;
  setVisitorIdCookie(cookieVisitorIdValues);
  setSessionCookie();

  // tracker plugin hook
  request += executePluginMethod(pluginMethod, {
    tracker: trackerInstance,
    request: request
  });

  if (configAppendToTrackingUrl.length) {
    request += "&" + configAppendToTrackingUrl;
  }

  if (isFunction(configCustomRequestContentProcessing)) {
    request = configCustomRequestContentProcessing(request);
  }

  return request;
}
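One detail of getRequest that is easy to miss is the visit-count guard: when the session cookie is absent, visitCount is incremented only if more than one session window has passed since the last recorded visit. The rule is isolated below as a standalone sketch. `registerSessionStart` and the plain `values` object are hypothetical; in Piwik the logic lives inline and operates on the values read from the visitor ID cookie.

```javascript
// values: { visitCount, lastVisitTs, currentVisitTs }, timestamps in seconds.
// sessionTimeoutMs corresponds to configSessionCookieTimeout (milliseconds).
function registerSessionStart(values, nowTs, sessionTimeoutMs) {
  var visitDuration = sessionTimeoutMs / 1000;
  if (!values.lastVisitTs || nowTs - values.lastVisitTs > visitDuration) {
    values.visitCount++;
    values.lastVisitTs = values.currentVisitTs;
  }
  return values;
}

// Usage with a 30-minute window (1 800 000 ms).
var visitor = { visitCount: 1, lastVisitTs: 1000, currentVisitTs: 1000 };
registerSessionStart(visitor, 1000 + 600, 1800000);  // 10 minutes later: not counted again
registerSessionStart(visitor, 1000 + 7200, 1800000); // 2 hours later: counted as a new visit
```

This prevents a deleted session cookie from inflating the visit count within a single visit.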
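The tracking code sends data on every request, and every request carries a long string of abbreviated parameters. A brief explanation follows (corrections are welcome). Some details are not covered yet, such as how the UUID is generated. The parameters fall into two groups; the first is listed here:
1. idsite: site ID
2. rec: 1 (hard-coded)
3. r: random string
4. h: current hour
5. m: current minute
6. s: current second
7. url: the current page URL (configCustomUrl or location.href) after being passed through purify()
8. _id: visitor UUID
9. _idts: timestamp at which the visitor ID was created
10. _idvc: number of visits
11. _idn: new-visitor flag (currently unused)
12. _refts: timestamp of the referral
13. _viewts: timestamp of the previous visit
14. cs: character encoding of the current page (sent only when it is not UTF-8)
15. send_image: whether the data is transferred through an image request (hard-coded to 0 in this function)
16. gt_ms: time spent loading the content (response end time minus request start time)
17. pv_id: unique identifier of the page view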
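To make the list above easier to apply when reading a captured request, the following sketch parses a query string and attaches a description to each known parameter. The `PARAM_LABELS` table and the `describeRequest` function are hypothetical helpers written for this post, not part of Piwik, and the labels simply restate the list above.

```javascript
// Descriptions taken from the first parameter group above.
var PARAM_LABELS = {
  idsite: 'site ID',
  rec: 'hard-coded 1',
  r: 'random string',
  h: 'current hour',
  m: 'current minute',
  s: 'current second',
  url: 'current page URL',
  _id: 'visitor UUID',
  _idts: 'visitor ID creation timestamp',
  _idvc: 'number of visits',
  _idn: 'new-visitor flag (unused)',
  _refts: 'referral timestamp',
  _viewts: 'previous visit timestamp',
  cs: 'page character encoding',
  send_image: 'image request flag',
  gt_ms: 'content load time in ms',
  pv_id: 'page view identifier'
};

// Split a query string into { name, label, value } entries.
function describeRequest(query) {
  var result = [];
  var parts = query.replace(/^\?/, '').split('&');
  for (var i = 0; i < parts.length; i++) {
    if (!parts[i]) {
      continue; // skip empty segments such as a leading "&"
    }
    var eq = parts[i].indexOf('=');
    var name = eq === -1 ? parts[i] : parts[i].slice(0, eq);
    var value = eq === -1 ? '' : decodeURIComponent(parts[i].slice(eq + 1));
    var known = Object.prototype.hasOwnProperty.call(PARAM_LABELS, name);
    result.push({ name: name, label: known ? PARAM_LABELS[name] : 'unknown', value: value });
  }
  return result;
}

// Usage: a shortened, made-up request string.
var described = describeRequest('&idsite=1&rec=1&h=10&url=http%3A%2F%2Fa.test%2F&gt_ms=35');
```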
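The second group records browser features, obtained from properties of the Navigator object (mimeTypes, javaEnabled, etc.) and of the Screen object (width and height).
1. pdf: whether the PDF document type is supported
2. qt: whether QuickTime Player is supported
3. realp: whether RealPlayer is supported
4. wma: whether Windows Media Player is supported (MIME type application/x-mplayer2)
5. dir: whether Macromedia Director is supported
6. fla: whether Adobe Flash Player is supported
7. java: whether Java is enabled
8. gears: whether Google Gears is installed
9. ag: whether Microsoft Silverlight is installed
10. cookie: whether cookies are enabled
11. res: screen width and height (not calculated correctly for high-DPI displays)
The code that obtains these 11 parameters is in the method below (also in tracker.js). Note the pluginMap variable: it stores a set of MIME types used to detect whether a given plugin or feature is installed or enabled.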
/*
 * Browser features (plugins, resolution, cookies)
 */
function detectBrowserFeatures() {
  var i,
    mimeType,
    pluginMap = {
      // document types
      pdf: "application/pdf",

      // media players
      qt: "video/quicktime",
      realp: "audio/x-pn-realaudio-plugin",
      wma: "application/x-mplayer2",

      // interactive multimedia
      dir: "application/x-director",
      fla: "application/x-shockwave-flash",

      // RIA
      java: "application/x-java-vm",
      gears: "application/x-googlegears",
      ag: "application/x-silverlight"
    };

  // detect browser features except IE < 11 (IE 11 user agent is no longer MSIE)
  if (!new RegExp("MSIE").test(navigatorAlias.userAgent)) {
    // general plugin detection
    if (navigatorAlias.mimeTypes && navigatorAlias.mimeTypes.length) {
      for (i in pluginMap) {
        if (Object.prototype.hasOwnProperty.call(pluginMap, i)) {
          mimeType = navigatorAlias.mimeTypes[pluginMap[i]];
          browserFeatures[i] = mimeType && mimeType.enabledPlugin ? "1" : "0";
        }
      }
    }

    // Safari and Opera
    // IE6/IE7 navigator.javaEnabled can't be aliased, so test directly
    // on Edge navigator.javaEnabled() always returns `true`, so ignore it
    if (
      !new RegExp("Edge[ /](\\d+[\\.\\d]+)").test(navigatorAlias.userAgent) &&
      typeof navigator.javaEnabled !== "unknown" &&
      isDefined(navigatorAlias.javaEnabled) &&
      navigatorAlias.javaEnabled()
    ) {
      browserFeatures.java = "1";
    }

    // Firefox
    if (isFunction(windowAlias.GearsFactory)) {
      browserFeatures.gears = "1";
    }

    // other browser features
    browserFeatures.cookie = hasCookies();
  }

  var width = parseInt(screenAlias.width, 10);
  var height = parseInt(screenAlias.height, 10);
  browserFeatures.res = parseInt(width, 10) + "x" + parseInt(height, 10);
}
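The MIME-type lookup at the core of detectBrowserFeatures can be tried outside a browser by passing in a navigator-like object. The sketch below is a simplified illustration under that assumption: `detectPlugins` and `toFeatureQuery` are hypothetical names, the fake navigator is made up, and the user-agent checks of the original are omitted.

```javascript
// Return "1"/"0" flags for each key of pluginMap, based on nav.mimeTypes.
function detectPlugins(nav, pluginMap) {
  var features = {};
  if (!nav || !nav.mimeTypes || !nav.mimeTypes.length) {
    return features; // nothing can be detected without a mimeTypes list
  }
  for (var key in pluginMap) {
    if (Object.prototype.hasOwnProperty.call(pluginMap, key)) {
      var mimeType = nav.mimeTypes[pluginMap[key]];
      features[key] = mimeType && mimeType.enabledPlugin ? '1' : '0';
    }
  }
  return features;
}

// Append the flags to a request string the same way getRequest does.
function toFeatureQuery(features) {
  var query = '';
  for (var key in features) {
    if (Object.prototype.hasOwnProperty.call(features, key)) {
      query += '&' + key + '=' + features[key];
    }
  }
  return query;
}

// Usage with a fake navigator that only exposes an enabled PDF plugin.
var fakeNav = { mimeTypes: { length: 1, 'application/pdf': { enabledPlugin: {} } } };
var detected = detectPlugins(fakeNav, {
  pdf: 'application/pdf',
  fla: 'application/x-shockwave-flash'
});
var featureQuery = toFeatureQuery(detected);
```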
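Besides the 20-odd parameters above, the complete list of parameters is documented on the official website, although it is available only in English.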