
Scala in Practice: Counting User Online Duration and Login Frequency with Spark

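After I started working with Spark, I began learning Scala. With a little Python and Java background, picking it up went reasonably well. In this post I share an example from my work that uses Scala to analyze user behavior logs, focusing on how the algorithms for online duration and login count are implemented.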

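Step 1: the programming environment. You need the Spark distribution. You don't have to install Spark locally at first; you can debug the program by adding spark-assembly-1.6.2-hadoop2.6.0.jar to your project. You also need a Scala runtime; the version I used is scala-sdk-2.10.6.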
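If you would rather let a build tool fetch Spark than add the assembly jar by hand, a minimal sbt definition along the following lines should work. This is only a sketch: the project name is hypothetical, and the versions simply mirror the ones mentioned above.

```scala
// build.sbt (hypothetical project; the article itself adds the assembly jar manually)
name := "user-online-analysis"

scalaVersion := "2.10.6"

// spark-core built for Scala 2.10, matching the Spark 1.6.2 jar mentioned above
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2"
```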

Step 2: the raw material, i.e. the system logs. Here is part of the log I processed:

2016-04-18 16:00:00 {"areacode":"浙江省麗水市","countAll":0,"countCorrect":0,"datatime":"4134362","logid":"201604181600001184409476","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966390499\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13989589062\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13989589062\"}","requestip":"36.16.128.234","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"寧夏銀川市","countAll":0,"countCorrect":0,"datatime":"4715990","logid":"201604181600001858043208","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400120\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1210\",\"imei\":\"A0000044ABFD25\",\"subjectNum\":\"15379681917\",\"imsi\":\"460036951451601\",\"queryNum\":\"\"}","requestip":"115.168.93.87","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果","userAgent":"ZTE-Me/Mobile"}
2016-04-18 16:00:00 {"areacode":"黑龍江省哈爾濱市","countAll":0,"countCorrect":0,"datatime":"5369561","logid":"201604181600001068429609","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"\",\"timestamp\":\"1460966400139\",\"sign\":\"4\",\"imsi\":\"460030301212545\",\"imei\":\"35460207765269\",\"subjectNum\":\"55588237\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"2100\"}","requestip":"42.184.41.180","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"浙江省麗水市","countAll":0,"countCorrect":0,"datatime":"4003096","logid":"201604181600001648238807","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966391025\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13989589062\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13989589062\"}","requestip":"36.16.128.234","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"廣西南寧市","countAll":0,"countCorrect":0,"datatime":"4047993","logid":"201604181600001570024205","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966382871\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004853168C\",\"subjectNum\":\"07765232589\",\"imsi\":\"460031210400007\",\"queryNum\":\"13317810717\"}","requestip":"219.159.72.3","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"海南省五指山市","countAll":0,"countCorrect":0,"datatime":"5164117","logid":"201604181600001227842048","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399159\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1017\",\"imei\":\"A000005543AFB7\",\"subjectNum\":\"089836329061\",\"imsi\":\"460036380954376\",\"queryNum\":\"13389875751\"}","requestip":"140.240.171.71","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"山西省","countAll":0,"countCorrect":0,"datatime":"14075772","logid":"201604181600001284030648","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400332\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004FE0218A\",\"subjectNum\":\"03514043633\",\"imsi\":\"460037471517070\",\"queryNum\":\"\"}","requestip":"1.68.5.227","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"四川省","countAll":0,"countCorrect":0,"datatime":"6270982","logid":"201604181600001173504863","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966398896\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13666231300\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13666231300\"}","requestip":"182.144.66.97","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4198522","logid":"201604181600001390637240","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399464\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"05533876327\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"05533876327\"}","requestip":"36.23.9.49","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"江蘇省連雲港市","countAll":0,"countCorrect":0,"datatime":"4408097","logid":"201604181600001249944032","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966395908\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"5154518","logid":"201604181600001714496463","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399474\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"05533876327\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"05533876327\"}","requestip":"36.23.9.49","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4761269","logid":"201604181600001187577136","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400191\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"057427895481\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"057427895481\"}","requestip":"36.23.153.219","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"河北省廊坊市","countAll":0,"countCorrect":0,"datatime":"75408665","logid":"201604181600001020722122","requestinfo":"{\"subjectNum\":\"13582968216\",\"imsi\":\"460031298611058\",\"queryNum\":\"18033684000\",\"channelno\":\"100\",\"imei\":\"99000586096233\"}","requestip":"110.251.61.62","requesttime":"2016-04-18 16:00:00","requesttype":"28","responsecode":"010005","responsedata":"查詢結果為空"}
2016-04-18 16:00:00 {"areacode":"貴州省黔西南州興義市","countAll":0,"countCorrect":0,"datatime":"4586950","logid":"201604181600001499837763","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966398600\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"865707029710377\",\"subjectNum\":\"509\",\"imsi\":\"460025864693571\",\"queryNum\":\"\"}","requestip":"111.85.45.172","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"雲南省昆明市","countAll":0,"countCorrect":0,"datatime":"4441961","logid":"201604181600001794147521","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"13618922555\",\"timestamp\":\"1460966401214\",\"sign\":\"4\",\"imsi\":\"12345678900987654321\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13618922555\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"100\"}","requestip":"113.63.132.128","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"江蘇省連雲港市","countAll":0,"countCorrect":0,"datatime":"4186305","logid":"201604181600001175993827","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966397309\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"江蘇省","countAll":0,"countCorrect":0,"datatime":"4103662","logid":"201604181600001051944754","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399642\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"a0000059788b71\",\"subjectNum\":\"768\",\"imsi\":\"460036660539168\",\"queryNum\":\"\"}","requestip":"180.98.180.95","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"山西省","countAll":0,"countCorrect":0,"datatime":"4247256","logid":"201604181600001013319164","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400334\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004FE0218A\",\"subjectNum\":\"03514043633\",\"imsi\":\"460037471517070\",\"queryNum\":\"\"}","requestip":"1.68.5.227","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"北京市","countAll":0,"countCorrect":0,"datatime":"5401532","logid":"201604181600001469644300","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399603\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"4001004259\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"\"}","requestip":"106.121.0.143","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"北京市","countAll":0,"countCorrect":0,"datatime":"4876709","logid":"201604181600001476349766","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399603\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"4001004259\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"\"}","requestip":"106.121.0.143","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"江蘇省連雲港市","countAll":0,"countCorrect":0,"datatime":"4498474","logid":"201604181600001508125886","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966397987\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"江蘇省連雲港市","countAll":0,"countCorrect":0,"datatime":"4318254","logid":"201604181600001766447939","requestinfo":"{\"subjectNum\":\"66699\",\"imsi\":\"460036611592505\",\"queryNum\":\"\",\"channelno\":\"100\",\"imei\":\"A00000457ECC28\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"28","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"江西省南昌市","countAll":0,"countCorrect":0,"datatime":"244260927","logid":"201604181559591112708085","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400525\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"a000004f883c2e\",\"subjectNum\":\"813161\",\"imsi\":\"460031392055476\",\"queryNum\":\"\"}","requestip":"182.97.149.145","requesttime":"2016-04-18 15:59:59","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; HUAWEI P7-L09 Build/HuaweiP7-L09)"}
2016-04-18 16:00:00 {"areacode":"上海市黃浦區","countAll":0,"countCorrect":0,"datatime":"4657170","logid":"201604181600001303952983","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"\",\"timestamp\":\"1460966400444\",\"sign\":\"4\",\"imei\":\"a000005901fef3\",\"subjectNum\":\"4235\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"9000\"}","requestip":"124.74.160.162","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果","userAgent":"Dalvik/2.1.0 (Linux; U; Android 6.0; HUAWEI CRR-CL00 Build/HUAWEICRR-CL00)"}
2016-04-18 16:00:00 {"areacode":"江西省南昌市","countAll":0,"countCorrect":0,"datatime":"252676235","logid":"201604181559591152287931","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400399\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"a000004f883c2e\",\"subjectNum\":\"813161\",\"imsi\":\"460031392055476\",\"queryNum\":\"\"}","requestip":"182.97.149.145","requesttime":"2016-04-18 15:59:59","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; HUAWEI P7-L09 Build/HuaweiP7-L09)"}
2016-04-18 16:00:00 {"areacode":"區域網","countAll":0,"countCorrect":0,"datatime":"5160006","logid":"201604181600001026793341","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399352\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1002\",\"imei\":\"A00000457ECC28\",\"subjectNum\":\"66699\",\"imsi\":\"460036611592505\",\"queryNum\":\"\"}","requestip":"10.55.80.187","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"江蘇省","countAll":0,"countCorrect":0,"datatime":"245262271","logid":"201604181559591753547387","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399846\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004F661365\",\"subjectNum\":\"2336\",\"imsi\":\"460036580978572\",\"queryNum\":\"\"}","requestip":"180.98.187.27","requesttime":"2016-04-18 15:59:59","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; HUAWEI C199 Build/HuaweiC199)"}
2016-04-18 16:00:00 {"countAll":0,"countCorrect":0,"logid":"201604181600001605286233","requestip":"36.23.153.219","requesttime":"2016-04-18 16:00:00","requesttype":"0"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4203930","logid":"201604181600001873855360","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400191\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"057427895481\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"057427895481\"}","requestip":"36.23.153.219","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"無查詢結果"}
2016-04-18 16:00:00 {"areacode":"河南省鄭州市","countAll":0,"countCorrect":0,"datatime":"338870020","logid":"201604181559591841947051","requestinfo":"{\"subjectNum\":\"621418\",\"imsi\":\"460037561702775\",\"queryNum\":\"\",\"channelno\":\"100\",\"imei\":\"a0000055dc82e3\"}","requestip":"106.33.148.44","requesttime":"2016-04-18 15:59:59","requesttype":"28","responsecode":"000000","responsedata":"操作成功","userAgent":"Dalvik/1.6.0 (Linux; U; Android 4.4.2; PE-CL00 Build/HuaweiPE-CL00)"}

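Step 3: how the algorithm works. We start from one rule: if a user sends our app to the background for 10 minutes, or performs no operation for 10 minutes, we treat the user as having exited; if a log record for that user shows up after those 10 minutes, we count it as a second login. On that basis, the implementation first loads the user behavior logs into an RDD. During loading, each line is filtered: records whose type is not wanted, or whose imei is invalid or Unknown, are dropped, and only records containing both imei and logid are kept (the logid stands in for the user's operation time; its first 14 digits are a yyyyMMddHHmmss timestamp). This yields an RDD of pairs in which the first column is the imei and the second is the operation time. We group by imei with groupByKey() and sort each key's list of logid values. We then apply the rule above to derive each user's login details, along with the total login count and online duration for the day.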
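To make the 10-minute rule concrete before bringing in Spark, here is a small plain-Scala sketch of it. The object name, the helper name, and the sample timestamps are made up for illustration; only the rule itself comes from the description above.

```scala
object SessionSketch {
  // Timeout taken from the rule above: 10 minutes, expressed in seconds.
  val TimeoutSeconds: Long = 10 * 60

  /** Returns (login count, total online seconds) for one user's operation times in epoch seconds. */
  def summarize(times: Seq[Long]): (Int, Long) = {
    val sorted = times.sorted
    if (sorted.isEmpty) (0, 0L)
    else {
      // Gap between each pair of consecutive operations.
      val gaps = sorted.zip(sorted.tail).map { case (a, b) => b - a }
      // Every gap of 10 minutes or more starts a new login.
      val logins = 1 + gaps.count(_ >= TimeoutSeconds)
      // Only gaps shorter than the timeout count as online time.
      val online = gaps.filter(_ < TimeoutSeconds).sum
      (logins, online)
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: three operations close together, then one 20 minutes later.
    println(summarize(Seq(0L, 60L, 300L, 1500L))) // prints (2,300)
  }
}
```

The gaps here are 60, 240 and 1200 seconds: the first two add up to 300 seconds of online time, and the third exceeds the timeout, so it counts as a second login.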

Enough talk, here is the code:

/**
  * Created by zhoubh on 2016/6/28.
  */
import java.text.SimpleDateFormat

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

import scala.util.matching.Regex

/**
  * User online duration and login count statistics
  */
object UserOnlineAnalysis {
  def main(args: Array[String]) {
    if (args.length != 2) {
      System.err.println("Usage: UserOnlineAnalysis <input> <output>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("UserOnlineAnalysis").setMaster("local[4]")
    val sc = new SparkContext(conf)
    // args(0): input file path
    val data = sc.textFile(args(0))
    // Drop records with type 3, and records whose imei is Unknown, "" or "000000000000000"
    val notContainsType3 = data.filter(!_.contains("\\\"type\\\":\\\"3\\\"")).filter(!_.contains("\\\"imei\\\":\\\"\\\"")).filter(!_.contains("000000000000000")).filter(!_.contains("Unknown"))
    // Drop records that lack logid or imei
    val cleanData = notContainsType3.filter(_.contains("logid")).filter(_.contains("imei"))

    val cleanMap = cleanData.map {
      line =>
        val data = formatLine(line).split(",")
        (data(0), data(1))
    }
    // Group the records by IMEI, then sort each group's list of timestamps in ascending order.
    val rdd = cleanMap.groupByKey().map(x => (x._1, x._2.toList.sorted))

    rdd.cache()

    // Export the details
    exportDetailData(rdd, args(1) + "/detail")

    // Export the summary
    exportSumData(rdd, args(1) + "/sum")


    rdd.unpersist()

    sc.stop()

  }

  /**
    * Export each user's online duration and login count.
    * Record layout: (IMEI, login count, online duration in seconds)
    *
    **/
  def exportSumData(map: RDD[(String, List[String])], output: String): Unit = {
    val result = map.map {
      x =>
        // Login count; defaults to 1 login
        var logNum: Int = 1
        // Online duration (seconds)
        var totalTime: Long = 0

        val len = x._2.length

        for (i <- 0 until len) {
          if (i + 1 < len) {
            val nowTime = getTimeByString(x._2(i))
            val nextTime = getTimeByString(x._2(i + 1))
            val intervalTime = nextTime - nowTime
            if (intervalTime < 60 * 10) {
              totalTime += intervalTime
            } else {
              logNum += 1
            }
          }

        }
        // Emit imei, login count, total duration (seconds)
        (x._1, logNum, totalTime)
    }

    result.saveAsTextFile(output)
  }

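  /**
    * Export the per-operation details for each user.
    * Record layout: (IMEI, operation time, seconds until the next operation;
    * 0 when that gap is 10 minutes or more, or when it is the user's last record)
    *
    **/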
  def exportDetailData(map: RDD[(String, List[String])], output: String): Unit = {
    val result = map.flatMap {
      x =>
        val len = x._2.length
        val array = new Array[(String, String, Long)](len)
        for (i <- 0 until len) {
          if (i + 1 < len) {
            val nowTime = getTimeByString(x._2(i))
            val nextTime = getTimeByString(x._2(i + 1))
            val intervalTime = nextTime - nowTime
            if (intervalTime < 60 * 10) {
              array(i) = (x._1, x._2(i), intervalTime)
            } else {
              array(i) = (x._1, x._2(i), 0)
            }
          } else {
            array(i) = (x._1, x._2(i), 0)
          }

        }
        array
    }
    result.saveAsTextFile(output)
  }

  /**
    * Extract imei and logid from a log line
    *
    **/
  def formatLine(line: String): String = {
    val logIdRegex = """"logid":"([0-9]+)",""".r
    val imeiRegex = """\\"imei\\":\\"([A-Za-z0-9]+)\\"""".r
    val logId = getDataByPattern(logIdRegex, line)
    val imei = getDataByPattern(imeiRegex, line)

    // Keep the time down to the second (first 14 digits: yyyyMMddHHmmss)
    imei + "," + logId.substring(0, 14)
  }
  /**
    * Look up a value using a regular expression
    *
    **/
  def getDataByPattern(p: Regex, line: String): String = {
    val result = (p.findFirstMatchIn(line)).map(item => {
      val s = item group 1 // the first capture group of the match
      s
    })
    result.getOrElse("NULL")
  }
  /**
    * Convert a time string to seconds. A timestamp is the number of milliseconds elapsed since 1970-01-01 00:00:00 GMT (1970-01-01 08:00:00 Beijing time),
    * so this returns timestamp / 1000.
    **/
  def getTimeByString(timeString: String): Long = {
    val sf: SimpleDateFormat = new SimpleDateFormat("yyyyMMddHHmmss")
    sf.parse(timeString).getTime / 1000
  }
}
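One thing to watch in formatLine: when the logid regex does not match, getDataByPattern returns "NULL", and the following substring(0, 14) throws on that 4-character string. The sketch below is one possible alternative, not the code I ran in production: it reuses the two regexes from the listing but returns an Option, so a malformed line could simply be skipped. The object name, method names, and the shortened sample line (trimmed from the log excerpt above) are assumptions for illustration.

```scala
import java.text.SimpleDateFormat

object ExtractSketch {
  // Same patterns as in formatLine above.
  private val logIdRegex = """"logid":"([0-9]+)",""".r
  private val imeiRegex = """\\"imei\\":\\"([A-Za-z0-9]+)\\"""".r

  // A shortened log line in the same shape as the samples above (backslashes are literal).
  val sample: String =
    """2016-04-18 16:00:00 {"logid":"201604181600001858043208","requestinfo":"{\"imei\":\"A0000044ABFD25\"}"}"""

  /** Returns Some((imei, yyyyMMddHHmmss)) when both fields are present, otherwise None. */
  def extract(line: String): Option[(String, String)] =
    for {
      imei  <- imeiRegex.findFirstMatchIn(line).map(_.group(1))
      logId <- logIdRegex.findFirstMatchIn(line).map(_.group(1))
      if logId.length >= 14 // guard the substring call below
    } yield (imei, logId.substring(0, 14))

  /** Parses a yyyyMMddHHmmss string into epoch seconds using the JVM's default time zone. */
  def toEpochSeconds(ts: String): Long =
    new SimpleDateFormat("yyyyMMddHHmmss").parse(ts).getTime / 1000

  def main(args: Array[String]): Unit = {
    println(extract(sample))          // prints Some((A0000044ABFD25,20160418160000))
    println(extract("no fields here")) // prints None
    // Two operations five minutes apart; the difference is in seconds.
    println(toEpochSeconds("20160418160500") - toEpochSeconds("20160418160000"))
  }
}
```

In the Spark job, a function like this could be used with flatMap in place of map so that lines returning None drop out of the RDD; whether silently dropping them is acceptable depends on your data.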
My machine is a Mac Pro, so my configuration is a little different:


If you are on Windows 7, the configuration may need some changes:


Let's go straight to the output:


detail:


sum: