
Lesson 70: SparkSQL Built-in Functions Demystified and in Practice (Study Notes)


Contents of this lesson:

1 Analysis of SparkSQL built-in functions

2 SparkSQL built-in functions in practice

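SparkSQL's DataFrame introduces a large number of built-in functions. Most of them support CG (code generation), so they are highly optimized when queries are compiled and executed.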
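Why code generation helps can be illustrated with a small analogy. The sketch below is plain Scala and does not use Spark: the `Expr`, `Col`, `Lit` and `Add` types are hypothetical stand-ins for Spark's expression trees. `interpret` re-walks the tree for every row, whereas `compile` walks it once and returns a function that is reused for every row. Spark's real code generation goes further (it emits Java source that is compiled to bytecode at runtime), but the motivation, avoiding per-row dispatch over the expression tree, is the same.

```scala
// Analogy only, NOT Spark's API: a hypothetical mini expression tree used to contrast
// per-row interpretation with compiling the tree once into a reusable function.
sealed trait Expr
final case class Col(index: Int) extends Expr            // reads one field of the row
final case class Lit(value: Int) extends Expr            // a constant
final case class Add(left: Expr, right: Expr) extends Expr

object CodegenAnalogy {
  // Interpreted: pattern-matches on every node again for every row
  def interpret(e: Expr, row: Array[Int]): Int = e match {
    case Col(i)    => row(i)
    case Lit(v)    => v
    case Add(l, r) => interpret(l, row) + interpret(r, row)
  }

  // "Compiled": pattern-matches once and returns a function reused for every row
  def compile(e: Expr): Array[Int] => Int = e match {
    case Col(i) => (row: Array[Int]) => row(i)
    case Lit(v) => (_: Array[Int]) => v
    case Add(l, r) =>
      val lf = compile(l)
      val rf = compile(r)
      (row: Array[Int]) => lf(row) + rf(row)
  }
}
```

Here `Add(Col(0), Lit(10))` plays roughly the role of `people("age") + 10` from the API docs quoted later in this post: both evaluation strategies return the same values, and only the per-row work differs.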

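Question: Is SparkSQL operating on Hive the same as Hive on Spark?

=> No. When SparkSQL operates on Hive, Hive only serves as the data warehouse that supplies the data, and the compute engine is SparkSQL itself. Hive on Spark is a subproject of Hive whose core idea is to replace Hive's execution engine with Spark. As is well known, Hive's default compute engine at the time was MapReduce; because of its poor performance and other problems, the Hive project wanted to replace it.

SparkSQL operating on data stored in Hive is called Spark on Hive, whereas Hive on Spark still has Hive at its core and only swaps the compute engine from MapReduce to Spark.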

The DataFrame API docs on the Spark website:

class DataFrame extends Queryable with Serializable

Experimental

A distributed collection of data organized into named columns.

DataFrame is equivalent to a relational table in Spark SQL. The following example creates a DataFrame by pointing Spark SQL to a Parquet data set.

val people = sqlContext.read.parquet("...")  // in Scala

DataFrame people = sqlContext.read().parquet("...")  // in Java

Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in: 

DataFrame (this class), Column, and functions.

To select a column from the data frame, use apply method in Scala and col in Java.

val ageCol = people("age")  // in Scala

Column ageCol = people.col("age")  // in Java

Note that the Column type can also be manipulated through its various functions.

// The following creates a new column that increases everybody's age by 10.

people("age") + 10  // in Scala

people.col("age").plus(10);  // in Java

A more concrete example in Scala:

import org.apache.spark.sql.functions._  // provides avg and max

// To create DataFrame using SQLContext

val people = sqlContext.read.parquet("...")

val department = sqlContext.read.parquet("...")

people.filter("age > 30")

  .join(department, people("deptId") === department("id"))

  .groupBy(department("name"), people("gender"))

  .agg(avg(people("salary")), max(people("age")))

and in Java:

import static org.apache.spark.sql.functions.*;  // provides avg and max

// To create DataFrame using SQLContext

DataFrame people = sqlContext.read().parquet("...");

DataFrame department = sqlContext.read().parquet("...");

people.filter(people.col("age").gt(30))

  .join(department, people.col("deptId").equalTo(department.col("id")))

  .groupBy(department.col("name"), people.col("gender"))

  .agg(avg(people.col("salary")), max(people.col("age")));

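In the examples above, join and groupBy are DataFrame methods and agg is a method of the grouped data that groupBy returns, while avg and max are SparkSQL built-in functions from org.apache.spark.sql.functions.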
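To make the semantics of that query concrete, here is a plain-Scala sketch (no Spark) that applies the same steps to in-memory collections: keep people older than 30, inner-join them to departments on `deptId == id`, group by department name and gender, then compute the average salary and maximum age per group. The `Person` and `Department` case classes and `DslPipelineSketch` are hypothetical; only the column names come from the example. Spark plans the equivalent work lazily and runs it distributed.

```scala
// Plain-Scala sketch (no Spark) of the people/department query above.
// Person, Department and DslPipelineSketch are hypothetical names.
final case class Person(age: Int, gender: String, salary: Double, deptId: Int)
final case class Department(id: Int, name: String)

object DslPipelineSketch {
  // Returns (department name, gender) -> (average salary, maximum age)
  def run(people: Seq[Person], departments: Seq[Department]): Map[(String, String), (Double, Int)] = {
    // filter("age > 30") followed by an inner join on deptId === id
    val joined: Seq[(String, Person)] = for {
      p <- people if p.age > 30
      d <- departments if p.deptId == d.id
    } yield (d.name, p)

    // groupBy(department name, gender), then agg(avg(salary), max(age))
    joined
      .groupBy { case (deptName, p) => (deptName, p.gender) }
      .map { case (key, rows) =>
        val ps = rows.map(_._2)
        val avgSalary = ps.map(_.salary).sum / ps.size
        val maxAge = ps.map(_.age).max
        (key, (avgSalary, maxAge))
      }
  }
}
```

Spark would produce the same groups, but as DataFrame rows rather than a Scala `Map`, and without any guaranteed row order.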

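Spark 1.5.x and later introduced many built-in functions; by an incomplete count there are more than one hundred of them.

Below is a hands-on example that performs aggregations: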

package com.dt.spark.sql

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

import org.apache.spark.sql.{Row, SQLContext}

import org.apache.spark.sql.hive.HiveContext

import org.apache.spark.{SparkContext, SparkConf}

import org.apache.spark.sql.functions._

/**

  * A Spark program written in Scala (runnable on a cluster) that aggregates data with Spark SQL built-in functions

  * @author DT大資料夢工廠

  * Sina Weibo: http://weibo.com/ilovepains/

  * Created by hp on 2016/3/28.

  *

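  * Uses the built-in functions in Spark SQL to analyze data. Unlike other Spark SQL APIs, a built-in function used with a DataFrame returns a Column object, and

  * a DataFrame is by nature "A distributed collection of data organized into named columns.". This lays a solid foundation for complex data analysis

  * and is very convenient: for example, in any method that operates on a DataFrame we can call built-in functions at any time to do the processing the business requires. When building complex business logic, this

  * greatly reduces unnecessary time spent (the code is essentially a direct mapping of the real-world model) and lets us focus on the data analysis itself, which is very valuable for engineers' productivity

  * Starting with Spark 1.5.x, a large number of built-in functions are provided; they are typically used through DataFrame methods such as agg: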

  * def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {

  *  groupBy().agg(aggExpr, aggExprs : _*)

  *}

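  * Others include max, mean, min, sum, avg, explode, size, sort_array, day, to_date, abs, acos, asin, atan

  * Overall, the built-in functions fall into the following seven basic categories:

  * 1. Aggregate functions, e.g. countDistinct, sumDistinct;

  * 2. Collection functions, e.g. sort_array, explode;

  * 3. Date and time functions, e.g. hour, quarter, next_day;

  * 4. Math functions, e.g. asin, atan, sqrt, tan, round;

  * 5. Window functions, e.g. rowNumber;

  * 6. String functions, e.g. concat, format_number, regexp_extract;

  * 7. Other functions, e.g. isNaN, sha, randn, callUDF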

  *

  */

object SparkSQLAgg {

  def main (args: Array[String]) {

    /**

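      * Step 1: Create the SparkConf configuration object, which holds the runtime configuration of the Spark program,

      * e.g. setMaster sets the URL of the Master of the Spark cluster the program connects to. If it is set

      * to local, the Spark program runs locally, which is especially suitable for beginners whose machines

      * are very limited (e.g. only 1 GB of memory)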

      */

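    val conf = new SparkConf() // Create the SparkConf object

    conf.setAppName("SparkSQLInnerFunctions") // Set the application name; it is shown in the monitoring UI while the program runs

    //    conf.setMaster("spark://Master:7077") // With this setting, the program runs on the Spark cluster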

    conf.setMaster("local")

    /**

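      * Step 2: Create the SparkContext object

      * SparkContext is the single entry point to all functionality of a Spark program; whether you use Scala, Java, Python, R, etc., you must have a SparkContext

      * Core role of SparkContext: initialize the core components needed to run the Spark application, including DAGScheduler, TaskScheduler and SchedulerBackend

      * It is also responsible for registering the application with the Master, among other things

      * SparkContext is the most important object in the entire Spark application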

      */

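    val sc = new SparkContext(conf) // Create the SparkContext, passing in the SparkConf instance to customize Spark's runtime parameters and configuration

    val sqlContext = new SQLContext(sc)   // Build the SQL context

    // Import the SQLContext implicit conversions so that Scala symbols such as 'id and 'amount can be used as Column objects below (the built-in functions themselves come from org.apache.spark.sql.functions._)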

    import sqlContext.implicits._

    /**

      * Step 3: Simulate e-commerce access data (real data would be far more complex than this) and finally generate an RDD

      */

    val userData = Array(

      "2016-3-27,001,http://spark.apache.org/,1000",

      "2016-3-27,001,http://hadoop.apache.org/,1001",

      "2016-3-27,002,http://fink.apache.org/,1002",

      "2016-3-28,003,http://kafka.apache.org/,1020",

      "2016-3-28,004,http://spark.apache.org/,1010",

      "2016-3-28,002,http://hive.apache.org/,1200",

      "2016-3-28,001,http://parquet.apache.org/,1500",

      "2016-3-28,001,http://spark.apache.org/,1800"

    )

    val userDataRDD = sc.parallelize(userData)  // Create the RDD (distributed collection object)

    /**

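      * Step 4: Preprocess the data as the business requires and generate a DataFrame. To convert an RDD into a DataFrame, first turn the RDD's elements into Row objects,

      * and at the same time supply metadata describing the DataFrame's columns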

      */

    val userDataRDDRow = userDataRDD.map(row => {val splited = row.split(",") ;Row(splited(0),splited(1).toInt,splited(2),splited(3).toInt)})

    val structTypes = StructType(Array(

      StructField("time", StringType, true),

      StructField("id", IntegerType, true),

      StructField("url", StringType, true),

      StructField("amount", IntegerType, true)

    ))

    val userDataDF = sqlContext.createDataFrame(userDataRDDRow,structTypes)

    /**

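      * Step 5: Operate on the DataFrame with the built-in functions provided by Spark SQL. Note in particular: built-in functions produce Column objects, which automatically go through code generation (CG)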

      *

      *

      */

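    // groupBy("time") retains the grouping column by default, so each result row is (time, time, count(DISTINCT id));

    // the map keeps fields 1 and 2, i.e. the day and its number of distinct users

    userDataDF.groupBy("time").agg('time, countDistinct('id))

      .map(row => Row(row(1), row(2))).collect.foreach(println)

    // Total amount per day

    userDataDF.groupBy("time").agg('time, sum('amount)).show()

    sc.stop() // Release the SparkContext's resources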

  }

}
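To check the job's results by hand, the sketch below recomputes both aggregations on the same eight sample records with plain Scala collections instead of Spark. `Visit`, `parse` and `SparkSQLAggSketch` are hypothetical names. Because step 4 parses the id with `toInt`, "001" becomes 1, so distinct users are counted on integers. The amount total is kept as a `Long`, matching the fact that Spark's `sum` over an integer column yields a long.

```scala
// Plain-Scala re-computation (no Spark) of the two aggregations in SparkSQLAgg.
// Visit, parse and SparkSQLAggSketch are hypothetical names used only for this check.
object SparkSQLAggSketch {
  final case class Visit(time: String, id: Int, url: String, amount: Int)

  val userData: Seq[String] = Seq(
    "2016-3-27,001,http://spark.apache.org/,1000",
    "2016-3-27,001,http://hadoop.apache.org/,1001",
    "2016-3-27,002,http://fink.apache.org/,1002",
    "2016-3-28,003,http://kafka.apache.org/,1020",
    "2016-3-28,004,http://spark.apache.org/,1010",
    "2016-3-28,002,http://hive.apache.org/,1200",
    "2016-3-28,001,http://parquet.apache.org/,1500",
    "2016-3-28,001,http://spark.apache.org/,1800"
  )

  // Same parsing as step 4: "001".toInt yields 1
  def parse(line: String): Visit = {
    val fields = line.split(",")
    Visit(fields(0), fields(1).toInt, fields(2), fields(3).toInt)
  }

  val visits: Seq[Visit] = userData.map(parse)

  // Equivalent of groupBy("time").agg(countDistinct('id))
  val distinctUsersPerDay: Map[String, Int] =
    visits.groupBy(_.time).map { case (day, vs) => day -> vs.map(_.id).distinct.size }

  // Equivalent of groupBy("time").agg(sum('amount)); the total is a Long, as in Spark
  val amountPerDay: Map[String, Long] =
    visits.groupBy(_.time).map { case (day, vs) => day -> vs.map(_.amount.toLong).sum }
}
```

On this data the job should report 2 distinct users and a total amount of 3003 for 2016-3-27, and 4 distinct users and a total amount of 6530 for 2016-3-28; Spark does not guarantee the order in which the rows are printed.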

Eclipse中執行如下:

16/04/10 23:54:04 INFO TaskSetManager: Finished task 58.0 in stage 6.0 (TID 461) in 18 ms on localhost (59/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:04 INFO Executor: Finished task 59.0 in stage 6.0 (TID 462). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 60.0 in stage 6.0 (TID 463, localhost, partition 61,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 60.0 in stage 6.0 (TID 463)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 59.0 in stage 6.0 (TID 462) in 15 ms on localhost (60/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 2 ms

16/04/10 23:54:04 INFO Executor: Finished task 60.0 in stage 6.0 (TID 463). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 61.0 in stage 6.0 (TID 464, localhost, partition 62,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 60.0 in stage 6.0 (TID 463) in 17 ms on localhost (61/199)

16/04/10 23:54:04 INFO Executor: Running task 61.0 in stage 6.0 (TID 464)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:04 INFO Executor: Finished task 61.0 in stage 6.0 (TID 464). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 62.0 in stage 6.0 (TID 465, localhost, partition 63,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 62.0 in stage 6.0 (TID 465)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 61.0 in stage 6.0 (TID 464) in 99 ms on localhost (62/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:04 INFO Executor: Finished task 62.0 in stage 6.0 (TID 465). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 63.0 in stage 6.0 (TID 466, localhost, partition 64,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 62.0 in stage 6.0 (TID 465) in 18 ms on localhost (63/199)

16/04/10 23:54:04 INFO Executor: Running task 63.0 in stage 6.0 (TID 466)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:04 INFO Executor: Finished task 63.0 in stage 6.0 (TID 466). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 64.0 in stage 6.0 (TID 467, localhost, partition 65,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 63.0 in stage 6.0 (TID 466) in 16 ms on localhost (64/199)

16/04/10 23:54:04 INFO Executor: Running task 64.0 in stage 6.0 (TID 467)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:04 INFO Executor: Finished task 64.0 in stage 6.0 (TID 467). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 65.0 in stage 6.0 (TID 468, localhost, partition 66,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 65.0 in stage 6.0 (TID 468)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 64.0 in stage 6.0 (TID 467) in 18 ms on localhost (65/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:04 INFO Executor: Finished task 65.0 in stage 6.0 (TID 468). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 66.0 in stage 6.0 (TID 469, localhost, partition 67,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 65.0 in stage 6.0 (TID 468) in 47 ms on localhost (66/199)

16/04/10 23:54:04 INFO Executor: Running task 66.0 in stage 6.0 (TID 469)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:04 INFO Executor: Finished task 66.0 in stage 6.0 (TID 469). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 67.0 in stage 6.0 (TID 470, localhost, partition 68,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 67.0 in stage 6.0 (TID 470)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 66.0 in stage 6.0 (TID 469) in 17 ms on localhost (67/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:04 INFO Executor: Finished task 67.0 in stage 6.0 (TID 470). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 68.0 in stage 6.0 (TID 471, localhost, partition 69,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 68.0 in stage 6.0 (TID 471)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 67.0 in stage 6.0 (TID 470) in 11 ms on localhost (68/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:04 INFO Executor: Finished task 68.0 in stage 6.0 (TID 471). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 69.0 in stage 6.0 (TID 472, localhost, partition 70,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 69.0 in stage 6.0 (TID 472)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 68.0 in stage 6.0 (TID 471) in 21 ms on localhost (69/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:04 INFO Executor: Finished task 69.0 in stage 6.0 (TID 472). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 70.0 in stage 6.0 (TID 473, localhost, partition 71,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO Executor: Running task 70.0 in stage 6.0 (TID 473)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 69.0 in stage 6.0 (TID 472) in 15 ms on localhost (70/199)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms

16/04/10 23:54:04 INFO Executor: Finished task 70.0 in stage 6.0 (TID 473). 1652 bytes result sent to driver

16/04/10 23:54:04 INFO TaskSetManager: Starting task 71.0 in stage 6.0 (TID 474, localhost, partition 72,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:04 INFO TaskSetManager: Finished task 70.0 in stage 6.0 (TID 473) in 50 ms on localhost (71/199)

16/04/10 23:54:04 INFO Executor: Running task 71.0 in stage 6.0 (TID 474)

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 71.0 in stage 6.0 (TID 474). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 72.0 in stage 6.0 (TID 475, localhost, partition 73,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 71.0 in stage 6.0 (TID 474) in 42 ms on localhost (72/199)

16/04/10 23:54:05 INFO Executor: Running task 72.0 in stage 6.0 (TID 475)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 72.0 in stage 6.0 (TID 475). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 73.0 in stage 6.0 (TID 476, localhost, partition 74,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 72.0 in stage 6.0 (TID 475) in 102 ms on localhost (73/199)

16/04/10 23:54:05 INFO Executor: Running task 73.0 in stage 6.0 (TID 476)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 73.0 in stage 6.0 (TID 476). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 74.0 in stage 6.0 (TID 477, localhost, partition 75,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 73.0 in stage 6.0 (TID 476) in 32 ms on localhost (74/199)

16/04/10 23:54:05 INFO Executor: Running task 74.0 in stage 6.0 (TID 477)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 74.0 in stage 6.0 (TID 477). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 75.0 in stage 6.0 (TID 478, localhost, partition 76,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 74.0 in stage 6.0 (TID 477) in 66 ms on localhost (75/199)

16/04/10 23:54:05 INFO Executor: Running task 75.0 in stage 6.0 (TID 478)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 75.0 in stage 6.0 (TID 478). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 76.0 in stage 6.0 (TID 479, localhost, partition 77,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 75.0 in stage 6.0 (TID 478) in 56 ms on localhost (76/199)

16/04/10 23:54:05 INFO Executor: Running task 76.0 in stage 6.0 (TID 479)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 76.0 in stage 6.0 (TID 479). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 77.0 in stage 6.0 (TID 480, localhost, partition 78,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 76.0 in stage 6.0 (TID 479) in 15 ms on localhost (77/199)

16/04/10 23:54:05 INFO Executor: Running task 77.0 in stage 6.0 (TID 480)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 77.0 in stage 6.0 (TID 480). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 78.0 in stage 6.0 (TID 481, localhost, partition 79,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 78.0 in stage 6.0 (TID 481)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 77.0 in stage 6.0 (TID 480) in 15 ms on localhost (78/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 78.0 in stage 6.0 (TID 481). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 79.0 in stage 6.0 (TID 482, localhost, partition 80,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 78.0 in stage 6.0 (TID 481) in 54 ms on localhost (79/199)

16/04/10 23:54:05 INFO Executor: Running task 79.0 in stage 6.0 (TID 482)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 79.0 in stage 6.0 (TID 482). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 80.0 in stage 6.0 (TID 483, localhost, partition 81,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 80.0 in stage 6.0 (TID 483)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 79.0 in stage 6.0 (TID 482) in 19 ms on localhost (80/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 80.0 in stage 6.0 (TID 483). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 81.0 in stage 6.0 (TID 484, localhost, partition 82,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 81.0 in stage 6.0 (TID 484)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 80.0 in stage 6.0 (TID 483) in 19 ms on localhost (81/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 81.0 in stage 6.0 (TID 484). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 82.0 in stage 6.0 (TID 485, localhost, partition 83,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 82.0 in stage 6.0 (TID 485)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 81.0 in stage 6.0 (TID 484) in 14 ms on localhost (82/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 82.0 in stage 6.0 (TID 485). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 83.0 in stage 6.0 (TID 486, localhost, partition 84,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 82.0 in stage 6.0 (TID 485) in 79 ms on localhost (83/199)

16/04/10 23:54:05 INFO Executor: Running task 83.0 in stage 6.0 (TID 486)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 83.0 in stage 6.0 (TID 486). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 84.0 in stage 6.0 (TID 487, localhost, partition 85,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 84.0 in stage 6.0 (TID 487)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 83.0 in stage 6.0 (TID 486) in 31 ms on localhost (84/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 84.0 in stage 6.0 (TID 487). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 85.0 in stage 6.0 (TID 488, localhost, partition 86,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 84.0 in stage 6.0 (TID 487) in 26 ms on localhost (85/199)

16/04/10 23:54:05 INFO Executor: Running task 85.0 in stage 6.0 (TID 488)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 85.0 in stage 6.0 (TID 488). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 86.0 in stage 6.0 (TID 489, localhost, partition 87,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 86.0 in stage 6.0 (TID 489)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 85.0 in stage 6.0 (TID 488) in 14 ms on localhost (86/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 86.0 in stage 6.0 (TID 489). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 87.0 in stage 6.0 (TID 490, localhost, partition 88,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 86.0 in stage 6.0 (TID 489) in 48 ms on localhost (87/199)

16/04/10 23:54:05 INFO Executor: Running task 87.0 in stage 6.0 (TID 490)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 87.0 in stage 6.0 (TID 490). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 88.0 in stage 6.0 (TID 491, localhost, partition 89,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 88.0 in stage 6.0 (TID 491)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 87.0 in stage 6.0 (TID 490) in 20 ms on localhost (88/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO GenerateMutableProjection: Code generated in 136.381588 ms

16/04/10 23:54:05 INFO Executor: Finished task 88.0 in stage 6.0 (TID 491). 2032 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 89.0 in stage 6.0 (TID 492, localhost, partition 90,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 88.0 in stage 6.0 (TID 491) in 308 ms on localhost (89/199)

16/04/10 23:54:05 INFO Executor: Running task 89.0 in stage 6.0 (TID 492)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 89.0 in stage 6.0 (TID 492). 2032 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 90.0 in stage 6.0 (TID 493, localhost, partition 91,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 90.0 in stage 6.0 (TID 493)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 89.0 in stage 6.0 (TID 492) in 45 ms on localhost (90/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 90.0 in stage 6.0 (TID 493). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 91.0 in stage 6.0 (TID 494, localhost, partition 92,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO Executor: Running task 91.0 in stage 6.0 (TID 494)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 90.0 in stage 6.0 (TID 493) in 24 ms on localhost (91/199)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 91.0 in stage 6.0 (TID 494). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 92.0 in stage 6.0 (TID 495, localhost, partition 93,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 91.0 in stage 6.0 (TID 494) in 14 ms on localhost (92/199)

16/04/10 23:54:05 INFO Executor: Running task 92.0 in stage 6.0 (TID 495)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:05 INFO Executor: Finished task 92.0 in stage 6.0 (TID 495). 1652 bytes result sent to driver

16/04/10 23:54:05 INFO TaskSetManager: Starting task 93.0 in stage 6.0 (TID 496, localhost, partition 94,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:05 INFO TaskSetManager: Finished task 92.0 in stage 6.0 (TID 495) in 19 ms on localhost (93/199)

16/04/10 23:54:05 INFO Executor: Running task 93.0 in stage 6.0 (TID 496)

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:05 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:05 INFO Executor: Finished task 93.0 in stage 6.0 (TID 496). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 94.0 in stage 6.0 (TID 497, localhost, partition 95,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 93.0 in stage 6.0 (TID 496) in 28 ms on localhost (94/199)

16/04/10 23:54:06 INFO Executor: Running task 94.0 in stage 6.0 (TID 497)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 94.0 in stage 6.0 (TID 497). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 95.0 in stage 6.0 (TID 498, localhost, partition 96,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 95.0 in stage 6.0 (TID 498)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 94.0 in stage 6.0 (TID 497) in 113 ms on localhost (95/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 95.0 in stage 6.0 (TID 498). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 96.0 in stage 6.0 (TID 499, localhost, partition 97,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 95.0 in stage 6.0 (TID 498) in 42 ms on localhost (96/199)

16/04/10 23:54:06 INFO Executor: Running task 96.0 in stage 6.0 (TID 499)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 96.0 in stage 6.0 (TID 499). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 97.0 in stage 6.0 (TID 500, localhost, partition 98,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 97.0 in stage 6.0 (TID 500)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 96.0 in stage 6.0 (TID 499) in 23 ms on localhost (97/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 97.0 in stage 6.0 (TID 500). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 98.0 in stage 6.0 (TID 501, localhost, partition 99,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 98.0 in stage 6.0 (TID 501)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 97.0 in stage 6.0 (TID 500) in 14 ms on localhost (98/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 98.0 in stage 6.0 (TID 501). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 99.0 in stage 6.0 (TID 502, localhost, partition 100,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 99.0 in stage 6.0 (TID 502)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 98.0 in stage 6.0 (TID 501) in 21 ms on localhost (99/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 99.0 in stage 6.0 (TID 502). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 100.0 in stage 6.0 (TID 503, localhost, partition 101,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 100.0 in stage 6.0 (TID 503)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 99.0 in stage 6.0 (TID 502) in 11 ms on localhost (100/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 100.0 in stage 6.0 (TID 503). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 101.0 in stage 6.0 (TID 504, localhost, partition 102,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 101.0 in stage 6.0 (TID 504)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 100.0 in stage 6.0 (TID 503) in 12 ms on localhost (101/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 101.0 in stage 6.0 (TID 504). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 102.0 in stage 6.0 (TID 505, localhost, partition 103,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 102.0 in stage 6.0 (TID 505)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 101.0 in stage 6.0 (TID 504) in 10 ms on localhost (102/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

16/04/10 23:54:06 INFO Executor: Finished task 102.0 in stage 6.0 (TID 505). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 103.0 in stage 6.0 (TID 506, localhost, partition 104,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 102.0 in stage 6.0 (TID 505) in 42 ms on localhost (103/199)

16/04/10 23:54:06 INFO Executor: Running task 103.0 in stage 6.0 (TID 506)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 103.0 in stage 6.0 (TID 506). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 104.0 in stage 6.0 (TID 507, localhost, partition 105,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INFO Executor: Running task 104.0 in stage 6.0 (TID 507)

16/04/10 23:54:06 INFO TaskSetManager: Finished task 103.0 in stage 6.0 (TID 506) in 19 ms on localhost (104/199)

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks

16/04/10 23:54:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms

16/04/10 23:54:06 INFO Executor: Finished task 104.0 in stage 6.0 (TID 507). 1652 bytes result sent to driver

16/04/10 23:54:06 INFO TaskSetManager: Starting task 105.0 in stage 6.0 (TID 508, localhost, partition 106,NODE_LOCAL, 1999 bytes)

16/04/10 23:54:06 INF