
Two ways to read a data file into a table in spark-shell (relative paths and hdfs dfs -ls)

Using Spark SQL
Once the Spark shell has started, you can use the Spark SQL API to run data-analysis queries.

In the first example, we load user data from a text file and create a DataFrame from the dataset. We then call DataFrame functions to run specific data-selection queries.

The text file customers.txt has the following contents:

100, John Smith, Austin, TX, 78727
200, Joe Johnson, Dallas, TX, 75201
300, Bob Jones, Houston, TX, 77028
400, Andy Davis, San Antonio, TX, 78227
500, James Williams, Austin, TX, 78727
The following code snippet shows the Spark SQL commands you can run in the Spark shell.

// First, create a SQLContext from the existing SparkContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc);

// Import statement that enables implicit conversion of RDDs to DataFrames
import sqlContext.implicits._

// Define a case class representing a customer
case class Customer(customer_id: Int, name: String, city: String, state: String, zip_code: String)

// Create a DataFrame of Customer objects from the dataset text file
val dfCustomers = sc.textFile("data/customers.txt").map(_.split(",")).map(p => Customer(p(0).trim.toInt, p(1), p(2), p(3), p(4))).toDF();

// Register the DataFrame as a table
dfCustomers.registerTempTable("customers");

// Display the contents of the DataFrame
dfCustomers.show();

// Print the DataFrame schema
dfCustomers.printSchema();

// Select the customer name column
dfCustomers.select("name").show();

// Select the name and city columns
dfCustomers.select("name", "city").show()

// Select customers by id
dfCustomers.filter(dfCustomers("customer_id").equalTo(500)).show();

// Count customers by zip code
dfCustomers.groupBy("zip_code").count().show();
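
Because the DataFrame has been registered as the temporary table customers, the same queries can also be expressed in SQL through sqlContext.sql. A minimal sketch against the table registered above (the query itself is just an illustrative equivalent of the filter example):

// Query the registered temp table with SQL; the result is again a DataFrame
val cust500 = sqlContext.sql("SELECT name, city FROM customers WHERE customer_id = 500");
cust500.show();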



The schema can also be specified programmatically, using the data type classes StructType, StructField, and StringType.

//
// Programmatically specifying the schema
//

// Create a SQLContext from the existing SparkContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc);

// Create an RDD
val rddCustomers = sc.textFile("data/customers.txt");

// Encode the schema in a string
val schemaString = "customer_id name city state zip_code";

// Import Spark SQL data types and Row
import org.apache.spark.sql._

import org.apache.spark.sql.types._;

// Generate the schema object from the schema string
val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)));

// Convert the records of the RDD (rddCustomers) to Rows.
val rowRDD = rddCustomers.map(_.split(",")).map(p => Row(p(0).trim,p(1),p(2),p(3),p(4)));

// Apply the schema to the RDD.
val dfCustomers = sqlContext.createDataFrame(rowRDD, schema);

// Register the DataFrame as a table
dfCustomers.registerTempTable("customers");

// Run an SQL statement using the sql method provided by the sqlContext object.
val custNames = sqlContext.sql("SELECT name FROM customers");

// The result of the SQL query is a DataFrame, which supports all the normal RDD operations.
// The columns of each result row can be accessed by ordinal.
custNames.map(t => "Name: " + t(0)).collect().foreach(println);

// Run another SQL statement using the sql method provided by the sqlContext object.
val customersByCity = sqlContext.sql("SELECT name,zip_code FROM customers ORDER BY zip_code");

// The result of the SQL query is a DataFrame, which supports all the normal RDD operations.
// The columns of each result row can be accessed by ordinal.
customersByCity.map(t => t(0) + "," + t(1)).collect().foreach(println);
Besides text files, data can also be loaded from other sources, such as JSON data files, Hive tables, and even relational database tables through JDBC data sources.
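
For example, with the DataFrameReader API that ships with Spark 1.4 (sqlContext.read), loading JSON data or a table over JDBC looks roughly like the sketch below. The file path, JDBC URL, and table name are placeholder assumptions, and the JDBC driver must be on the classpath:

// Load a DataFrame from a JSON file (one JSON object per line; path is a placeholder)
val dfJson = sqlContext.read.json("data/people.json");

// Load a DataFrame from a relational table through a JDBC data source
// (the URL and table name below are placeholders)
val dfJdbc = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb")
  .option("dbtable", "customers")
  .load();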

Last login: Tue Jul 19 23:21:05 2016 from 192.168.3.103
[root@cdh1 ~]# jps
23852 SparkSubmit
19801 NameNode
23932 Jps
20330 NodeManager
22342 Master
22509 Worker
20231 ResourceManager
20082 SecondaryNameNode
19898 DataNode
[root@cdh1 ~]# hdfs dfs -ls /user/
Found 2 items
drwxr-xr-x   - root supergroup          0 2016-07-19 21:16 /user/hive
drwxr-xr-x   - root supergroup          0 2016-07-19 23:07 /user/test
[root@cdh1 ~]# hdfs dfs -ls /user/test
[root@cdh1 ~]# cd /user/test
[root@cdh1 test]# ll
total 0
[root@cdh1 test]# vim customers.txt
[root@cdh1 test]# ll
total 4
-rw-r--r--. 1 root root 185 Jul 19 23:25 customers.txt
[root@cdh1 test]# cat customers.txt
100, John Smith, Austin, TX, 78727
200, Joe Johnson, Dallas, TX, 75201
300, Bob Jones, Houston, TX, 77028
400, Andy Davis, San Antonio, TX, 78227
500, James Williams, Austin, TX, 78727
[root@cdh1 test]# hdfs dfs -ls /user/test
[root@cdh1 test]# hdfs dfs -put /user/test/customers.txt /user/test
[root@cdh1 test]# hdfs dfs -ls /user/test
Found 1 items
-rw-r--r--   1 root supergroup        185 2016-07-19 23:26 /user/test/customers.txt
[root@cdh1 test]# hdfs dfs -ls /user/test
Found 1 items
-rw-r--r--   1 root supergroup        185 2016-07-19 23:26 /user/test/customers.txt
[root@cdh1 test]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2016-07-19 21:31 /tmp
drwxr-xr-x   - root supergroup          0 2016-07-19 23:07 /user
[root@cdh1 test]#

Last login: Tue Jul 19 23:23:11 2016 from 192.168.3.103
[root@cdh1 ~]# cd /user/local/spark-1.4.0-bin-hadoop2.6/
[root@cdh1 spark-1.4.0-bin-hadoop2.6]# ll
total 684
drwxr-xr-x. 2 yc   yc     4096 Jun  3  2015 bin
-rw-r--r--. 1 yc   yc   561149 Jun  3  2015 CHANGES.txt
drwxr-xr-x. 2 yc   yc     4096 Jun 18 00:03 conf
drwxr-xr-x. 3 yc   yc     4096 Jun  3  2015 data
drwxr-xr-x. 3 yc   yc     4096 Jun  3  2015 ec2
drwxr-xr-x. 3 yc   yc     4096 Jun  3  2015 examples
drwxr-xr-x. 2 yc   yc     4096 Jun  3  2015 lib
-rw-r--r--. 1 yc   yc    50902 Jun  3  2015 LICENSE
drwxr-xr-x. 2 root root   4096 Jul 19 22:35 logs
-rw-r--r--. 1 yc   yc    22559 Jun  3  2015 NOTICE
drwxr-xr-x. 6 yc   yc     4096 Jun  3  2015 python
drwxr-xr-x. 3 yc   yc     4096 Jun  3  2015 R
-rw-r--r--. 1 yc   yc     3624 Jun  3  2015 README.md
-rw-r--r--. 1 yc   yc      134 Jun  3  2015 RELEASE
drwxr-xr-x. 2 yc   yc     4096 Jul 19 22:39 sbin
drwxr-xr-x. 2 root root   4096 Jun 18 00:03 work
[root@cdh1 spark-1.4.0-bin-hadoop2.6]# cd work/
[root@cdh1 work]# ll
total 0
[root@cdh1 work]# cd ..
[root@cdh1 spark-1.4.0-bin-hadoop2.6]# cd data/
[root@cdh1 data]# ll
total 4
drwxr-xr-x. 5 yc yc 4096 Jun  3  2015 mllib
[root@cdh1 data]# cd mllib/
[root@cdh1 mllib]# ll
total 828
drwxr-xr-x. 2 yc yc   4096 Jun  3  2015 als
-rw-r--r--. 1 yc yc  63973 Jun  3  2015 gmm_data.txt
-rw-r--r--. 1 yc yc     72 Jun  3  2015 kmeans_data.txt
drwxr-xr-x. 2 yc yc   4096 Jun  3  2015 lr-data
-rw-r--r--. 1 yc yc 197105 Jun  3  2015 lr_data.txt
-rw-r--r--. 1 yc yc     24 Jun  3  2015 pagerank_data.txt
drwxr-xr-x. 2 yc yc   4096 Jun  3  2015 ridge-data
-rw-r--r--. 1 yc yc 104736 Jun  3  2015 sample_binary_classification_data.txt
-rw-r--r--. 1 yc yc     68 Jun  3  2015 sample_fpgrowth.txt
-rw-r--r--. 1 yc yc   1598 Jun  3  2015 sample_isotonic_regression_data.txt
-rw-r--r--. 1 yc yc    264 Jun  3  2015 sample_lda_data.txt
-rw-r--r--. 1 yc yc 104736 Jun  3  2015 sample_libsvm_data.txt
-rwxr-xr-x. 1 yc yc 119069 Jun  3  2015 sample_linear_regression_data.txt
-rw-r--r--. 1 yc yc  14351 Jun  3  2015 sample_movielens_data.txt
-rw-r--r--. 1 yc yc   6953 Jun  3  2015 sample_multiclass_classification_data.txt
-rw-r--r--. 1 yc yc     48 Jun  3  2015 sample_naive_bayes_data.txt
-rw-r--r--. 1 yc yc  39474 Jun  3  2015 sample_svm_data.txt
-rw-r--r--. 1 yc yc 115476 Jun  3  2015 sample_tree_data.csv
[root@cdh1 mllib]# cd ..
[root@cdh1 data]# ll
total 4
drwxr-xr-x. 5 yc yc 4096 Jun  3  2015 mllib
[root@cdh1 data]# pwd
/user/local/spark-1.4.0-bin-hadoop2.6/data
[root@cdh1 data]# ls
mllib
[root@cdh1 data]# vim customers.txt

[1]+  Stopped                 vim customers.txt
[root@cdh1 data]# vim customers.txt
[root@cdh1 data]# ls
customers.txt  mllib
[root@cdh1 data]# ls -l
total 8
-rw-r--r--. 1 root root  185 Jul 19 23:44 customers.txt
drwxr-xr-x. 5 yc   yc   4096 Jun  3  2015 mllib
[root@cdh1 data]# hdfs dfs -l /user/root/data/
-l: Unknown command
[root@cdh1 data]# hdfs dfs -ls /user/root/data/
ls: `/user/root/data/': No such file or directory
[root@cdh1 data]# hdfs dfs -ls /user/
Found 2 items
drwxr-xr-x   - root supergroup          0 2016-07-19 21:16 /user/hive
drwxr-xr-x   - root supergroup          0 2016-07-19 23:26 /user/test
[root@cdh1 data]# hdfs dfs -mkdir /user/root/data/
mkdir: `/user/root/data/': No such file or directory
[root@cdh1 data]# hdfs dfs -mkdir -p /user/root/data/
[root@cdh1 data]# hdfs dfs -ls /user/root/data/
[root@cdh1 data]# hdfs dfs -ls /user/root
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-07-19 23:47 /user/root/data
[root@cdh1 data]# pwd
/user/local/spark-1.4.0-bin-hadoop2.6/data
[root@cdh1 data]# ll
total 8
-rw-r--r--. 1 root root  185 Jul 19 23:44 customers.txt
drwxr-xr-x. 5 yc   yc   4096 Jun  3  2015 mllib
[root@cdh1 data]# hdfs dfs -put customers.txt /user/root/data
[root@cdh1 data]# hdfs dfs -ls /user/root/data
Found 1 items
-rw-r--r--   1 root supergroup        185 2016-07-19 23:48 /user/root/data/customers.txt
[root@cdh1 data]# hdfs dfs -text /user/root/data/customers.txt
100, John Smith, Austin, TX, 78727
200, Joe Johnson, Dallas, TX, 75201
300, Bob Jones, Houston, TX, 77028
400, Andy Davis, San Antonio, TX, 78227
500, James Williams, Austin, TX, 78727
[root@cdh1 data]# hdfs dfs -cat /user/root/data/customers.txt
100, John Smith, Austin, TX, 78727
200, Joe Johnson, Dallas, TX, 75201
300, Bob Jones, Houston, TX, 77028
400, Andy Davis, San Antonio, TX, 78227
500, James Williams, Austin, TX, 78727
[root@cdh1 data]# hdfs dfs -mv /user/root/data/customers.txt /user/root/data/customer.tx
[root@cdh1 data]# hdfs dfs -ls /user/root/data
Found 1 items
-rw-r--r--   1 root supergroup        185 2016-07-19 23:48 /user/root/data/customer.tx
[root@cdh1 data]# 
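
With customers.txt now in HDFS, a quick sanity check from spark-shell confirms the file is readable before building a DataFrame. A minimal sketch against the /user/test copy uploaded above:

// Count the lines of the uploaded file; with the data above this should print 5
val lineCount = sc.textFile("/user/test/customers.txt").count();
println(lineCount);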

scala> val textFile = sc.textFile("customers.txt").collect();
16/07/19 23:33:01 INFO storage.MemoryStore: ensureFreeSpace(222752) called with curMem=896994, maxMem=278019440
16/07/19 23:33:01 INFO storage.MemoryStore: Block broadcast_19 stored as values in memory (estimated size 217.5 KB, free 264.1 MB)
16/07/19 23:33:01 INFO storage.MemoryStore: ensureFreeSpace(19999) called with curMem=1119746, maxMem=278019440
16/07/19 23:33:01 INFO storage.MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 19.5 KB, free 264.1 MB)
16/07/19 23:33:01 INFO storage.BlockManagerInfo: Added broadcast_19_piece0 in memory on localhost:56137 (size: 19.5 KB, free: 265.0 MB)
16/07/19 23:33:01 INFO spark.SparkContext: Created broadcast 19 from textFile at <console>:26
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://cdh1:9000/user/root/customers.txt
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1779)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:885)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:884)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
	at $iwC$$iwC$$iwC.<init>(<console>:43)
	at $iwC$$iwC.<init>(<console>:45)
	at $iwC.<init>(<console>:47)
	at <init>(<console>:49)
	at .<init>(<console>:53)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
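
This failure is exactly the relative-path pitfall in the title: a relative path passed to sc.textFile is resolved against the user's HDFS home directory (here /user/root), not the local working directory, so "customers.txt" became hdfs://cdh1:9000/user/root/customers.txt, which did not exist. A sketch of the equivalent spellings, assuming the fs.defaultFS shown in the error (hdfs://cdh1:9000):

// All three resolve to the same HDFS file when running as user root:
sc.textFile("customers.txt")                             // relative to the HDFS home dir /user/root
sc.textFile("/user/root/customers.txt")                  // absolute HDFS path
sc.textFile("hdfs://cdh1:9000/user/root/customers.txt")  // fully qualified URI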


scala> val textFile = sc.textFile("/user/test/customers.txt").collect();
16/07/19 23:33:16 INFO storage.BlockManagerInfo: Removed broadcast_19_piece0 on localhost:56137 in memory (size: 19.5 KB, free: 265.0 MB)
16/07/19 23:33:16 INFO storage.MemoryStore: ensureFreeSpace(222752) called with curMem=896994, maxMem=278019440
16/07/19 23:33:16 INFO storage.MemoryStore: Block broadcast_20 stored as values in memory (estimated size 217.5 KB, free 264.1 MB)
16/07/19 23:33:16 INFO storage.MemoryStore: ensureFreeSpace(19999) called with curMem=1119746, maxMem=278019440
16/07/19 23:33:16 INFO storage.MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 19.5 KB, free 264.1 MB)
16/07/19 23:33:16 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on localhost:56137 (size: 19.5 KB, free: 265.0 MB)
16/07/19 23:33:16 INFO spark.SparkContext: Created broadcast 20 from textFile at <console>:26
16/07/19 23:33:16 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/19 23:33:16 INFO spark.SparkContext: Starting job: collect at <console>:26
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Got job 13 (collect at <console>:26) with 2 output partitions (allowLocal=false)
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 17(collect at <console>:26)
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Submitting ResultStage 17 (MapPartitionsRDD[38] at textFile at <console>:26), which has no missing parents
16/07/19 23:33:16 INFO storage.MemoryStore: ensureFreeSpace(3128) called with curMem=1139745, maxMem=278019440
16/07/19 23:33:16 INFO storage.MemoryStore: Block broadcast_21 stored as values in memory (estimated size 3.1 KB, free 264.1 MB)
16/07/19 23:33:16 INFO storage.MemoryStore: ensureFreeSpace(1795) called with curMem=1142873, maxMem=278019440
16/07/19 23:33:16 INFO storage.MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 1795.0 B, free 264.0 MB)
16/07/19 23:33:16 INFO storage.BlockManagerInfo: Added broadcast_21_piece0 in memory on localhost:56137 (size: 1795.0 B, free: 265.0 MB)
16/07/19 23:33:16 INFO spark.SparkContext: Created broadcast 21 from broadcast at DAGScheduler.scala:874
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 17 (MapPartitionsRDD[38] at textFile at <console>:26)
16/07/19 23:33:16 INFO scheduler.TaskSchedulerImpl: Adding task set 17.0 with 2 tasks
16/07/19 23:33:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 17.0 (TID 414, localhost, ANY, 1413 bytes)
16/07/19 23:33:16 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 17.0 (TID 415, localhost, ANY, 1413 bytes)
16/07/19 23:33:16 INFO executor.Executor: Running task 0.0 in stage 17.0 (TID 414)
16/07/19 23:33:16 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:0+92
16/07/19 23:33:16 INFO executor.Executor: Running task 1.0 in stage 17.0 (TID 415)
16/07/19 23:33:16 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:92+93
16/07/19 23:33:16 INFO executor.Executor: Finished task 1.0 in stage 17.0 (TID 415). 1875 bytes result sent to driver
16/07/19 23:33:16 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 17.0 (TID 415) in 28 ms on localhost (1/2)
16/07/19 23:33:16 INFO executor.Executor: Finished task 0.0 in stage 17.0 (TID 414). 1904 bytes result sent to driver
16/07/19 23:33:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 17.0 (TID 414) in 30 ms on localhost (2/2)
16/07/19 23:33:16 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks have all completed, from pool 
16/07/19 23:33:16 INFO scheduler.DAGScheduler: ResultStage 17 (collect at <console>:26) finished in 0.030 s
16/07/19 23:33:16 INFO scheduler.DAGScheduler: Job 13 finished: collect at <console>:26, took 0.041509 s
textFile: Array[String] = Array(100, John Smith, Austin, TX, 78727, 200, Joe Johnson, Dallas, TX, 75201, 300, Bob Jones, Houston, TX, 77028, 400, Andy Davis, San Antonio, TX, 78227, 500, James Williams, Austin, TX, 78727)

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc);
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@...

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> case class Customer(customer_id: Int, name: String, city: String, state: String, zip_code: String)
defined class Customer

scala> val dfCustomers = sc.textFile("/user/test/customers.txt").map(_.split(",")).map(p => Customer(p(0).trim.toInt, p(1), p(2), p(3), p(4))).toDF();
16/07/19 23:34:33 INFO storage.BlockManagerInfo: Removed broadcast_21_piece0 on localhost:56137 in memory (size: 1795.0 B, free: 265.0 MB)
16/07/19 23:34:33 INFO storage.BlockManagerInfo: Removed broadcast_20_piece0 on localhost:56137 in memory (size: 19.5 KB, free: 265.0 MB)
16/07/19 23:34:33 INFO storage.MemoryStore: ensureFreeSpace(222752) called with curMem=896994, maxMem=278019440
16/07/19 23:34:33 INFO storage.MemoryStore: Block broadcast_22 stored as values in memory (estimated size 217.5 KB, free 264.1 MB)
16/07/19 23:34:33 INFO storage.MemoryStore: ensureFreeSpace(19999) called with curMem=1119746, maxMem=278019440
16/07/19 23:34:33 INFO storage.MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 19.5 KB, free 264.1 MB)
16/07/19 23:34:33 INFO storage.BlockManagerInfo: Added broadcast_22_piece0 in memory on localhost:56137 (size: 19.5 KB, free: 265.0 MB)
16/07/19 23:34:33 INFO spark.SparkContext: Created broadcast 22 from textFile at <console>:33
dfCustomers: org.apache.spark.sql.DataFrame = [customer_id: int, name: string, city: string, state: string, zip_code: string]

scala> dfCustomers.registerTempTable("customers");

scala> dfCustomers.show();
16/07/19 23:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/19 23:34:55 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Got job 14 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 18(show at <console>:36)
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Submitting ResultStage 18 (MapPartitionsRDD[44] at show at <console>:36), which has no missing parents
16/07/19 23:34:55 INFO storage.MemoryStore: ensureFreeSpace(4088) called with curMem=1139745, maxMem=278019440
16/07/19 23:34:55 INFO storage.MemoryStore: Block broadcast_23 stored as values in memory (estimated size 4.0 KB, free 264.0 MB)
16/07/19 23:34:55 INFO storage.MemoryStore: ensureFreeSpace(2227) called with curMem=1143833, maxMem=278019440
16/07/19 23:34:55 INFO storage.MemoryStore: Block broadcast_23_piece0 stored as bytes in memory (estimated size 2.2 KB, free 264.0 MB)
16/07/19 23:34:55 INFO storage.BlockManagerInfo: Added broadcast_23_piece0 in memory on localhost:56137 (size: 2.2 KB, free: 265.0 MB)
16/07/19 23:34:55 INFO spark.SparkContext: Created broadcast 23 from broadcast at DAGScheduler.scala:874
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 18 (MapPartitionsRDD[44] at show at <console>:36)
16/07/19 23:34:55 INFO scheduler.TaskSchedulerImpl: Adding task set 18.0 with 1 tasks
16/07/19 23:34:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 18.0 (TID 416, localhost, ANY, 1413 bytes)
16/07/19 23:34:55 INFO executor.Executor: Running task 0.0 in stage 18.0 (TID 416)
16/07/19 23:34:55 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:0+92
16/07/19 23:34:55 INFO executor.Executor: Finished task 0.0 in stage 18.0 (TID 416). 2420 bytes result sent to driver
16/07/19 23:34:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 18.0 (TID 416) in 28 ms on localhost (1/1)
16/07/19 23:34:55 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 18.0, whose tasks have all completed, from pool 
16/07/19 23:34:55 INFO scheduler.DAGScheduler: ResultStage 18 (show at <console>:36) finished in 0.027 s
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Job 14 finished: show at <console>:36, took 0.057418 s
16/07/19 23:34:55 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Got job 15 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 19(show at <console>:36)
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Submitting ResultStage 19 (MapPartitionsRDD[44] at show at <console>:36), which has no missing parents
16/07/19 23:34:55 INFO storage.MemoryStore: ensureFreeSpace(4088) called with curMem=1146060, maxMem=278019440
16/07/19 23:34:55 INFO storage.MemoryStore: Block broadcast_24 stored as values in memory (estimated size 4.0 KB, free 264.0 MB)
16/07/19 23:34:55 INFO storage.MemoryStore: ensureFreeSpace(2227) called with curMem=1150148, maxMem=278019440
16/07/19 23:34:55 INFO storage.MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 2.2 KB, free 264.0 MB)
16/07/19 23:34:55 INFO storage.BlockManagerInfo: Added broadcast_24_piece0 in memory on localhost:56137 (size: 2.2 KB, free: 265.0 MB)
16/07/19 23:34:55 INFO spark.SparkContext: Created broadcast 24 from broadcast at DAGScheduler.scala:874
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 19 (MapPartitionsRDD[44] at show at <console>:36)
16/07/19 23:34:55 INFO scheduler.TaskSchedulerImpl: Adding task set 19.0 with 1 tasks
16/07/19 23:34:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 19.0 (TID 417, localhost, ANY, 1413 bytes)
16/07/19 23:34:55 INFO executor.Executor: Running task 0.0 in stage 19.0 (TID 417)
16/07/19 23:34:55 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:92+93
16/07/19 23:34:55 INFO executor.Executor: Finished task 0.0 in stage 19.0 (TID 417). 2311 bytes result sent to driver
16/07/19 23:34:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 19.0 (TID 417) in 14 ms on localhost (1/1)
16/07/19 23:34:55 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool 
16/07/19 23:34:55 INFO scheduler.DAGScheduler: ResultStage 19 (show at <console>:36) finished in 0.013 s
16/07/19 23:34:55 INFO scheduler.DAGScheduler: Job 15 finished: show at <console>:36, took 0.026368 s
+-----------+---------------+------------+-----+--------+
|customer_id|           name|        city|state|zip_code|
+-----------+---------------+------------+-----+--------+
|        100|     John Smith|      Austin|   TX|   78727|
|        200|    Joe Johnson|      Dallas|   TX|   75201|
|        300|      Bob Jones|     Houston|   TX|   77028|
|        400|     Andy Davis| San Antonio|   TX|   78227|
|        500| James Williams|      Austin|   TX|   78727|
+-----------+---------------+------------+-----+--------+


scala> dfCustomers.printSchema();
root
 |-- customer_id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- zip_code: string (nullable = true)


scala> dfCustomers.select("name").show();
16/07/19 23:35:11 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Got job 16 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 20(show at <console>:36)
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Submitting ResultStage 20 (MapPartitionsRDD[46] at show at <console>:36), which has no missing parents
16/07/19 23:35:11 INFO storage.MemoryStore: ensureFreeSpace(5472) called with curMem=1152375, maxMem=278019440
16/07/19 23:35:11 INFO storage.MemoryStore: Block broadcast_25 stored as values in memory (estimated size 5.3 KB, free 264.0 MB)
16/07/19 23:35:11 INFO storage.MemoryStore: ensureFreeSpace(2881) called with curMem=1157847, maxMem=278019440
16/07/19 23:35:11 INFO storage.MemoryStore: Block broadcast_25_piece0 stored as bytes in memory (estimated size 2.8 KB, free 264.0 MB)
16/07/19 23:35:11 INFO storage.BlockManagerInfo: Added broadcast_25_piece0 in memory on localhost:56137 (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:11 INFO spark.SparkContext: Created broadcast 25 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 20 (MapPartitionsRDD[46] at show at <console>:36)
16/07/19 23:35:11 INFO scheduler.TaskSchedulerImpl: Adding task set 20.0 with 1 tasks
16/07/19 23:35:11 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 20.0 (TID 418, localhost, ANY, 1413 bytes)
16/07/19 23:35:11 INFO executor.Executor: Running task 0.0 in stage 20.0 (TID 418)
16/07/19 23:35:11 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:0+92
16/07/19 23:35:11 INFO executor.Executor: Finished task 0.0 in stage 20.0 (TID 418). 2130 bytes result sent to driver
16/07/19 23:35:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 20.0 (TID 418) in 45 ms on localhost (1/1)
16/07/19 23:35:11 INFO scheduler.DAGScheduler: ResultStage 20 (show at <console>:36) finished in 0.045 s
16/07/19 23:35:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool 
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Job 16 finished: show at <console>:36, took 0.064127 s
16/07/19 23:35:11 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Got job 17 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 21(show at <console>:36)
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Submitting ResultStage 21 (MapPartitionsRDD[46] at show at <console>:36), which has no missing parents
16/07/19 23:35:11 INFO storage.MemoryStore: ensureFreeSpace(5472) called with curMem=1160728, maxMem=278019440
16/07/19 23:35:11 INFO storage.MemoryStore: Block broadcast_26 stored as values in memory (estimated size 5.3 KB, free 264.0 MB)
16/07/19 23:35:11 INFO storage.MemoryStore: ensureFreeSpace(2881) called with curMem=1166200, maxMem=278019440
16/07/19 23:35:11 INFO storage.MemoryStore: Block broadcast_26_piece0 stored as bytes in memory (estimated size 2.8 KB, free 264.0 MB)
16/07/19 23:35:11 INFO storage.BlockManagerInfo: Added broadcast_26_piece0 in memory on localhost:56137 (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:11 INFO spark.SparkContext: Created broadcast 26 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:11 INFO storage.BlockManagerInfo: Removed broadcast_25_piece0 on localhost:56137 in memory (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 21 (MapPartitionsRDD[46] at show at <console>:36)
16/07/19 23:35:11 INFO scheduler.TaskSchedulerImpl: Adding task set 21.0 with 1 tasks
16/07/19 23:35:11 INFO storage.BlockManagerInfo: Removed broadcast_24_piece0 on localhost:56137 in memory (size: 2.2 KB, free: 265.0 MB)
16/07/19 23:35:11 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 21.0 (TID 419, localhost, ANY, 1413 bytes)
16/07/19 23:35:11 INFO executor.Executor: Running task 0.0 in stage 21.0 (TID 419)
16/07/19 23:35:11 INFO storage.BlockManagerInfo: Removed broadcast_23_piece0 on localhost:56137 in memory (size: 2.2 KB, free: 265.0 MB)
16/07/19 23:35:11 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:92+93
16/07/19 23:35:11 INFO executor.Executor: Finished task 0.0 in stage 21.0 (TID 419). 2091 bytes result sent to driver
16/07/19 23:35:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 21.0 (TID 419) in 18 ms on localhost (1/1)
16/07/19 23:35:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 21.0, whose tasks have all completed, from pool 
16/07/19 23:35:11 INFO scheduler.DAGScheduler: ResultStage 21 (show at <console>:36) finished in 0.019 s
16/07/19 23:35:11 INFO scheduler.DAGScheduler: Job 17 finished: show at <console>:36, took 0.060116 s
+---------------+
|           name|
+---------------+
|     John Smith|
|    Joe Johnson|
|      Bob Jones|
|     Andy Davis|
| James Williams|
+---------------+


scala> dfCustomers.select("name", "city").show()
16/07/19 23:35:19 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Got job 18 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Final stage: ResultStage 22(show at <console>:36)
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Submitting ResultStage 22 (MapPartitionsRDD[48] at show at <console>:36), which has no missing parents
16/07/19 23:35:19 INFO storage.MemoryStore: ensureFreeSpace(5480) called with curMem=1148098, maxMem=278019440
16/07/19 23:35:19 INFO storage.MemoryStore: Block broadcast_27 stored as values in memory (estimated size 5.4 KB, free 264.0 MB)
16/07/19 23:35:19 INFO storage.MemoryStore: ensureFreeSpace(2883) called with curMem=1153578, maxMem=278019440
16/07/19 23:35:19 INFO storage.MemoryStore: Block broadcast_27_piece0 stored as bytes in memory (estimated size 2.8 KB, free 264.0 MB)
16/07/19 23:35:19 INFO storage.BlockManagerInfo: Added broadcast_27_piece0 in memory on localhost:56137 (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:19 INFO spark.SparkContext: Created broadcast 27 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 22 (MapPartitionsRDD[48] at show at <console>:36)
16/07/19 23:35:19 INFO scheduler.TaskSchedulerImpl: Adding task set 22.0 with 1 tasks
16/07/19 23:35:19 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 22.0 (TID 420, localhost, ANY, 1413 bytes)
16/07/19 23:35:19 INFO executor.Executor: Running task 0.0 in stage 22.0 (TID 420)
16/07/19 23:35:19 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:0+92
16/07/19 23:35:19 INFO executor.Executor: Finished task 0.0 in stage 22.0 (TID 420). 2200 bytes result sent to driver
16/07/19 23:35:19 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 22.0 (TID 420) in 17 ms on localhost (1/1)
16/07/19 23:35:19 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 22.0, whose tasks have all completed, from pool 
16/07/19 23:35:19 INFO scheduler.DAGScheduler: ResultStage 22 (show at <console>:36) finished in 0.013 s
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Job 18 finished: show at <console>:36, took 0.030147 s
16/07/19 23:35:19 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Got job 19 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Final stage: ResultStage 23(show at <console>:36)
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[48] at show at <console>:36), which has no missing parents
16/07/19 23:35:19 INFO storage.MemoryStore: ensureFreeSpace(5480) called with curMem=1156461, maxMem=278019440
16/07/19 23:35:19 INFO storage.MemoryStore: Block broadcast_28 stored as values in memory (estimated size 5.4 KB, free 264.0 MB)
16/07/19 23:35:19 INFO storage.MemoryStore: ensureFreeSpace(2883) called with curMem=1161941, maxMem=278019440
16/07/19 23:35:19 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as bytes in memory (estimated size 2.8 KB, free 264.0 MB)
16/07/19 23:35:19 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on localhost:56137 (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:19 INFO spark.SparkContext: Created broadcast 28 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 23 (MapPartitionsRDD[48] at show at <console>:36)
16/07/19 23:35:19 INFO scheduler.TaskSchedulerImpl: Adding task set 23.0 with 1 tasks
16/07/19 23:35:19 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 23.0 (TID 421, localhost, ANY, 1413 bytes)
16/07/19 23:35:19 INFO executor.Executor: Running task 0.0 in stage 23.0 (TID 421)
16/07/19 23:35:19 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:92+93
16/07/19 23:35:19 INFO executor.Executor: Finished task 0.0 in stage 23.0 (TID 421). 2142 bytes result sent to driver
16/07/19 23:35:19 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 23.0 (TID 421) in 14 ms on localhost (1/1)
16/07/19 23:35:19 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool 
16/07/19 23:35:19 INFO scheduler.DAGScheduler: ResultStage 23 (show at <console>:36) finished in 0.014 s
16/07/19 23:35:19 INFO scheduler.DAGScheduler: Job 19 finished: show at <console>:36, took 0.027027 s
+---------------+------------+
|           name|        city|
+---------------+------------+
|     John Smith|      Austin|
|    Joe Johnson|      Dallas|
|      Bob Jones|     Houston|
|     Andy Davis| San Antonio|
| James Williams|      Austin|
+---------------+------------+


scala> dfCustomers.filter(dfCustomers("customer_id").equalTo(500)).show();
16/07/19 23:35:41 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Got job 20 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 24(show at <console>:36)
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Submitting ResultStage 24 (MapPartitionsRDD[50] at show at <console>:36), which has no missing parents
16/07/19 23:35:41 INFO storage.MemoryStore: ensureFreeSpace(5632) called with curMem=1164824, maxMem=278019440
16/07/19 23:35:41 INFO storage.MemoryStore: Block broadcast_29 stored as values in memory (estimated size 5.5 KB, free 264.0 MB)
16/07/19 23:35:41 INFO storage.MemoryStore: ensureFreeSpace(2924) called with curMem=1170456, maxMem=278019440
16/07/19 23:35:41 INFO storage.MemoryStore: Block broadcast_29_piece0 stored as bytes in memory (estimated size 2.9 KB, free 264.0 MB)
16/07/19 23:35:41 INFO storage.BlockManagerInfo: Added broadcast_29_piece0 in memory on localhost:56137 (size: 2.9 KB, free: 265.0 MB)
16/07/19 23:35:41 INFO spark.SparkContext: Created broadcast 29 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 24 (MapPartitionsRDD[50] at show at <console>:36)
16/07/19 23:35:41 INFO scheduler.TaskSchedulerImpl: Adding task set 24.0 with 1 tasks
16/07/19 23:35:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 24.0 (TID 422, localhost, ANY, 1413 bytes)
16/07/19 23:35:41 INFO executor.Executor: Running task 0.0 in stage 24.0 (TID 422)
16/07/19 23:35:41 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:0+92
16/07/19 23:35:41 INFO executor.Executor: Finished task 0.0 in stage 24.0 (TID 422). 1800 bytes result sent to driver
16/07/19 23:35:41 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 24.0 (TID 422) in 41 ms on localhost (1/1)
16/07/19 23:35:41 INFO scheduler.DAGScheduler: ResultStage 24 (show at <console>:36) finished in 0.040 s
16/07/19 23:35:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 24.0, whose tasks have all completed, from pool 
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Job 20 finished: show at <console>:36, took 0.057741 s
16/07/19 23:35:41 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Got job 21 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 25(show at <console>:36)
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Submitting ResultStage 25 (MapPartitionsRDD[50] at show at <console>:36), which has no missing parents
16/07/19 23:35:41 INFO storage.MemoryStore: ensureFreeSpace(5632) called with curMem=1173380, maxMem=278019440
16/07/19 23:35:41 INFO storage.MemoryStore: Block broadcast_30 stored as values in memory (estimated size 5.5 KB, free 264.0 MB)
16/07/19 23:35:41 INFO storage.MemoryStore: ensureFreeSpace(2924) called with curMem=1179012, maxMem=278019440
16/07/19 23:35:41 INFO storage.MemoryStore: Block broadcast_30_piece0 stored as bytes in memory (estimated size 2.9 KB, free 264.0 MB)
16/07/19 23:35:41 INFO storage.BlockManagerInfo: Added broadcast_30_piece0 in memory on localhost:56137 (size: 2.9 KB, free: 265.0 MB)
16/07/19 23:35:41 INFO spark.SparkContext: Created broadcast 30 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 25 (MapPartitionsRDD[50] at show at <console>:36)
16/07/19 23:35:41 INFO scheduler.TaskSchedulerImpl: Adding task set 25.0 with 1 tasks
16/07/19 23:35:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 25.0 (TID 423, localhost, ANY, 1413 bytes)
16/07/19 23:35:41 INFO executor.Executor: Running task 0.0 in stage 25.0 (TID 423)
16/07/19 23:35:41 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:92+93
16/07/19 23:35:41 INFO executor.Executor: Finished task 0.0 in stage 25.0 (TID 423). 2189 bytes result sent to driver
16/07/19 23:35:41 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 25.0 (TID 423) in 15 ms on localhost (1/1)
16/07/19 23:35:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 25.0, whose tasks have all completed, from pool 
16/07/19 23:35:41 INFO scheduler.DAGScheduler: ResultStage 25 (show at <console>:36) finished in 0.015 s
16/07/19 23:35:41 INFO scheduler.DAGScheduler: Job 21 finished: show at <console>:36, took 0.038441 s
+-----------+---------------+-------+-----+--------+
|customer_id|           name|   city|state|zip_code|
+-----------+---------------+-------+-----+--------+
|        500| James Williams| Austin|   TX|   78727|
+-----------+---------------+-------+-----+--------+


scala> dfCustomers.groupBy("zip_code").count().show();
16/07/19 23:35:52 INFO execution.Exchange: Using SparkSqlSerializer2.
16/07/19 23:35:52 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Registering RDD 53 (show at <console>:36)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Got job 22 (show at <console>:36) with 1 output partitions (allowLocal=false)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Final stage: ResultStage 27(show at <console>:36)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 26)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 26)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 26 (MapPartitionsRDD[53] at show at <console>:36), which has no missing parents
16/07/19 23:35:52 INFO storage.MemoryStore: ensureFreeSpace(9456) called with curMem=1181936, maxMem=278019440
16/07/19 23:35:52 INFO storage.MemoryStore: Block broadcast_31 stored as values in memory (estimated size 9.2 KB, free 264.0 MB)
16/07/19 23:35:52 INFO storage.MemoryStore: ensureFreeSpace(4565) called with curMem=1191392, maxMem=278019440
16/07/19 23:35:52 INFO storage.MemoryStore: Block broadcast_31_piece0 stored as bytes in memory (estimated size 4.5 KB, free 264.0 MB)
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Removed broadcast_29_piece0 on localhost:56137 in memory (size: 2.9 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Added broadcast_31_piece0 in memory on localhost:56137 (size: 4.5 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Removed broadcast_28_piece0 on localhost:56137 in memory (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO spark.SparkContext: Created broadcast 31 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Removed broadcast_27_piece0 on localhost:56137 in memory (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 26 (MapPartitionsRDD[53] at show at <console>:36)
16/07/19 23:35:52 INFO scheduler.TaskSchedulerImpl: Adding task set 26.0 with 2 tasks
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Removed broadcast_26_piece0 on localhost:56137 in memory (size: 2.8 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 26.0 (TID 424, localhost, ANY, 1402 bytes)
16/07/19 23:35:52 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 26.0 (TID 425, localhost, ANY, 1402 bytes)
16/07/19 23:35:52 INFO executor.Executor: Running task 0.0 in stage 26.0 (TID 424)
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Removed broadcast_30_piece0 on localhost:56137 in memory (size: 2.9 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:0+92
16/07/19 23:35:52 INFO executor.Executor: Running task 1.0 in stage 26.0 (TID 425)
16/07/19 23:35:52 INFO rdd.HadoopRDD: Input split: hdfs://cdh1:9000/user/test/customers.txt:92+93
16/07/19 23:35:52 INFO executor.Executor: Finished task 0.0 in stage 26.0 (TID 424). 2203 bytes result sent to driver
16/07/19 23:35:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 26.0 (TID 424) in 175 ms on localhost (1/2)
16/07/19 23:35:52 INFO executor.Executor: Finished task 1.0 in stage 26.0 (TID 425). 2203 bytes result sent to driver
16/07/19 23:35:52 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 26.0 (TID 425) in 181 ms on localhost (2/2)
16/07/19 23:35:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 26.0, whose tasks have all completed, from pool 
16/07/19 23:35:52 INFO scheduler.DAGScheduler: ShuffleMapStage 26 (show at <console>:36) finished in 0.180 s
16/07/19 23:35:52 INFO scheduler.DAGScheduler: looking for newly runnable stages
16/07/19 23:35:52 INFO scheduler.DAGScheduler: running: Set()
16/07/19 23:35:52 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 27)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: failed: Set()
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Missing parents for ResultStage 27: List()
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Submitting ResultStage 27 (MapPartitionsRDD[57] at show at <console>:36), which is now runnable
16/07/19 23:35:52 INFO storage.MemoryStore: ensureFreeSpace(10280) called with curMem=1153766, maxMem=278019440
16/07/19 23:35:52 INFO storage.MemoryStore: Block broadcast_32 stored as values in memory (estimated size 10.0 KB, free 264.0 MB)
16/07/19 23:35:52 INFO storage.MemoryStore: ensureFreeSpace(4981) called with curMem=1164046, maxMem=278019440
16/07/19 23:35:52 INFO storage.MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 4.9 KB, free 264.0 MB)
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Added broadcast_32_piece0 in memory on localhost:56137 (size: 4.9 KB, free: 265.0 MB)
16/07/19 23:35:52 INFO spark.SparkContext: Created broadcast 32 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 27 (MapPartitionsRDD[57] at show at <console>:36)
16/07/19 23:35:52 INFO scheduler.TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
16/07/19 23:35:52 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 27.0 (TID 426, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:52 INFO executor.Executor: Running task 0.0 in stage 27.0 (TID 426)
16/07/19 23:35:52 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:52 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/07/19 23:35:52 INFO executor.Executor: Finished task 0.0 in stage 27.0 (TID 426). 894 bytes result sent to driver
16/07/19 23:35:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 27.0 (TID 426) in 7 ms on localhost (1/1)
16/07/19 23:35:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 27.0, whose tasks have all completed, from pool 
16/07/19 23:35:52 INFO scheduler.DAGScheduler: ResultStage 27 (show at <console>:36) finished in 0.007 s
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Job 22 finished: show at <console>:36, took 0.261327 s
16/07/19 23:35:52 INFO spark.SparkContext: Starting job: show at <console>:36
16/07/19 23:35:52 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 2 is 169 bytes
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Got job 23 (show at <console>:36) with 199 output partitions (allowLocal=false)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Final stage: ResultStage 29(show at <console>:36)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 28)
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/19 23:35:52 INFO scheduler.DAGScheduler: Submitting ResultStage 29 (MapPartitionsRDD[57] at show at <console>:36), which has no missing parents
16/07/19 23:35:52 INFO storage.MemoryStore: ensureFreeSpace(10280) called with curMem=1169027, maxMem=278019440
16/07/19 23:35:52 INFO storage.MemoryStore: Block broadcast_33 stored as values in memory (estimated size 10.0 KB, free 264.0 MB)
16/07/19 23:35:52 INFO storage.MemoryStore: ensureFreeSpace(4981) called with curMem=1179307, maxMem=278019440
16/07/19 23:35:52 INFO storage.MemoryStore: Block broadcast_33_piece0 stored as bytes in memory (estimated size 4.9 KB, free 264.0 MB)
16/07/19 23:35:52 INFO storage.BlockManagerInfo: Added broadcast_33_piece0 in memory on localhost:56137 (size: 4.9 KB, free: 265.0 MB)
16/07/19 23:35:53 INFO spark.SparkContext: Created broadcast 33 from broadcast at DAGScheduler.scala:874
16/07/19 23:35:53 INFO scheduler.DAGScheduler: Submitting 199 missing tasks from ResultStage 29 (MapPartitionsRDD[57] at show at <console>:36)
16/07/19 23:35:53 INFO scheduler.TaskSchedulerImpl: Adding task set 29.0 with 199 tasks
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 29.0 (TID 427, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 29.0 (TID 428, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO executor.Executor: Running task 0.0 in stage 29.0 (TID 427)
16/07/19 23:35:53 INFO executor.Executor: Running task 1.0 in stage 29.0 (TID 428)
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/07/19 23:35:53 INFO executor.Executor: Finished task 1.0 in stage 29.0 (TID 428). 894 bytes result sent to driver
16/07/19 23:35:53 INFO executor.Executor: Finished task 0.0 in stage 29.0 (TID 427). 894 bytes result sent to driver
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 29.0 (TID 429, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO executor.Executor: Running task 2.0 in stage 29.0 (TID 429)
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 29.0 (TID 430, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 29.0 (TID 427) in 11 ms on localhost (1/199)
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 29.0 (TID 428) in 10 ms on localhost (2/199)
16/07/19 23:35:53 INFO executor.Executor: Running task 3.0 in stage 29.0 (TID 430)
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/07/19 23:35:53 INFO executor.Executor: Finished task 3.0 in stage 29.0 (TID 430). 894 bytes result sent to driver
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 29.0 (TID 431, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO executor.Executor: Running task 4.0 in stage 29.0 (TID 431)
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 29.0 (TID 430) in 8 ms on localhost (3/199)
16/07/19 23:35:53 INFO executor.Executor: Finished task 2.0 in stage 29.0 (TID 429). 894 bytes result sent to driver
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 29.0 (TID 432, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 29.0 (TID 429) in 12 ms on localhost (4/199)
16/07/19 23:35:53 INFO executor.Executor: Running task 5.0 in stage 29.0 (TID 432)
16/07/19 23:35:53 INFO executor.Executor: Finished task 4.0 in stage 29.0 (TID 431). 894 bytes result sent to driver
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 29.0 (TID 433, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 29.0 (TID 431) in 6 ms on localhost (5/199)
16/07/19 23:35:53 INFO executor.Executor: Running task 6.0 in stage 29.0 (TID 433)
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
16/07/19 23:35:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/07/19 23:35:53 INFO executor.Executor: Finished task 6.0 in stage 29.0 (TID 433). 894 bytes result sent to driver
16/07/19 23:35:53 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 29.0 (TID 434, localhost, PROCESS_LOCAL