
Converting between Spark RDDs and DataFrames

RDD -> DF

There are two approaches.

1. Inferring the Schema Using Reflection

Map each element of the RDD[T] to a case class instance, then call toDF():

// Assumes an active SparkSession named `spark` and a case class:
case class Person(name: String, age: Long)

// Needed for the implicit RDD-to-DataFrame conversions (toDF / toDS)
import spark.implicits._

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()

An RDD can also be converted directly to a Dataset; this requires importing the implicit conversion class first: import spark.implicits._
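As a sketch of the Dataset path (assuming the same `spark` session and `Person` case class as above), the only change from the DataFrame version is calling toDS() instead of toDF():

```scala
import spark.implicits._

// toDS() yields a typed Dataset[Person] rather than an untyped DataFrame
val peopleDS = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDS()
```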

If the elements being converted are tuples, the resulting columns are named by position: _1, _2, ...
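A minimal sketch of the tuple case (the sample data is illustrative; toDF also accepts explicit names to override the positional _1/_2 defaults):

```scala
import spark.implicits._

val pairs = spark.sparkContext.parallelize(Seq(("Justin", 19), ("Andy", 30)))

val df1 = pairs.toDF()              // columns named _1, _2
val df2 = pairs.toDF("name", "age") // explicit column names instead
```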

2. Programmatically Specifying the Schema

Build the column metadata (a StructType schema) separately, then combine it with an RDD[Row] via createDataFrame:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
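Once the view is registered, the DataFrame can be queried with SQL; a short usage sketch (the SELECT statement is illustrative):

```scala
// SQL over the temporary view; the result is itself a DataFrame
val results = spark.sql("SELECT name FROM people")
results.show()
```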

DF -> RDD

val tt = teenagersDF.rdd  // yields an RDD[Row]
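Since `.rdd` returns an RDD[Row], fields are read back by index or by column name. A sketch assuming `teenagersDF` has a `name` column:

```scala
// Extract one column from each Row; getAs looks the field up by name
val namesRDD = teenagersDF.rdd.map(row => row.getAs[String]("name"))
```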