
Compressed Data Storage with Spark

/tmp/dj/20170622.1498060818603 contains JSON data.
Compress it by writing it back out as Parquet:

// Load the JSON file; Spark infers the schema automatically
val logs = spark.read.json("/tmp/dj/20170622.1498060818603")
// Alternative: keep JSON but gzip-compress it
//logs.coalesce(2).write.option("compression","gzip").json("/tmp/dj/json2")
// Reduce to 2 output files and store as Parquet
logs.coalesce(2).write.parquet("/tmp/dj/parquet2")
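Parquet itself applies a compression codec per column chunk; in recent Spark versions the default is Snappy, and other codecs can be chosen with the `compression` write option on `DataFrameWriter`. A minimal sketch for comparing codecs (the output paths are illustrative):

```scala
// Write the same DataFrame with two different Parquet codecs.
// "snappy" is fast with moderate compression; "gzip" trades CPU
// time for smaller files.
logs.coalesce(2).write.option("compression", "snappy").parquet("/tmp/dj/parquet_snappy")
logs.coalesce(2).write.option("compression", "gzip").parquet("/tmp/dj/parquet_gzip")
```

Comparing the sizes of the two output directories (e.g. with `hdfs dfs -du -h` or `du -sh`) shows the trade-off for this particular dataset.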

Read the Parquet files back:

val logs1 = spark.read.parquet("/tmp/dj/parquet2/*")
// logs1 is now a DataFrame containing the fields of the original JSON records
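To confirm that the schema survived the round trip, the DataFrame can be inspected in the same spark-shell session; a short sketch (no specific field names are assumed here):

```scala
// Print the schema that was inferred from the JSON and preserved in Parquet
logs1.printSchema()
// Show a few rows; because Parquet is columnar, selecting a subset of
// columns with logs1.select(...) would read only those columns from disk
logs1.show(5)
```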