
Spark Operators: Actions saveAsTextFile, saveAsSequenceFile, saveAsObjectFile

1. saveAsTextFile

1) def saveAsTextFile(path: String): Unit
2) def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

saveAsTextFile saves an RDD to a filesystem (such as HDFS) as plain text files, writing one part file per partition.

scala> var rdd1 = sc.makeRDD(1 to 10, 2)
scala> rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/") // save to HDFS
hadoop fs -ls /tmp/lxw1234.com
Found 2 items
-rw-r--r--   2 lxw1234 supergroup        0 2015-07-10 09:15 /tmp/lxw1234.com/_SUCCESS
-rw-r--r--   2 lxw1234 supergroup        21 2015-07-10 09:15 /tmp/lxw1234.com/part-00000
 
hadoop fs -cat /tmp/lxw1234.com/part-00000
1
2
3
4
5

// save with a specified compression codec

rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/",classOf[com.hadoop.compression.lzo.LzopCodec])
 
hadoop fs -ls /tmp/lxw1234.com
-rw-r--r--   2 lxw1234 supergroup    0 2015-07-10 09:20 /tmp/lxw1234.com/_SUCCESS
-rw-r--r--   2 lxw1234 supergroup    71 2015-07-10 09:20 /tmp/lxw1234.com/part-00000.lzo
 
hadoop fs -text /tmp/lxw1234.com/part-00000.lzo
1
2
3
4
5
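saveAsTextFile simply writes each element's `toString` on its own line, one part file per partition, which is why `part-00000` above holds `1` through `5`. A minimal pure-Scala sketch of that behavior (no cluster needed; the even split mirrors how `makeRDD` slices a range across 2 partitions):

```scala
// makeRDD(1 to 10, 2) splits the range evenly: partition 0 gets 1..5,
// partition 1 gets 6..10. Each partition becomes one part file whose
// lines are the elements' toString values.
val data = (1 to 10).toVector
val numPartitions = 2
val perPart = data.size / numPartitions
val partitions = data.grouped(perPart).toVector

// Contents of the simulated part-00000:
val part00000 = partitions(0).map(_.toString).mkString("\n")
println(part00000) // prints the lines "1" through "5", matching the output above
```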

2. saveAsSequenceFile

saveAsSequenceFile saves an RDD as a Hadoop SequenceFile on HDFS. It is available on RDDs of key-value pairs whose key and value types can be converted to Hadoop Writables (via an implicit conversion to SequenceFileRDDFunctions); its usage is otherwise the same as saveAsTextFile.
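The source gives no example for this one, so here is a hedged sketch, assuming a running spark-shell (`sc`) and a writable HDFS path (the `/seq/` directory is hypothetical, named to mirror the earlier examples):

```scala
// Sketch: saveAsSequenceFile needs a pair RDD; Int and String map to
// IntWritable and Text automatically.
val pairs = sc.makeRDD(Seq((1, "a"), (2, "b"), (3, "c")), 2)
pairs.saveAsSequenceFile("hdfs://cdh5/tmp/lxw1234.com/seq/") // hypothetical output path

// Read it back with sc.sequenceFile, supplying the key/value types:
val back = sc.sequenceFile[Int, String]("hdfs://cdh5/tmp/lxw1234.com/seq/")
back.collect() // the three pairs, grouped by partition
```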

3. saveAsObjectFile: def saveAsObjectFile(path: String): Unit

saveAsObjectFile serializes the elements of an RDD (using Java serialization) and saves them to a file. On HDFS it writes a SequenceFile by default.

scala> var rdd1 = sc.makeRDD(1 to 10, 2)
scala> rdd1.saveAsObjectFile("hdfs://cdh5/tmp/lxw1234.com/")
 
hadoop fs -cat /tmp/lxw1234.com/part-00000
SEQ !org.apache.hadoop.io.NullWritable"org.apache.hadoop.io.BytesWritableT
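The SEQ header above shows the container format: a SequenceFile with NullWritable keys and BytesWritable values holding the Java-serialized element batches. Reading the data back is done with sc.objectFile, supplying the element type. A sketch, assuming the same shell session and path as above:

```scala
// Sketch: sc.objectFile deserializes what saveAsObjectFile wrote.
// The element type parameter must match what was saved.
val restored = sc.objectFile[Int]("hdfs://cdh5/tmp/lxw1234.com/")
restored.collect() // the original elements 1 to 10
```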