Spark is currently very popular as the distributed computing framework most likely to replace MapReduce. I have started following Spark as well, and am trying to move from Hadoop + Mahout over to it.

1. Local environment

The local machine runs Ubuntu 14.04 with JDK 1.7.

2. Downloading the source

The version I am using is 0.9.1; the source archive is spark-0.9.1.tgz.

3. Building

Unpack the source archive, change into it, and build with sbt:

$tar xzvf spark-0.9.1.tgz

$cd spark-0.9.1

$sbt/sbt assembly

Once the build succeeds, you are ready to play!

4. Running an example

The simplest example computes an approximation of pi:

$./bin/run-example org.apache.spark.examples.SparkPi local[3]
Here, local means run on the local machine, and [3] means run with 3 threads.

The output looks like this:

Pi is roughly 3.13486
14/05/08 10:26:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/static,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/metrics/json,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/executors,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/environment,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages/pool,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages/stage,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/storage,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/05/08 10:26:16 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/05/08 10:26:16 INFO network.ConnectionManager: Selector thread was interrupted!
14/05/08 10:26:16 INFO network.ConnectionManager: ConnectionManager stopped
14/05/08 10:26:16 INFO storage.MemoryStore: MemoryStore cleared
14/05/08 10:26:16 INFO storage.BlockManager: BlockManager stopped
14/05/08 10:26:16 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/05/08 10:26:16 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/05/08 10:26:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/05/08 10:26:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/05/08 10:26:16 INFO spark.SparkContext: Successfully stopped SparkContext
14/05/08 10:26:16 INFO Remoting: Remoting shut down
14/05/08 10:26:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
There it is: Pi ≈ 3.13486.
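SparkPi estimates pi by Monte Carlo sampling: it throws random points into the unit square and counts the fraction that land inside the quarter circle, which is why the result above is 3.13486 rather than exactly 3.14159 and will vary slightly between runs. A minimal single-process sketch of the same idea in plain Python (the function name, seed, and sample count are illustrative; the real example distributes the sampling across Spark tasks):

```python
import random

def estimate_pi(num_samples, seed=42):
    """Estimate pi by sampling points in the unit square and
    counting how many satisfy x^2 + y^2 <= 1 (the quarter circle)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square's area,
    # so the hit ratio times 4 approximates pi.
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

More samples tighten the estimate, which is exactly what spreading the work across Spark threads or a cluster buys you.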

5. Running interactively

$ ./bin/spark-shell

This drops you into the interactive shell.

For an example of interactive use, see Spark's quick start guide, which walks through processing the README file.
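That walkthrough loads the README as an RDD and, for instance, counts the lines containing a given word (a filter followed by a count in the Spark shell). The same logic in plain Python, just to show what that pipeline computes (the helper name and sample text here are made up for illustration):

```python
def count_matching_lines(text, keyword):
    """Count the lines of `text` that contain `keyword`,
    mirroring a filter-then-count pipeline on an RDD of lines."""
    return sum(1 for line in text.splitlines() if keyword in line)

sample = """Apache Spark
Lightning-fast cluster computing
Spark requires Scala"""

print(count_matching_lines(sample, "Spark"))  # -> 2
```

In the shell, the difference is that the lines live in a distributed dataset, so the filter and count run across the cluster instead of in one process.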