Chapter 3: Hands-on Environment Setup
3-1 Course Outline
Hands-on environment setup:
building Spark from source, Spark environment setup, basic Spark usage
3-2 Building Spark from Source
1. Download the source release from the official site (http://spark.apache.org/downloads.html)
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0.tgz
2. Build steps
http://spark.apache.org/docs/latest/building-spark.html
Prerequisites
1)The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. Note that support for Java 7 was removed as of Spark 2.2.0.
2) export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
Maven build command
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
Prerequisite: some familiarity with Maven is required
./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
Two ways to build Spark from source:
a direct mvn build, or make-distribution.sh (which packages a deployable distribution)
3-3 Addendum: Pitfalls When Building Spark from Source
1.
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
Tip: Aliyun machines may run short of memory; a virtual machine with 2-4 GB of RAM is recommended.
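On a memory-constrained machine, dropping the profiles you do not need from the make-distribution.sh invocation above reduces build time and memory pressure. A minimal sketch, assuming only YARN and Hive support are required (adjust the profile list for your cluster):

```shell
# Sketch: build a leaner distribution with only the YARN and Hive profiles.
# Assumes you are in the Spark source root; hadoop-2.7 matches the notes above.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
./dev/make-distribution.sh --name custom-spark --tgz \
    -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn
```

The resulting spark-*-bin-custom-spark.tgz is what you unpack for the environment setup in the next sections.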
3-4 Setting Up Spark in Local Mode
Spark environment setup:
Local mode
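Local mode needs no cluster: everything runs in a single JVM. A sketch, assuming the built distribution has been unpacked and SPARK_HOME points at it:

```shell
# Sketch: start an interactive Spark shell in Local mode.
# local[2] runs Spark with 2 worker threads in one JVM;
# local[*] would use all available cores instead.
$SPARK_HOME/bin/spark-shell --master local[2]
```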
3-5 Setting Up Spark in Standalone Mode
The Spark Standalone architecture is similar to Hadoop's HDFS/YARN:
1 master + n workers
spark-env.sh
hadoop1:master
hadoop2:worker
hadoop3:worker
hadoop4:worker
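For the 1 master + 3 workers layout above, conf/spark-env.sh on each node might look like the following sketch; the core and memory values are assumptions to adjust for your machines:

```shell
# conf/spark-env.sh (sketch for the hadoop1-hadoop4 layout above)
SPARK_MASTER_HOST=hadoop1      # hadoop1 runs the master
SPARK_WORKER_CORES=2           # cores per worker (assumed value)
SPARK_WORKER_MEMORY=2g         # memory per worker (assumed value)
```

The worker hostnames (hadoop2 through hadoop4) go in conf/slaves, one per line, so that sbin/start-all.sh on the master can start them.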
3-6 Basic Spark Usage
Basic Spark usage:
run a word count with Spark
Reference:
http://spark.apache.org/examples.html
Word Count
In this example, we use a few transformations to build a dataset of (String, Int) pairs called counts and then save it to a file.
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
Code:
val textFile = sc.textFile("file:///home/hadoop/data/wc.txt")
val counts = textFile.flatMap(line => line.split(","))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.collect
Use Local mode directly during development.
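The RDD pipeline above can be mimicked on plain Scala collections, which is handy for checking the word-count logic without a Spark installation. A sketch, where the sample lines stand in for wc.txt:

```scala
// Word count over plain Scala collections, mirroring the RDD version:
// flatMap splits lines into words, map builds (word, 1) pairs,
// and groupBy + sum is the collections analogue of reduceByKey.
val lines = List("hello,world", "hello,spark")   // stands in for wc.txt
val counts = lines
  .flatMap(line => line.split(","))
  .map(word => (word, 1))
  .groupBy(_._1)
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

println(counts)   // e.g. Map(world -> 1, hello -> 2, spark -> 1)
```

Once this logic looks right, swapping `lines` for `sc.textFile(...)` and `groupBy`/`sum` for `reduceByKey(_ + _)` gives the Spark version.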