
【Spark Core Source Code】Setting Up the Spark Source Code Environment

Contents

Prerequisites

Download the Spark source code and extract it

Open the pom.xml in the Spark source root and update the Java version and the Maven version used in IntelliJ

Open IntelliJ, choose Import Project, and import the source code

Problem Summary (Very Important)

Preparation Before Building with Maven

Building and Packaging with Maven


Prerequisites

1. Download and install IntelliJ IDEA
2. Download and install JDK 1.8
3. Download and install Scala 2.11.8
4. Download and install Maven 3.5.3 (the version I use; anything from 3.3.9 up is fine)
5. Download and install Git (listed as optional, but as you will see later it is needed for the Maven build on Windows)
6. Download and install spark-2.1.0-bin-hadoop2.7 and configure the environment variables (same as on Linux)
7. Download and install hadoop-2.7.4 and configure the environment variables (same as on Linux)
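
With all of these installed, a quick sanity check from a terminal confirms everything is visible on the PATH. A minimal sketch (run in Git Bash; the expected versions are the ones listed above):

java -version      # should report 1.8.x
scala -version     # should report 2.11.8
mvn -version       # should report 3.3.9 or later
git --version      # needed later for the Maven build on Windows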

Download the Spark source code and extract it
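
For reference, a minimal command-line sketch of this step (the URL assumes the standard Apache release archive layout; downloading through a browser works just as well):

curl -O https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0.tgz
tar -zxvf spark-2.1.0.tgz    # or unpack with any archive tool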

Open the pom.xml in the Spark source root and update the Java version and the Maven version used in IntelliJ
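
The properties to change sit in the properties section of the root pom.xml. A quick way to locate them (the exact default values in your copy may differ from what the comment shows):

# Find the Java and Maven version properties and change them to match
# the environment installed above, e.g.
#   <java.version>1.8</java.version>
#   <maven.version>3.5.3</maven.version>
grep -n "<java.version>\|<maven.version>" pom.xml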

Open IntelliJ, choose Import Project, and import the source code

1. Import the project.

2. Select "Import project from external model", choose Maven, and click [Next].

3. Check these three boxes and click [Next].

This takes a little while to load.

4. Add the yarn and hadoop-2.7 profiles and keep everything else at the defaults. The Hadoop profile must match the environment you installed earlier; if you are on 2.4, pick the 2.4 profile instead. Click [Next]. (See the note on profiles after these steps.)

5. Click [Finish].

IntelliJ now starts furiously downloading all the resources Spark needs.

Once everything has finished downloading, you are done.
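
The yarn and hadoop-2.7 checkboxes offered in step 4 are ordinary Maven profiles defined in the root pom.xml. If you are unsure which profiles your Spark version provides, they can be listed from the command line (a sketch; Maven downloads the help plugin on first use):

# List every profile defined for the project, including yarn and hadoop-2.7
mvn help:all-profiles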

Problem Summary (Very Important)

Problem 1: not found: type SparkFlumeProtocol

IDEA did not automatically download and generate the sources that flume-sink needs.

Solution:

In the Maven Projects panel, find Spark Project External Flume Sink.

Right-click [Spark Project External Flume Sink].

Click [Generate Sources and Update Folders]. This step can fail for no obvious reason, so it may take several attempts: it worked for me on the first try, but after I cleaned some junk files out of my Maven repository and re-ran Generate Sources and Update Folders it failed again; the third attempt succeeded.

IDEA will re-download the Flume Sink resources and rebuild.

Once it succeeds, the SparkFlumeProtocol that the class org.apache.spark.streaming.flume.sink.SparkAvroCallbackHandler extends no longer reports an error.
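
If the IDEA action keeps failing, the same source generation can also be triggered from the command line. A sketch, assuming the external/flume-sink module path used by Spark 2.1.0 (its Avro IDL is what produces SparkFlumeProtocol):

# Run source generation for the flume-sink module and the modules it depends on
mvn -pl external/flume-sink -am generate-sources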

Problem 2: Error:(34, 45) object SqlBaseParser is not a member of package org.apache.spark.sql.catalyst.parser
import org.apache.spark.sql.catalyst.parser.SqlBaseParser._

SqlBaseParser is reported as missing from the org.apache.spark.sql.catalyst.parser package, and looking at the source tree it really is not there yet: it is generated during compilation rather than checked in.

Solution:

In the spark-catalyst project, run "mvn clean compile" to recompile spark-catalyst.

Open Project Structure and follow the steps shown; in step 3, take care not to select the wrong directory level.

Once this succeeds, the org.apache.spark.sql.catalyst.parser.SqlBaseParser._ import referenced in org.apache.spark.sql.catalyst.parser.AstBuilder no longer reports an error.

We can now find SqlBaseParser under spark-2.1.0\sql\catalyst\target\generated-sources\antlr4\org\apache\spark\sql\catalyst\parser.
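
The same recompile can be scoped to the module from the source root. A sketch of the equivalent command (what matters is that the ANTLR output directory it produces gets marked as a source root, which is what the Project Structure step above does):

# Compile only spark-catalyst (and the modules it depends on)
mvn -pl sql/catalyst -am clean compile -DskipTests
# The generated parser then lives under:
#   sql/catalyst/target/generated-sources/antlr4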

Problem 3: Error:(52, 75) not found: value TCLIService
public abstract class ThriftCLIService extends AbstractService implements TCLIService.Iface, Runnable

Solution:

Open Project Structure and follow the steps shown; again, take care not to select the wrong directory level in step 3.

Once this succeeds, the TCLIService.Iface that org.apache.hive.service.cli.thrift.ThriftCLIService implements no longer reports an error.
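
If you are unsure which directory to add in Project Structure, it helps to search for the generated Thrift classes first. A sketch (the exact location can vary between Spark versions):

# Find where the generated TCLIService sources live, then mark that
# directory as a source root in Project Structure
grep -rl "class TCLIService" sql/hive-thriftserver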

At this point the Spark source code no longer shows any errors.

Preparation Before Building with Maven

1. Check Maven

In Settings, check and set the Maven version. At the very beginning we modified pom.xml and changed the Maven and JDK versions there; the Maven version configured here has to match.

2. Check the Scala SDK

Adjust it in Project Structure.

3. Set the Maven VM options:

-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
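
When building from the command line instead of from inside IDEA, the equivalent is to put the same options into MAVEN_OPTS. On JDK 8 the MaxPermSize flag is simply ignored (the warning shows up in the build log below), so it can be dropped there:

# Git Bash / Linux
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
# Windows cmd
set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m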

Building and Packaging with Maven
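
The build uses the same profiles chosen during the import. As a sketch, the command-line form run from the source root looks roughly like this (the profile and Hadoop version flags match the environment set up earlier):

mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.4 -DskipTests clean package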

Near the end of the build, an error appeared:

[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (default) on project spark-core_2.11: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "bash" (in directory "E:\spark-source\spark-2.1.0\core"): CreateProcess error=2, 系統找不到指定的檔案。
[ERROR] around Ant part ...<exec executable="bash">... @ 4:27 in E:\spark-source\spark-2.1.0\core\target\antrun\build-main.xml
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-core_2.11
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0

The problem lies in this part: "Execute failed: java.io.IOException: Cannot run program "bash" (in directory "E:\spark-source\spark-2.1.0\core"): CreateProcess error=2" (the Chinese tail of the message is Windows for "The system cannot find the file specified"). The antrun step in spark-core shells out to bash, which a bare Windows installation does not provide.

The fix is to install Git and add the bin directory under the Git installation to the system Path.
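
After adding it, open a new terminal (so the updated Path takes effect) and confirm that bash can now be found; it should print the path of bash.exe inside the Git installation:

where bash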

With that problem solved, rebuild:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM 2.1.0 ..................... SUCCESS [ 11.060 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 11.773 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 18.642 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 21.922 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 15.266 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 25.778 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 23.044 s]
[INFO] Spark Project Core ................................. SUCCESS [05:40 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:01 min]
[INFO] Spark Project GraphX ............................... SUCCESS [01:28 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:24 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [04:26 min]
[INFO] Spark Project SQL .................................. SUCCESS [06:14 min]
[INFO] Spark Project ML Library ........................... SUCCESS [05:16 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.478 s]
[INFO] Spark Project Hive ................................. SUCCESS [03:44 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 48.452 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 14.111 s]
[INFO] Spark Project YARN ................................. SUCCESS [01:35 min]
[INFO] Spark Project Assembly ............................. SUCCESS [  6.867 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 40.589 s]
[INFO] Spark Project External Flume ....................... SUCCESS [01:02 min]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [  6.343 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [01:12 min]
[INFO] Spark Project Examples ............................. SUCCESS [01:40 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [  6.682 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:01 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  6.573 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [01:11 min]
[INFO] Spark Project Java 8 Tests 2.1.0 ................... SUCCESS [ 21.143 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 42:58 min
[INFO] Finished at: 2018-10-09T23:25:48+08:00
[INFO] ------------------------------------------------------------------------

The build succeeded!

The packaged jars end up in spark-2.1.0\assembly\target\scala-2.11\jars.