[Spark Core Source Code] Setting Up the Spark Source Build Environment
Prerequisites
1. Download and install IntelliJ IDEA
2. Download and install JDK 1.8
3. Download and install Scala 2.11.8
4. Download and install Maven 3.5.3 (the version I used; anything no lower than 3.3.9 is fine)
5. Download and install Git (optional)
6. Download spark-2.1.0-bin-hadoop2.7 and configure its environment variables (same as on Linux)
7. Download hadoop-2.7.4 and configure its environment variables (same as on Linux)
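Before going further, it helps to confirm each tool actually resolves on PATH. A quick sketch (run in Git Bash or any POSIX shell; adapt for cmd.exe if needed):

```shell
# Sanity-check that every prerequisite is on PATH before importing the source
for tool in java scala mvn git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND - install it before continuing"
  fi
done
```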
Download the Spark source code and extract it
Open the pom.xml at the root of the Spark source and update the Java version and the Maven version (the one IntelliJ will use) to match your local installs
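In the 2.1.0 root pom the relevant entries are `<properties>` values; roughly like this (a sketch, assuming the 2.1.0 property names; adjust the values to the versions you installed):

```xml
<properties>
  <!-- bump to the JDK you installed (the 2.1.0 pom defaults to 1.7) -->
  <java.version>1.8</java.version>
  <!-- minimum enforced Maven; raise to match your local Maven, e.g. 3.5.3 -->
  <maven.version>3.3.9</maven.version>
</properties>
```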
Open IntelliJ, choose Import Project, and import the source into IntelliJ
1. Import
2. Select "Import project from external model", choose Maven, and click [Next]
3. Check the three boxes shown, then click [Next]
(This takes a while to load.)
4. Add the yarn and hadoop-2.7 profiles and keep the other defaults. The Hadoop profile must match the version you installed earlier; if you installed 2.4, pick the 2.4 profile. Click [Next]
5. Click [Finish]
At this point IntelliJ starts furiously downloading every dependency Spark needs.
Once everything has finished downloading, you're done.
Troubleshooting (very important)
Problem 1: not found: type SparkFlumeProtocol
IDEA did not automatically generate the sources that flume-sink needs.
Solution:
In the Maven Projects panel, locate Spark Project External Flume Sink.
Right-click [Spark Project External Flume Sink].
Click [Generate Sources and Update Folders]. This step can fail for no obvious reason, so retry it a few times: it worked for me on the first run, but after I cleaned some junk out of my local Maven repository and re-ran it, it failed, and only the third attempt succeeded.
IDEA re-downloads the Flume Sink dependencies and rebuilds.
Once it succeeds, the SparkFlumeProtocol that org.apache.spark.streaming.flume.sink.SparkAvroCallbackHandler extends no longer reports an error.
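If the IDE action keeps failing, the same generation should be achievable from the command line (a sketch; assumes you run it from the Spark source root with Maven configured as above):

```shell
# CLI equivalent of IDEA's "Generate Sources and Update Folders"
# for the flume-sink module
cd external/flume-sink
mvn generate-sources
```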
Problem 2: Error:(34, 45) object SqlBaseParser is not a member of package org.apache.spark.sql.catalyst.parser
import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
SqlBaseParser really is not in the org.apache.spark.sql.catalyst.parser package yet: it is generated by ANTLR from the SQL grammar during the build, so it does not exist until the catalyst module has been compiled.
Solution:
In the spark-catalyst project, run "mvn clean compile" to recompile spark-catalyst.
Then open Project Structure and mark the generated-sources directory as a source root; in step 3 of that dialog, be careful not to pick the wrong directory level.
Once that succeeds, the org.apache.spark.sql.catalyst.parser.SqlBaseParser._ import referenced by org.apache.spark.sql.catalyst.parser.AstBuilder no longer reports an error.
The generated SqlBaseParser can now be found under spark-2.1.0\sql\catalyst\target\generated-sources\antlr4\org\apache\spark\sql\catalyst\parser.
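Put together, the command-line half of this fix can be sketched as (paths as given above; run from the Spark source root):

```shell
# Recompile only the catalyst module so ANTLR regenerates SqlBaseParser
cd sql/catalyst
mvn clean compile
# The generated parser then lives under:
#   target/generated-sources/antlr4/org/apache/spark/sql/catalyst/parser/
# Mark that generated-sources/antlr4 directory as a source root in
# Project Structure so IDEA can resolve the import.
```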
Problem 3: Error:(52, 75) not found: value TCLIService public abstract class ThriftCLIService extends AbstractService implements TCLIService.Iface, Runnable
Solution:
As in problem 2, open Project Structure and mark the directory containing the generated TCLIService classes as a source root; again, be careful not to pick the wrong directory level in step 3.
Once that succeeds, the TCLIService that org.apache.hive.service.cli.thrift.ThriftCLIService implements no longer reports an error.
At this point, the Spark source no longer shows any errors.
Preparing for the Maven build
1. Check Maven
In Settings, check and set the Maven version. We modified pom.xml at the very beginning, changing the Maven and JDK versions; the Maven configured here must match it.
2. Check the Scala SDK
Adjust it in Project Structure.
3. Set the Maven VM options
-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
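When building from a terminal instead of IDEA, the same options go into MAVEN_OPTS. Note that the build log later warns MaxPermSize is ignored on JDK 8 (the permanent generation was removed), so on JDK 8 only the heap and code-cache flags matter:

```shell
# JVM options for Maven; MaxPermSize is dropped here because JDK 8 ignores it
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "$MAVEN_OPTS"
```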
Building and packaging with Maven
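Based on the Spark 2.1.0 build documentation and the profiles chosen during import, the packaging command would look something like this (match -Dhadoop.version to your installed Hadoop, here 2.7.4):

```shell
# Package Spark 2.1.0 with the YARN and Hadoop 2.7 profiles, skipping tests
mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.4 -DskipTests clean package
```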
Near the end of the build, an error appeared:
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (default) on project spark-core_2.11: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "bash" (in directory "E:\spark-source\spark-2.1.0\core"): CreateProcess error=2, 系統找不到指定的檔案。
[ERROR] around Ant part ...<exec executable="bash">... @ 4:27 in E:\spark-source\spark-2.1.0\core\target\antrun\build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :spark-core_2.11
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
The problem is here: "Execute failed: java.io.IOException: Cannot run program "bash" (in directory "E:\spark-source\spark-2.1.0\core"): CreateProcess error=2" (the Chinese Windows message at the end means "the system cannot find the file specified").
The fix is to install Git and add its bin directory, which ships a bash.exe, to the system Path.
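After updating Path, open a new terminal and confirm that the antrun step will now find bash:

```shell
# The maven-antrun-plugin step shells out to "bash"; verify it resolves
command -v bash && bash --version | head -n 1
```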
With that fixed, rebuild:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.1.0 ..................... SUCCESS [ 11.060 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 11.773 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 18.642 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 21.922 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 15.266 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 25.778 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 23.044 s]
[INFO] Spark Project Core ................................. SUCCESS [05:40 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:01 min]
[INFO] Spark Project GraphX ............................... SUCCESS [01:28 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:24 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [04:26 min]
[INFO] Spark Project SQL .................................. SUCCESS [06:14 min]
[INFO] Spark Project ML Library ........................... SUCCESS [05:16 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.478 s]
[INFO] Spark Project Hive ................................. SUCCESS [03:44 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 48.452 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 14.111 s]
[INFO] Spark Project YARN ................................. SUCCESS [01:35 min]
[INFO] Spark Project Assembly ............................. SUCCESS [ 6.867 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 40.589 s]
[INFO] Spark Project External Flume ....................... SUCCESS [01:02 min]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 6.343 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [01:12 min]
[INFO] Spark Project Examples ............................. SUCCESS [01:40 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 6.682 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:01 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 6.573 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [01:11 min]
[INFO] Spark Project Java 8 Tests 2.1.0 ................... SUCCESS [ 21.143 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 42:58 min
[INFO] Finished at: 2018-10-09T23:25:48+08:00
[INFO] ------------------------------------------------------------------------
Build succeeded!
The packaged jars are under spark-2.1.0\assembly\target\scala-2.11\jars.