Building spark-1.4.1-bin-cdh5.3.2 with Maven

阿新 • Published: 2019-01-01

Preparation before building Spark

1. Download the Spark 1.4.1 source package and extract it

I extracted it to /home/hadoop/softwares/:

tar -zxvf spark-1.4.1.tgz -C /home/hadoop/softwares/

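2. Install Maven

Installation is just a matter of extracting the archive and configuring the environment variables; nothing more.

3. Install Scala

Set the Scala environment variables: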

export SCALA_HOME=/home/hadoop/softwares/scala-2.10.4
export PATH=${PATH}:$SCALA_HOME/bin

With Scala installed on Linux, run scala -version; if it prints the Scala version, the environment is ready.

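4. Install Oracle JDK 7

Although I built successfully with OpenJDK 1.7, for now I still recommend Oracle JDK 7. For downloading and installing JDK 1.7, see my post on Java configuration.

Note: unlike many people online, I did not remove OpenJDK. I only placed the Oracle JDK's bin directory in front of the existing $PATH (the shell searches PATH from front to back and stops at the first match), as follows: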

export JAVA_HOME=/home/hadoop/softwares/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
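
The front-to-back lookup described above can be sketched as a small shell function. This is only an illustration under stated assumptions: first_on_path is a hypothetical helper name, not a standard command, and it imitates the shell's search rather than replacing it.

```shell
# Hypothetical helper: print the first executable named $1 found on a
# colon-separated search path $2 (defaults to the current $PATH).
first_on_path() {
    name=$1
    search_path=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # Stop at the first directory that holds an executable file.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Running first_on_path java prints the java binary the shell would pick; with the exports above it should be the one under $JAVA_HOME/bin, even though OpenJDK is still installed.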

Building

Change into the Spark 1.4.1 source directory and run the build:

mvn -Dhadoop.version=2.5.0-cdh5.3.2 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

However, I wanted the output shown on screen and also saved to a file, so this is the command I actually used:

mvn -Dhadoop.version=2.5.0-cdh5.3.2 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package | tee building.txt
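
One caveat when piping a build into tee: the exit status of the pipeline is that of tee, so a failed build can look successful to a calling script. The following is a minimal, portable sketch of one way around that; run_logged is a hypothetical helper name, and bash users could rely on set -o pipefail or PIPESTATUS instead.

```shell
# Hypothetical wrapper: run a command, copy its output to a log file
# with tee, and return the command's own exit status, not tee's.
run_logged() {
    log_file=$1
    shift
    status_file=$(mktemp)
    {
        rc=0
        "$@" || rc=$?
        # Hand the real exit status out of the pipeline via a temp file.
        echo "$rc" > "$status_file"
    } | tee "$log_file"
    rc=$(cat "$status_file")
    rm -f "$status_file"
    return "$rc"
}
```

It would be called as run_logged building.txt mvn -Dhadoop.version=2.5.0-cdh5.3.2 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package.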

Aside: with the CDH version, it seems the following build command alone may be enough (I have not verified this):

mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

Note that the Hadoop version and the Scala version must be set to the versions you are actually targeting.

Note:

Maven does not produce a tar package by default. Instead you get many jar files: each module has its own jar under its target directory, for example:

ls /home/hadoop/softwares/spark-1.4.1/network/yarn/target

The assembly/target/scala-2.10 directory contains the file spark-assembly-1.4.1-hadoop2.5.0-cdh5.3.2.jar:

ls /home/hadoop/softwares/spark-1.4.1/assembly/target/scala-2.10

I copied the jar to Windows and opened it with an archive tool to inspect the org folder and the files under it. The contents were in place, which shows that the build succeeded.
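
The same inspection can be done on the Linux build host without copying the jar anywhere. Below is a minimal sketch that uses the jar tool shipped with the JDK; jar_has_prefix is a hypothetical helper name, and the org/apache/spark/ prefix in the usage note is an assumption about what one would look for in a Spark assembly.

```shell
# Hypothetical helper: succeed if the jar contains at least one entry
# whose path starts with the given prefix (treated as a grep pattern).
jar_has_prefix() {
    # "jar tf" lists the entries of the archive, one path per line.
    jar tf "$1" | grep -q "^$2"
}
```

For example, jar_has_prefix assembly/target/scala-2.10/spark-assembly-1.4.1-hadoop2.5.0-cdh5.3.2.jar org/apache/spark/ && echo "Spark classes found".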

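Make: generating a binary tgz package (runs right after extraction)

The make-distribution.sh script in the source directory builds the binary package:

Note: after running this command I felt rather silly. There seems to be no need to run mvn first; running ./make-distribution.sh directly appears to be enough, because the script runs the Maven build itself.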

./make-distribution.sh --name custom-spark --skip-java-test --tgz -Pyarn -Dhadoop.version=2.5.0-cdh5.3.2  -Dscala-2.10.4 -Phive -Phive-thriftserver

In the command above, "--name custom-spark" is questionable; the name should probably identify the Hadoop version instead.

The command I used:

./make-distribution.sh --name cdh5.3.2 --skip-java-test --tgz -Pyarn -Dhadoop.version=2.5.0-cdh5.3.2  -Dscala-2.10.4 -Phive -Phive-thriftserver | tee building_distribution.txt

At the end it prompts (Y/N). I cautiously chose Y, and the long build began...

After working through various difficulties, the build finally succeeded. It produced spark-1.4.1-bin-cdh5.3.2.tgz in that directory, a deployment package of 322 MB (postscript: preliminary testing shows it works). That concludes my build.
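
A quick sanity check of the generated package, before deploying it, is to list what it contains. This is only a sketch; tgz_top_level is a hypothetical helper name.

```shell
# Hypothetical helper: print the top-level directory name(s) inside a
# gzip-compressed tarball, to confirm the package layout.
tgz_top_level() {
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}
```

Running tgz_top_level spark-1.4.1-bin-cdh5.3.2.tgz should print a single directory name if the package unpacks into one folder.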

Q & A

Q1： warning: [options] bootstrap class path not set in conjunction with -source 1.6

Cause:
This is not Ant but the JDK’s javac emitting the warning.

If you use Java 7’s javac and -source for anything smaller than 7 javac warns you you should also set the bootstrap classpath to point to an older rt.jar - because this is the only way to ensure the result is usable on an older VM.

This is only a warning, so you could ignore it and even suppress it with

<compilerarg value="-Xlint:-options"/>

Alternatively you really install an older JVM and adapt your bootclasspath accordingly (you need to include rt.jar, not the bin folder)

Solution: just ignore it.

Q2: The build aborts with a failure (compile failed. CompileFailed)

Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-sql_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]

Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:testCompile (scala-test-compile-first) on project spark-sql_2.10: Execution scala-test-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:testCompile failed. CompileFailed -> [Help 1]

Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-core_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]

[WARNING] The requested profile "hive-" could not be activated because it does not exist.
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-mllib_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed. CompileFailed -> [Help 1]

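Possible causes:

Network speed problems?
The build took too long and exceeded the maximum build time
The build host was under heavy load?

Solutions:

Delete the local Maven repository, then rebuild, several times if necessary
Or resume from the module that failed (for example spark-sql_2.10): mvn <goals> -rf :spark-sql_2.10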

./make-distribution.sh --name cdh5.3.2 --skip-java-test --tgz -Pyarn -Dhadoop.version=2.5.0-cdh5.3.2  -Dscala-2.10.4 -Phive -Phive-thriftserver -rf :spark-sql_2.10

Modify the pom.xml file in the Spark 1.4.1 source tree:

<dependency>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.2.0</version>
</dependency>
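
For the resume option above, the module name to pass to -rf can be pulled out of a saved build log instead of being read off the screen. The following is a minimal sketch, assuming the log was saved with tee as in building.txt and contains Maven's usual "Failed to execute goal ... on project <module>:" line; first_failed_module is a hypothetical helper name.

```shell
# Hypothetical helper: read a saved Maven build log and print the
# artifactId of the first module that failed.
first_failed_module() {
    sed -n 's/.*Failed to execute goal .* on project \([^:]*\):.*/\1/p' "$1" |
        head -n 1
}
```

The result can then be used as mvn <goals> -rf ":$(first_failed_module building.txt)".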

Q3: MissingRequirementError in spark-repl_2.10

[ERROR] error while loading , error in opening zip file
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-repl_2.10: wrap: scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found. -> [Help 1]

org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile-first) on project spark-repl_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed.

Possible causes found via Google:

Answer 1
This error is actually an error from scalac, not a compile error from the code. It sort of sounds like it has not been able to download scala dependencies. Check or maybe recreate your environment.
Answer 2
This error is very misleading, it actually has nothing to do with scala.runtime or the compiler mirror: this is the error you get when you have a faulty JAR file on your classpath.
Sadly, there is no way from the error (even with -Ydebug) to tell exactly which file. You can run scala with -Ylog-classpath, it will output a lot of classpath stuff, including the exact classpath used (look for “[init] [search path for class files:”). Then I guess you will have to go through them to check if they are valid or not.
I recently tried to improve that (SI-5463), at least to get a clear error message, but couldn’t find a satisfyingly clean way to do this…
Answer 3
I have checked to ensure that in my class path that ALL jars from SCALA_HOME/lib/ are included
As we figured out at #scala, the documentation was missing the fact that one needs to provide the -Dscala.usejavacp=true argument to the JVM command that invokes scalac. After that everything worked fine, and I updated the docs: http://docs.scala-lang.org/overviews/macros/overview.html#debugging_macros.
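
If the cause is a faulty JAR on the classpath, as Answer 2 suggests and the "error in opening zip file" message hints, one way to look for it is to test every jar in the local Maven repository. This is a minimal sketch, assuming the unzip tool is installed and the repository is in its default location, ~/.m2/repository; find_broken_jars is a hypothetical helper name.

```shell
# Hypothetical helper: print every .jar under a directory that fails
# the zip integrity test (unzip -tq exits non-zero for a bad archive).
find_broken_jars() {
    find "$1" -type f -name '*.jar' | while IFS= read -r jar_file; do
        if ! unzip -tq "$jar_file" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$jar_file"
        fi
    done
}
```

Running find_broken_jars ~/.m2/repository lists candidates to delete so that Maven downloads them again, which is narrower than removing the whole repository.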

Q4: Other potential problems

If Spark (1.4.1) and Hadoop (2.5.0) use different Protocol Buffers versions, HDFS files may not be read correctly. To prevent this, change pom.xml accordingly:

    <!--<protobuf.version>2.4.1</protobuf.version>-->
    <protobuf.version>2.5.0</protobuf.version>
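
After editing, it is worth confirming which value is actually active. The following is a minimal sketch; pom_protobuf_version is a hypothetical helper name, and it only skips comments that sit on a single line, as in the snippet above.

```shell
# Hypothetical helper: print the active (uncommented) protobuf.version
# value declared in a pom.xml.
pom_protobuf_version() {
    # Drop lines containing a comment opener, then extract the value.
    grep -v '<!--' "$1" |
        sed -n 's:.*<protobuf\.version>\(.*\)</protobuf\.version>.*:\1:p' |
        head -n 1
}
```

For example, pom_protobuf_version pom.xml run in the Spark source directory prints the version that the build will use.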

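Important references

"Notes on installing Spark 1.4.0 on YARN": http://blog.csdn.net/xiao_jun_0820/article/details/46561097
Our production cluster currently runs the Spark 1.2.0 bundled with CDH 5.3.2, a version with quite a few bugs, especially in Spark SQL, which became intolerable. Spark 1.4.0 adds SparkR support, and colleagues who use R are keen to try it and see how well SparkR works, so I decided to upgrade Spark.
"Building the spark-assembly package on CDH 5.1.0 to support Hive": http://blog.csdn.net/aaa1117a8w5s6d/article/details/44307207
Adds a private repository address to the Maven configuration file apache-maven-3.2.5/conf/settings.xml and provides test code
- Exception in thread "main" java.lang.OutOfMemoryError
- Cannot run program "javac": java.io.IOException
- Please set the SCALA_HOME
- Choosing the matching Hadoop and YARN versions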
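
The first item in the list above, the OutOfMemoryError, is commonly addressed by giving Maven more memory before the build starts. The values below are an assumption on my part rather than something the original post tested; they follow the settings suggested by the Spark build documentation of that period for Java 7.

```shell
# Assumed values from the Spark build documentation of that era;
# tune them to the memory available on the build host.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
```

The variable has to be exported in the same shell session that later runs mvn or make-distribution.sh.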
