Illustrated step-by-step guide: compiling Scala source code with sbt in IDEA on Win10

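Although the cluster runs Linux, and the Ubuntu desktop is pleasant to work on too, writing code on Windows, compiling and packaging it into a jar, and then submitting it to the cluster also works well. This post records how to set up that environment on Win10.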

Preparation

Some configuration

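Open IDEA, go to File->Settings from the menu bar in the top-left corner, find Plugins, and choose to install a JetBrains plugin. (Screenshot: the install-plugin button in IDEA)

Search for Scala in the search box. Which version you get does not matter much; the plugin is mainly there for syntax highlighting, code hints, and completion, because the sbt build configured later does not necessarily use the same Scala version. (Screenshot: downloading the Scala plugin)

Once the plugin finishes downloading, IDEA prompts for a restart, so go ahead and restart.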

Create a new Project and choose sbt.

(Screenshot: creating a new project)

The environment requires Java. (Screenshot: jdk-sbt-scala.png)

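Open Settings, find Build Tools, and customize the sbt settings.

You can use sbt-1.1.6.zip downloaded from the official site. Unzip it and add the extracted path to your environment variables.

Then, as shown in the figure below, set Launcher to Custom and select sbt/bin/sbt-launch.jar from the directory you just extracted.

(Screenshot: configuring sbt)

If you can reach Google, this step is optional. If you cannot, switch to a mirror, for example Aliyun's. Go to the directory where you unzipped sbt, enter the conf folder, create a new text file, and rename it repo.properties: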

[repositories]
  local
  aliyun: http://maven.aliyun.com/nexus/content/groups/public/
  typesafe: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  sonatype-oss-releases
  maven-central
  sonatype-oss-snapshots

When that is done, switch to the Terminal.

sbt has many commands:

sbt clean

sbt compile

sbt package

sbt assembly

···

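I usually use the package command, which packs the compiled Scala output into a jar that can be submitted to the cluster.

At this point you can type sbt sbtVersion to check the sbt version. The first run is slow because sbt has to download its own dependencies, so go have a coffee or play a few rounds of a game. Later runs are much faster.

If typing sbt sbtVersion produces output like this, it worked!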

D:\IDEAProjects\SparkSample>sbt sbtVersion
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from C:\Users\zhongfuze\.sbt\1.0\plugins
[info] Loading settings from assembly.sbt ...
[info] Loading project definition from D:\IDEAProjects\SparkSample\project
[info] Loading settings from build.sbt ...
[info] Set current project to SparkSample (in build file:/D:/IDEAProjects/SparkSample/)
[info] 1.1.6

Create HelloScala.scala under src/main/scala:

object HelloScala {
  def main(args: Array[String]): Unit = {
    println("Hello Scala!")
  }
}

If typing sbt package produces output like this, it worked! The generated jar is at <project root>/target/scala-2.11/xxxxx.jar

D:\IDEAProjects\SparkSample>sbt package
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from C:\Users\zhongfuze\.sbt\1.0\plugins
[info] Loading settings from assembly.sbt ...
[info] Loading project definition from D:\IDEAProjects\SparkSample\project
[info] Loading settings from build.sbt ...
[info] Set current project to SparkSample (in build file:/D:/IDEAProjects/SparkSample/)
[info] Compiling 1 Scala source to D:\IDEAProjects\SparkSample\target\scala-2.11\classes ...
[info] Done compiling.
[info] Packaging D:\IDEAProjects\SparkSample\target\scala-2.11\sparksample_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 4 s, completed 2018-7-24 16:12:19

--------

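Up to this point we can only build a plain package, but real projects need all kinds of configuration. The various dependency jars, properties, and config files have to stay maintainable, and sometimes certain jars must be bundled while others already exist on the cluster and should be left out. The sbt assembly command covers these needs.

Under <project root>/project, in the same directory as build.properties, create assembly.sbt with the following content: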

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.7")

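Note: the sbt-assembly version depends on the sbt version; different sbt version ranges call for different plugin versions!

(Screenshot: assembly.sbt)

Next, find build.sbt in the project root. Many things can be customized here, such as adding dependencies. For more detailed usage, see the sbt-assembly repository on GitHub: https://github.com/sbt/sbt-assembly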

name := "SparkSample"

version := "1.0"

organization := "com.zhong.PRM"

scalaVersion := "2.11.8"

assemblyJarName in assembly := "PRM.jar"

test in assembly := {}

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".class" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".xml" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".properties" => MergeStrategy.first
  case "application.conf" => MergeStrategy.concat
  case "unwanted.txt" => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
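
The build.sbt above scopes its keys with "in". Since sbt 1.1 the slash syntax is accepted as well, so the same assembly settings could be spelled as in the sketch below. It assumes sbt 1.1 or later with sbt-assembly 0.14.x and is only an alternative spelling, not what the original project used; the merge strategy is shortened for brevity.

```scala
// build.sbt (sketch, assuming sbt 1.1+ and sbt-assembly 0.14.x)
// Same settings as above, written with slash syntax instead of "in".
assembly / assemblyJarName := "PRM.jar"

// Skip running tests during assembly
assembly / test := {}

// Leave the Scala library out of the fat jar
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)

assembly / assemblyMergeStrategy := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case "application.conf"                    => MergeStrategy.concat
  case x =>
    // Fall back to the plugin's default strategy for everything else
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```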

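#### Excluding jars

"provided" means the jar does not need to be included when packaging. sbt-assembly builds the package from the libraryDependencies configured in the project, so dependencies that should not be bundled can be marked "provided" to exclude them.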

[build.sbt]
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
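
As a small variation, the Scala suffix does not have to be hard-coded in the artifact name. The sketch below assumes scalaVersion is set to a 2.11.x release, as in the build.sbt shown earlier; with that, %% appends _2.11 automatically, so it should resolve the same artifact as the line above.

```scala
// build.sbt (sketch): %% appends the Scala binary version (_2.11 here) to the artifact name
scalaVersion := "2.11.8"

// Resolves spark-core_2.11, still marked "provided" so sbt-assembly leaves it out
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
```

This keeps the dependency suffix in step with scalaVersion if the Scala version is changed later.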

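#### Excluding the Scala library jars

Create an assembly.sbt file in the project root and add the following configuration (note: sbt-assembly settings can go either in <project root>/build.sbt or in an assembly.sbt file in the project root):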

[assembly.sbt]
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

#### Explicitly excluding a specific jar

[assembly.sbt]
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter {_.data.getName == "compile-0.1.0.jar"}
}
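
If more than one jar has to be left out, the same setting can filter against a set of file names. This is only a sketch using the same 0.14.x-style keys as above; the second jar name is a made-up placeholder.

```scala
// assembly.sbt (sketch; "another-lib-1.0.jar" is a hypothetical placeholder)
assemblyExcludedJars in assembly := {
  // File names of the jars that should not end up in the fat jar
  val excluded = Set("compile-0.1.0.jar", "another-lib-1.0.jar")
  val cp = (fullClasspath in assembly).value
  // Keep only the classpath entries whose file name is in the exclusion set
  cp filter { entry => excluded.contains(entry.data.getName) }
}
```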

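#### Multiple files sharing the same relative path

If multiple files share the same relative path (for example, a resource named application.conf in several dependency JARs), the default strategy is to verify that all candidates have the same contents and to error out otherwise. This behavior can be configured per path using one of the following built-in strategies or by writing a custom one:

MergeStrategy.deduplicate is the default described above
MergeStrategy.first picks the first matching file in classpath order
MergeStrategy.last picks the last one
MergeStrategy.singleOrError bails out with an error message on conflict
MergeStrategy.concat simply concatenates all matching files and includes the result
MergeStrategy.filterDistinctLines also concatenates, but leaves out duplicates along the way
MergeStrategy.rename renames the files originating from jar files
MergeStrategy.discard simply discards matching files

The mapping from path names to merge strategies is done through the assemblyMergeStrategy setting, which can be extended as follows: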

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*)         => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
  case "application.conf"                            => MergeStrategy.concat
  case "unwanted.txt"                                => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
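
To see how the cases in such a block are tried from top to bottom, here is a small standalone sketch in plain Scala. It does not use sbt-assembly at all: PathSegments is a hypothetical stand-in for the plugin's PathList extractor, and the rule names are just strings, so it only illustrates the pattern-matching order.

```scala
// Hypothetical stand-in for sbt-assembly's PathList: splits a path into its segments.
object PathSegments {
  def unapplySeq(path: String): Option[Seq[String]] =
    Some(path.split("[/\\\\]").toSeq)
}

object MergeRuleDemo {
  // Returns the name of the first rule that matches the path; plain strings, no plugin types.
  def ruleFor(path: String): String = path match {
    case PathSegments("javax", "servlet", _*)                               => "first"
    case PathSegments(ps @ _*) if ps.nonEmpty && ps.last.endsWith(".html") => "first"
    case "application.conf"                                                 => "concat"
    case "unwanted.txt"                                                     => "discard"
    case _                                                                  => "default"
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "javax/servlet/Servlet.class",
      "docs/index.html",
      "application.conf",
      "unwanted.txt",
      "META-INF/MANIFEST.MF"
    )
    // Print which rule each sample path would hit
    samples.foreach(p => println(s"$p -> ${ruleFor(p)}"))
  }
}
```

Because the first matching case wins, more specific patterns should come before the catch-all case, just as in the build.sbt block above.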

Some configuration files kept here for reference

[plugins.sbt]
logLevel := Level.Warn

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

[build.sbt]

name := "lanke"

version := "1.0"

scalaVersion := "2.11.8"

assemblyJarName in assembly := "lanke.jar"

test in assembly := {}

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".class" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".xml" => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".properties" => MergeStrategy.first
  case "application.conf" => MergeStrategy.concat
  case "unwanted.txt" => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

resolvers ++= Seq(
  "kompics" at "http://kompics.sics.se/maven/repository/"
)

javacOptions ++= Seq("-encoding", "UTF-8", "-source", "1.7", "-target", "1.7")

resolvers ++= Seq(
  "libs-releases" at "http://artifactory.jd.com/libs-releases",
  "libs-snapshots" at "http://artifactory.jd.com/libs-snapshots",
  "plugins-releases" at "http://artifactory.jd.com/plugins-releases",
  "plugins-snapshots" at "http://artifactory.jd.com//plugins-snapshots"
)

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.3.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.3.0" % "provided",
  "org.apache.spark" % "spark-streaming_2.11" % "2.3.0" % "provided",
  "org.apache.spark" % "spark-hive_2.11" % "2.3.0" % "provided",
  "org.apache.spark" % "spark-repl_2.11" % "2.3.0" % "provided",
  "org.apache.spark" % "spark-tags_2.11" % "2.3.0" % "provided"
)

libraryDependencies += "com.yammer.metrics" % "metrics-core" % "2.2.0"

libraryDependencies += "com.typesafe" % "config" % "1.2.1"

libraryDependencies += "net.liftweb" % "lift-json_2.11" % "3.0"

libraryDependencies += "com.huaban" % "jieba-analysis" % "1.0.2"

resolvers += "Sonatype OSS Releases" at "http://oss.sonatype.org/content/repositories/releases/"

libraryDependencies += "com.thesamet" %% "kdtree" % "1.0.4"

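// Note: this artifact carries the _2.10 suffix while scalaVersion above is 2.11.8; check binary compatibility before relying on it
libraryDependencies += "com.soundcloud" % "cosine-lsh-join-spark_2.10" % "1.0.1"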

libraryDependencies += "org.tensorflow" %% "spark-tensorflow-connector" % "1.6.0"

libraryDependencies += "org.scalaj" %% "scalaj-http" % "2.4.0"

[tools/sbt/conf repo.properties]
[repositories]
  local
  my-ivy-proxy-releases: http://artifactory.jd.com/ivy-release/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
  my-maven-proxy-releases: http://artifactory.jd.com/libs-releases/

Or, with the Aliyun mirror (same as the file shown earlier):

[repositories]
  local
  aliyun: http://maven.aliyun.com/nexus/content/groups/public/
  typesafe: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  sonatype-oss-releases
  maven-central
  sonatype-oss-snapshots