A WordCount example with Eclipse 4.7.0 + Maven 3.3.9 + Scala 2.11.8 + Spark 2.1.0 + Hadoop 2.7.1 on Ubuntu 16
Delete the JUnit content under src/test.
Modify pom.xml as shown below (this configuration is confirmed to work):
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>dblab</groupId>
  <artifactId>test1</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2015</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.1.0</spark.version>
    <hadoop.version>2.6.0</hadoop.version>
    <scala.compat.version>2.11</scala.compat.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-make:transitive</arg>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
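The artifact suffix in spark-core_2.11 has to match the Scala binary version in scala.compat.version. If you want to verify at runtime that the Scala and Spark versions actually on the classpath are the ones the pom declares, a small optional sketch (my addition, not part of the original project) is:

package dblab.test1

// Optional sanity check, not in the original example: prints the Scala and
// Spark versions found on the classpath so they can be compared with the
// <scala.version> and <spark.version> properties in pom.xml.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println("Scala: " + scala.util.Properties.versionString) // e.g. "version 2.11.8"
    println("Spark: " + org.apache.spark.SPARK_VERSION)      // e.g. "2.1.0"
  }
}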
Use the following code for app.scala:
package dblab.test1

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

/**
 * @author ${user.name}
 */
object App {

  // unused helper left over from the generated template
  def foo(x: Array[String]) = x.foldLeft("")((a, b) => a + b)

  def main(args: Array[String]) {
    //println("Hello World!")
    //println("concat arguments = " + foo(args))
    val inputFile = "/test/test.txt" // input file on HDFS (uploaded below with hadoop fs -put)
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // split each line on spaces, then count the occurrences of every word
    val wordcount = textFile.flatMap(line => line.split(" "))
                            .map(word => (word, 1))
                            .reduceByKey((a, b) => a + b)
    wordcount.foreach(println)
  }
}
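Note that setMaster("local[2]") hard-coded in the source takes precedence over the --master flag passed to spark-submit later in this walkthrough, so the job runs in local mode either way. If you want the cluster master to win, a common variant (a sketch of my own, not part of the original example) only falls back to local mode when no master has been supplied:

package dblab.test1

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: same word count as App above, but the master is only defaulted to
// local[2] when nothing else set it, so "spark-submit --master spark://..."
// can take effect. The fallback logic is my addition.
object AppConfigurable {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount")
    if (conf.getOption("spark.master").isEmpty) {
      conf.setMaster("local[2]") // default used when running inside Eclipse
    }
    val sc = new SparkContext(conf)
    sc.textFile("/test/test.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreach(println)
    sc.stop()
  }
}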
Right-click app.scala --> Run As --> Scala Application
Console output:
........
(,1)
(aaa:11111111111,1)
(bbb:22222222222,1)
............
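The (,1) entry in the output is an empty token: line.split(" ") produces empty strings for blank lines or consecutive spaces, and they get counted like any other word. If you want to drop them, replace the wordcount line in App with the variant below (the extra filter is my addition, not in the original example):

    // Same pipeline as above, with an extra filter that drops empty tokens.
    val wordcount = textFile.flatMap(line => line.split(" "))
                            .filter(word => word.nonEmpty)
                            .map(word => (word, 1))
                            .reduceByKey((a, b) => a + b)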
Building the jar package
Maven --> Maven build, with the goal assembly:assembly
Console log:
[INFO] Building jar: /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT-jar-with-dependencies.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:33 min
[INFO] Finished at: 2018-09-28T17:31:42+08:00
[INFO] Final Memory: 351M/827M
[INFO] ------------------------------------------------------------------------
Put the input file into HDFS:
hadoop fs -put ./test.txt /test
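The path /test/test.txt that App reads resolves against whatever fs.defaultFS is configured in core-site.xml, so when the program is launched straight from Eclipse without the Hadoop configuration on the classpath it may read the local filesystem instead of HDFS. To be explicit you can use a fully qualified URI; the NameNode address below (localhost:9000) is an assumption and has to match your own core-site.xml:

    // Assumed NameNode address; replace localhost:9000 with your fs.defaultFS value.
    val inputFile = "hdfs://localhost:9000/test/test.txt"
    val textFile = sc.textFile(inputFile)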
Run it (the plain jar is enough here, since spark-submit already provides Spark and Hadoop on the classpath; the jar-with-dependencies built above would also work):
spark-submit --master spark://dblab-VirtualBox:7077 --class "dblab.test1.App" /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT.jar
Output:
hadoop@dblab-VirtualBox:~/workspace/test1/target$ spark-submit --master spark://dblab-VirtualBox:7077 --class "dblab.test1.App" /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT.jar
18/09/28 18:05:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/28 18:05:11 WARN Utils: Your hostname, dblab-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.4 instead (on interface enp0s3)
18/09/28 18:05:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 0:> (0 + 2) / 2](,1)
(aaa:11111111111,1)
(bbb:22222222222,1)
hadoop@dblab-VirtualBox:~/workspace/test1/target$
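wordcount.foreach(println) writes to stdout, which shows up in the terminal here because the hard-coded local[2] master keeps everything in a single JVM; on a real cluster those lines would land in the executor logs instead. A common alternative (a sketch; the output path below is an example of mine) is to write the result back to HDFS:

    // Writes one part-file per partition; the target directory must not exist yet,
    // otherwise Spark fails the job. Inspect afterwards with:
    //   hadoop fs -cat /test/wordcount-output/part-*
    wordcount.saveAsTextFile("/test/wordcount-output")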