
eclipse4.7.0+maven3.3.9+scala2.11.8+spark2.1.0+hadoop2.7.1 WordCount example on Ubuntu 16

Delete the JUnit content under src/test.

Modify pom.xml along the lines below (verified working):

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>dblab</groupId>
	<artifactId>test1</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>${project.artifactId}</name>
	<description>My wonderful Scala app</description>
	<inceptionYear>2015</inceptionYear>
	<licenses>
		<license>
			<name>My License</name>
			<url>http://....</url>
			<distribution>repo</distribution>
		</license>
	</licenses>

	<properties>
		<maven.compiler.source>1.8</maven.compiler.source>
		<maven.compiler.target>1.8</maven.compiler.target>
		<encoding>UTF-8</encoding>

		<scala.version>2.11.8</scala.version>
		<spark.version>2.1.0</spark.version>
		<hadoop.version>2.7.1</hadoop.version>

		<scala.compat.version>2.11</scala.compat.version>
	</properties>

	<repositories>
		<repository>
			<id>scala-tools.org</id>
			<name>Scala-Tools Maven2 Repository</name>
			<url>http://scala-tools.org/repo-releases</url>
		</repository>
	</repositories>

	<pluginRepositories>
		<pluginRepository>
			<id>scala-tools.org</id>
			<name>Scala-Tools Maven2 Repository</name>
			<url>http://scala-tools.org/repo-releases</url>
		</pluginRepository>
	</pluginRepositories>

	<dependencies>

		<dependency>
			<groupId>org.scala-lang</groupId>
			<artifactId>scala-library</artifactId>
			<version>${scala.version}</version>
		</dependency>
		<dependency>
			<groupId>org.specs</groupId>
			<artifactId>specs</artifactId>
			<version>1.2.5</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_${scala.compat.version}</artifactId>
			<version>${spark.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>${hadoop.version}</version>
		</dependency>

	</dependencies>

	<build>
		<sourceDirectory>src/main/scala</sourceDirectory>
		<testSourceDirectory>src/test/scala</testSourceDirectory>
		<plugins>
			<plugin>
				<!-- see http://davidb.github.com/scala-maven-plugin -->
				<groupId>net.alchim31.maven</groupId>
				<artifactId>scala-maven-plugin</artifactId>
				<version>3.2.0</version>
				<executions>
					<execution>
						<goals>
							<goal>compile</goal>
							<goal>testCompile</goal>
						</goals>
						<configuration>
							<args>
								<!-- note: scalac 2.11 rejects -make:transitive, so it is omitted here -->
								<arg>-dependencyfile</arg>
								<arg>${project.build.directory}/.scala_dependencies</arg>
							</args>
						</configuration>
					</execution>
				</executions>
			</plugin>
			
			<plugin>
				<artifactId>maven-assembly-plugin</artifactId>
				<version>2.6</version>
				<configuration>
					<descriptorRefs>
						<descriptorRef>jar-with-dependencies</descriptorRef>
					</descriptorRefs>
				</configuration>
			</plugin>
		</plugins>
	</build>
</project>
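
Both spark-core and the explicit hadoop-client bring Hadoop classes onto the classpath; if the versions ever disagree after editing the properties above, Maven's dependency tree shows which one wins:

mvn dependency:tree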

Use the following code for app.scala:

package dblab.test1

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

/**
 * @author ${user.name}
 */
object App {

  // archetype leftover: concatenates all program arguments into one string
  def foo(x : Array[String]) = x.foldLeft("")((a, b) => a + b)

  def main(args : Array[String]) {
    // input path, resolved against HDFS when fs.defaultFS points at it
    val inputFile = "/test/test.txt"
    // local[2] runs Spark inside Eclipse with two worker threads; note that
    // a master set in code takes precedence over spark-submit's --master flag
    val conf = new SparkConf().setAppName("wordcount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // split each line on spaces and count the occurrences of every word
    val wordcount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordcount.foreach(println)
    sc.stop()
  }

}

app.scala → Run As → Scala Application

Console output:

........

(,1)
(aaa:11111111111,1)
(bbb:22222222222,1)

............
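
The (,1) pair is the empty string that line.split(" ") produces for an empty line, a leading space, or consecutive spaces in test.txt. If you want to drop it, a filter step after the flatMap is enough (a small sketch, not part of the original code):

val wordcount = textFile.flatMap(line => line.split(" "))
  .filter(word => word.nonEmpty) // discard the empty tokens that split(" ") can produce
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)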

Building the jar

Maven → Maven build, goal: assembly:assembly
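
The same build can be started from a terminal in the project root; assembly:assembly is the goal of the maven-assembly-plugin configured above:

mvn clean assembly:assembly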

Console log:

[INFO] Building jar: /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT-jar-with-dependencies.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:33 min
[INFO] Finished at: 2018-09-28T17:31:42+08:00
[INFO] Final Memory: 351M/827M
[INFO] ------------------------------------------------------------------------

Put the input file into HDFS:

hadoop fs -put ./test.txt /test
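
To double-check the upload before submitting:

hadoop fs -ls /test
hadoop fs -cat /test/test.txt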

Run:

spark-submit  --master spark://dblab-VirtualBox:7077 --class "dblab.test1.App"  /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT.jar
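
Note that this submits the plain test1-0.0.1-SNAPSHOT.jar rather than the jar-with-dependencies built above; that works here because the cluster provides the spark-core and hadoop-client classes at runtime. If the job picked up extra third-party dependencies, the fat jar would be the one to submit:

spark-submit  --master spark://dblab-VirtualBox:7077 --class "dblab.test1.App"  /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT-jar-with-dependencies.jar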

Output:

hadoop@dblab-VirtualBox:~/workspace/test1/target$ spark-submit  --master spark://dblab-VirtualBox:7077 --class "dblab.test1.App"  /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT.jar
18/09/28 18:05:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/28 18:05:11 WARN Utils: Your hostname, dblab-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.4 instead (on interface enp0s3)
18/09/28 18:05:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 0:>                                                          (0 + 2) / 2](,1)
(aaa:11111111111,1)
(bbb:22222222222,1)
hadoop@dblab-VirtualBox:~/workspace/test1/target$
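
One caveat about the run above: because App hard-codes setMaster("local[2]"), that value takes precedence over the --master flag, so the job still executes in local mode inside the driver. A cluster-friendly variant leaves the master to spark-submit (a sketch; the name ClusterApp and the argument handling are illustrative, not part of the original example):

package dblab.test1

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object ClusterApp {

  def main(args : Array[String]) {
    // fall back to the path used above when no argument is given
    val inputFile = if (args.nonEmpty) args(0) else "/test/test.txt"
    // no setMaster: the master now comes from spark-submit --master
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    sc.textFile(inputFile)
      .flatMap(line => line.split(" "))
      .filter(word => word.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey((a, b) => a + b)
      .foreach(println)
    sc.stop()
  }

}

It would be submitted the same way, e.g.:

spark-submit  --master spark://dblab-VirtualBox:7077 --class "dblab.test1.ClusterApp"  /home/hadoop/workspace/test1/target/test1-0.0.1-SNAPSHOT.jar /test/test.txt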