
Spark on YARN Development (begin)

This follows an experiment from Dong Xicheng's blog. I worked through it myself, ran into a few problems along the way, and decided to write them down.

It's really just a simple WordCount class, but once this skeleton is in place, other code can be filled in gradually.

package org.apache.spark

import org.apache.spark._
import SparkContext._

object WordCount {
  // Assembly jar used for submission:
  // /apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
  def main(args: Array[String]) {
    if (args.length != 2) {
      println("usage is org.apache.spark.WordCount <input> <output>")
      return
    }
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    // Read the input, split each line on whitespace, and count every word.
    val textFile = sc.textFile(args(0))
    val result = textFile
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}
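Before submitting to the cluster, the counting logic itself can be sanity-checked locally. The sketch below uses plain Scala collections rather than Spark's API (`groupBy` plus a size count stands in for `reduceByKey`), so it runs without any cluster; `LocalWordCount` and `countWords` are names invented here for illustration.

```scala
// Minimal local sketch of the flatMap -> map -> reduceByKey pipeline,
// using plain Scala collections instead of Spark RDDs.
object LocalWordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens from leading/trailing spaces
      .groupBy(identity)          // group occurrences of the same word
      .map { case (word, occs) => (word, occs.size) }

  def main(args: Array[String]): Unit = {
    println(countWords(Seq("redis hbase", "hdfs redis")))
  }
}
```

The same input lines fed through the Spark job would produce the same (word, count) pairs, just partitioned across output files.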

The shell script is:

export YARN_CONF_DIR=/etc/hadoop/conf
SPARK_JAR=/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar \
/apps/spark-1.2.0-bin-hadoop2.4/bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./RELEASE/spark-test-wordcount.jar \
--class org.apache.spark.WordCount \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/output \
--num-workers 1 \
--master-memory 2g \
--worker-memory 2g \
--worker-cores 2
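As the run log below warns, this `yarn.Client` entry point is deprecated. The same job could be submitted with `spark-submit` instead; this is an untested sketch assuming the same jar, class, and HDFS paths as above, with the equivalent Spark 1.2 flag names (`--num-executors` replaces `--num-workers`, `--driver-memory` replaces `--master-memory`, and so on):

```shell
export YARN_CONF_DIR=/etc/hadoop/conf
/apps/spark-1.2.0-bin-hadoop2.4/bin/spark-submit \
  --master yarn-cluster \
  --class org.apache.spark.WordCount \
  --num-executors 1 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  ./RELEASE/spark-test-wordcount.jar \
  hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt \
  hdfs://UHVDATA012.uhome.haier.net:8020/yang/output
```

Application arguments are passed positionally after the jar, so no `--args` flags are needed.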

That's all there is to it. The log from the run:

[[email protected] yangjingbo]# ./wordcount.sh 
Spark assembly has been built with Hive, including Datanucleus jars on classpath
WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
15/01/06 13:27:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/06 13:27:33 INFO yarn.Client: Requesting a new application from cluster with 7 NodeManagers
15/01/06 13:27:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (13824 MB per container)
15/01/06 13:27:33 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/06 13:27:33 INFO yarn.Client: Setting up container launch context for our AM
15/01/06 13:27:33 INFO yarn.Client: Preparing resources for our AM container
15/01/06 13:27:33 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/01/06 13:27:33 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:33 INFO yarn.Client: Uploading resource file:/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> 
15/01/06 13:27:35 INFO yarn.Client: Uploading resource file:/home/yangjingbo/RELEASE/spark-test-wordcount.jar -> 
15/01/06 13:27:35 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/06 13:27:35 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:35 INFO spark.SecurityManager: Changing view acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: Changing modify acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/01/06 13:27:35 INFO yarn.Client: Submitting application 29 to ResourceManager
15/01/06 13:27:35 INFO impl.YarnClientImpl: Submitted application application_1416218486128_0029
15/01/06 13:27:36 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:36 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1420522055830
         final status: UNDEFINED
         user: root
15/01/06 13:27:37 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:38 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:39 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:40 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:41 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:42 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:42 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ###########
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1420522055830
         final status: UNDEFINED
         user: root
15/01/06 13:27:43 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:44 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:45 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:46 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:47 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:48 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:49 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:50 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:51 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:52 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:53 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:54 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:55 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:56 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:57 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:58 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:59 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:28:00 INFO yarn.Client: Application report for application_1416218486128_0029 (state: FINISHED)
15/01/06 13:28:00 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ##############
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1420522055830
         final status: SUCCEEDED
         user: root
./wordcount.sh: line 9: --num-workers: command not found

(The "--num-workers: command not found" error suggests a line-continuation backslash was missing in the script as actually run, so the trailing options were ignored; the application itself still finished with status SUCCEEDED.)

[[email protected] yangjingbo]# hadoop dfs -ls /yang
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 2 items
drwxr-xr-x   - root hdfs          0 2015-01-06 13:28 /yang/output
-rw-r--r--   3 root hdfs         93 2015-01-06 11:38 /yang/word.txt
[[email protected] yangjingbo]# hadoop dfs -cat /yang/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


cat: `/yang/output': Is a directory
[[email protected] yangjingbo]# hadoop dfs -ls /yang/output/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 3 items
-rw-r--r--   3 root hdfs          0 2015-01-06 13:28 /yang/output/_SUCCESS
-rw-r--r--   3 root hdfs          0 2015-01-06 13:28 /yang/output/part-00000
-rw-r--r--   3 root hdfs         49 2015-01-06 13:28 /yang/output/part-00001
[[email protected] yangjingbo]# hadoop dfs -cat /yang/output/part-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


[[email protected] yangjingbo]# hadoop dfs -cat /yang/output/part-00001
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


(name,2)
(hadoop,2)
(hdfs,3)
(redis,6)
(hbase,2)
