
Spark on YARN Development (begin)

This follows an experiment from Dong Xicheng's blog. I worked through it myself, ran into a few problems along the way, and decided to write them down.

It's really just a simple WordCount class, but once this skeleton is in place, other code can be filled in gradually.

package org.apache.spark

import org.apache.spark._
import SparkContext._

object WordCount {
  // Assembly jar used for submission:
  // /apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
  def main(args: Array[String]) {
    if (args.length != 2) {
      println("usage is org.apache.spark.WordCount <input> <output>")
      return
    }
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    // Read the input, split each line on whitespace, and count every word.
    val textFile = sc.textFile(args(0))
    val result = textFile
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}
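Before submitting to the cluster, the counting logic itself can be sanity-checked locally. The sketch below uses plain Scala collections rather than Spark's API (`groupBy` plus a size count stands in for `reduceByKey`), so it runs without any cluster; `LocalWordCount` and `countWords` are names invented here for illustration.

```scala
// Minimal local sketch of the flatMap -> map -> reduceByKey pipeline,
// using plain Scala collections instead of Spark RDDs.
object LocalWordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens from leading/trailing spaces
      .groupBy(identity)          // group occurrences of the same word
      .map { case (word, occs) => (word, occs.size) }

  def main(args: Array[String]): Unit = {
    println(countWords(Seq("redis hbase", "hdfs redis")))
  }
}
```

The same input lines fed through the Spark job would produce the same (word, count) pairs, just partitioned across output files.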

The shell script is:

export YARN_CONF_DIR=/etc/hadoop/conf
SPARK_JAR=/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar \
/apps/spark-1.2.0-bin-hadoop2.4/bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./RELEASE/spark-test-wordcount.jar \
--class org.apache.spark.WordCount \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/output \
--num-workers 1 \
--master-memory 2g \
--worker-memory 2g \
--worker-cores 2
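As the run log below warns, this `yarn.Client` entry point is deprecated. The same job could be submitted with `spark-submit` instead; this is an untested sketch assuming the same jar, class, and HDFS paths as above, with the equivalent Spark 1.2 flag names (`--num-executors` replaces `--num-workers`, `--driver-memory` replaces `--master-memory`, and so on):

```shell
export YARN_CONF_DIR=/etc/hadoop/conf
/apps/spark-1.2.0-bin-hadoop2.4/bin/spark-submit \
  --master yarn-cluster \
  --class org.apache.spark.WordCount \
  --num-executors 1 \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 2 \
  ./RELEASE/spark-test-wordcount.jar \
  hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt \
  hdfs://UHVDATA012.uhome.haier.net:8020/yang/output
```

Application arguments are passed positionally after the jar, so no `--args` flags are needed.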

That's all there is to it. The log from the run:

[[email protected] yangjingbo]# ./wordcount.sh 
Spark assembly has been built with Hive, including Datanucleus jars on classpath
WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
15/01/06 13:27:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/06 13:27:33 INFO yarn.Client: Requesting a new application from cluster with 7 NodeManagers
15/01/06 13:27:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (13824 MB per container)
15/01/06 13:27:33 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/06 13:27:33 INFO yarn.Client: Setting up container launch context for our AM
15/01/06 13:27:33 INFO yarn.Client: Preparing resources for our AM container
15/01/06 13:27:33 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/01/06 13:27:33 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:33 INFO yarn.Client: Uploading resource file:/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> 
15/01/06 13:27:35 INFO yarn.Client: Uploading resource file:/home/yangjingbo/RELEASE/spark-test-wordcount.jar -> 
15/01/06 13:27:35 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/06 13:27:35 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:35 INFO spark.SecurityManager: Changing view acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: Changing modify acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/01/06 13:27:35 INFO yarn.Client: Submitting application 29 to ResourceManager
15/01/06 13:27:35 INFO impl.YarnClientImpl: Submitted application application_1416218486128_0029
15/01/06 13:27:36 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:36 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1420522055830
         final status: UNDEFINED
         user: root
15/01/06 13:27:37 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:38 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:39 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:40 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:41 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:42 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:42 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ###########
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1420522055830
         final status: UNDEFINED
         user: root
15/01/06 13:27:43 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:44 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:45 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:46 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:47 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:48 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:49 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:50 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:51 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:52 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:53 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:54 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:55 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:56 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:57 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:58 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:59 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:28:00 INFO yarn.Client: Application report for application_1416218486128_0029 (state: FINISHED)
15/01/06 13:28:00 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ##############
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1420522055830
         final status: SUCCEEDED
         user: root
./wordcount.sh: line 9: --num-workers: command not found

(The "--num-workers: command not found" error suggests a line-continuation backslash was missing in the script as actually run, so the trailing options were ignored; the application itself still finished with status SUCCEEDED.)

[[email protected] yangjingbo]# hadoop dfs -ls /yang
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 2 items
drwxr-xr-x   - root hdfs          0 2015-01-06 13:28 /yang/output
-rw-r--r--   3 root hdfs         93 2015-01-06 11:38 /yang/word.txt
[[email protected] yangjingbo]# hadoop dfs -cat /yang/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


cat: `/yang/output': Is a directory
[[email protected] yangjingbo]# hadoop dfs -ls /yang/output/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 3 items
-rw-r--r--   3 root hdfs          0 2015-01-06 13:28 /yang/output/_SUCCESS
-rw-r--r--   3 root hdfs          0 2015-01-06 13:28 /yang/output/part-00000
-rw-r--r--   3 root hdfs         49 2015-01-06 13:28 /yang/output/part-00001
[[email protected] yangjingbo]# hadoop dfs -cat /yang/output/part-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


[[email protected] yangjingbo]# hadoop dfs -cat /yang/output/part-00001
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


(name,2)
(hadoop,2)
(hdfs,3)
(redis,6)
(hbase,2)
