
Spark: Implementing WordCount in Scala and Java
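The original post's Scala and Java source listings did not survive in this excerpt; only the driver log from running the Scala build (WordCountByscala.jar) remains. As a minimal sketch, assuming the standard textFile → flatMap → map → reduceByKey → collect pipeline that the DAGScheduler messages in the log reference (map and reduceByKey at WordCount.scala:26), the Scala version looks roughly like this. The object name and the argument handling are assumptions for illustration, not the author's exact code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Master URL comes from spark-submit; the input path is the first argument.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    val counts = sc.textFile(args(0))   // read the input text file
      .flatMap(_.split(" "))            // split each line into words
      .map(word => (word, 1))           // pair each word with a count of 1
      .reduceByKey(_ + _)               // sum the counts per word
      .collect()                        // bring the (word, count) pairs back to the driver

    counts.foreach(println)
    sc.stop()
  }
}
```

Judging by the log, the jar at /home/ebupt/test/WordCountByscala.jar was submitted to the standalone master spark://eb174:7077, so a command along the lines of `spark-submit --master spark://eb174:7077 --class WordCount /home/ebupt/test/WordCountByscala.jar <input file>` (class name and input path assumed) produces driver output like the log below.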

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/10/10 19:24:51 INFO SecurityManager: Changing view acls to: ebupt,
14/10/10 19:24:51 INFO SecurityManager: Changing modify acls to: ebupt,
14/10/10 19:24:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ebupt, ); users with modify permissions: Set(ebupt, )
14/10/10 19:24:52 INFO Slf4jLogger: Slf4jLogger started
14/10/10 19:24:52 INFO Remoting: Starting remoting
14/10/10 19:24:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:56344]
14/10/10 19:24:52 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:56344]
14/10/10 19:24:52 INFO Utils: Successfully started service 'sparkDriver' on port 56344.
14/10/10 19:24:52 INFO SparkEnv: Registering MapOutputTracker
14/10/10 19:24:52 INFO SparkEnv: Registering BlockManagerMaster
14/10/10 19:24:52 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141010192452-3398
14/10/10 19:24:52 INFO Utils: Successfully started service 'Connection manager for block manager' on port 41110.
14/10/10 19:24:52 INFO ConnectionManager: Bound socket to port 41110 with id = ConnectionManagerId(eb174,41110)
14/10/10 19:24:52 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/10/10 19:24:52 INFO BlockManagerMaster: Trying to register BlockManager
14/10/10 19:24:52 INFO BlockManagerMasterActor: Registering block manager eb174:41110 with 265.4 MB RAM
14/10/10 19:24:52 INFO BlockManagerMaster: Registered BlockManager
14/10/10 19:24:52 INFO HttpFileServer: HTTP File server directory is /tmp/spark-8051667e-bfdb-4ecd-8111-52992b16bb13
14/10/10 19:24:52 INFO HttpServer: Starting HTTP Server
14/10/10 19:24:52 INFO Utils: Successfully started service 'HTTP file server' on port 48233.
14/10/10 19:24:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/10/10 19:24:53 INFO SparkUI: Started SparkUI at http://eb174:4040
14/10/10 19:24:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/10 19:24:53 INFO SparkContext: Added JAR file:/home/ebupt/test/WordCountByscala.jar at http://10.1.69.174:48233/jars/WordCountByscala.jar with timestamp 1412940293532
14/10/10 19:24:53 INFO AppClient$ClientActor: Connecting to master spark://eb174:7077...
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/10/10 19:24:53 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
14/10/10 19:24:53 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141010192453-0009
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor added: app-20141010192453-0009/0 on worker-20141008204132-eb176-49618 (eb176:49618) with 1 cores
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141010192453-0009/0 on hostPort eb176:49618 with 1 cores, 1024.0 MB RAM
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor added: app-20141010192453-0009/1 on worker-20141008204132-eb175-56337 (eb175:56337) with 1 cores
14/10/10 19:24:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141010192453-0009/1 on hostPort eb175:56337 with 1 cores, 1024.0 MB RAM
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor updated: app-20141010192453-0009/0 is now RUNNING
14/10/10 19:24:53 INFO AppClient$ClientActor: Executor updated: app-20141010192453-0009/1 is now RUNNING
14/10/10 19:24:53 INFO MemoryStore: ensureFreeSpace(12633) called with curMem=163705, maxMem=278302556
14/10/10 19:24:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 265.2 MB)
14/10/10 19:24:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb174:41110 (size: 12.3 KB, free: 265.4 MB)
14/10/10 19:24:53 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
14/10/10 19:24:54 INFO FileInputFormat: Total input paths to process : 1
14/10/10 19:24:54 INFO SparkContext: Starting job: collect at WordCount.scala:26
14/10/10 19:24:54 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:26)
14/10/10 19:24:54 INFO DAGScheduler: Got job 0 (collect at WordCount.scala:26) with 2 output partitions (allowLocal=false)
14/10/10 19:24:54 INFO DAGScheduler: Final stage: Stage 0(collect at WordCount.scala:26)
14/10/10 19:24:54 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/10/10 19:24:54 INFO DAGScheduler: Missing parents: List(Stage 1)
14/10/10 19:24:54 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:26), which has no missing parents
14/10/10 19:24:54 INFO MemoryStore: ensureFreeSpace(3400) called with curMem=176338, maxMem=278302556
14/10/10 19:24:54 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 265.2 MB)
14/10/10 19:24:54 INFO MemoryStore: ensureFreeSpace(2082) called with curMem=179738, maxMem=278302556
14/10/10 19:24:54 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.2 MB)
14/10/10 19:24:54 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb174:41110 (size: 2.0 KB, free: 265.4 MB)
14/10/10 19:24:54 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/10/10 19:24:54 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:26)
14/10/10 19:24:54 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/10/10 19:24:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:35482/user/Executor#1456950111] with ID 0
14/10/10 19:24:56 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, eb176, ANY, 1238 bytes)
14/10/10 19:24:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:35502/user/Executor#-1231100997] with ID 1
14/10/10 19:24:56 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, eb175, ANY, 1238 bytes)
14/10/10 19:24:56 INFO BlockManagerMasterActor: Registering block manager eb176:33296 with 530.3 MB RAM
14/10/10 19:24:56 INFO BlockManagerMasterActor: Registering block manager eb175:32903 with 530.3 MB RAM
14/10/10 19:24:57 INFO ConnectionManager: Accepted connection from [eb176/10.1.69.176:39218]
14/10/10 19:24:57 INFO ConnectionManager: Accepted connection from [eb175/10.1.69.175:55227]
14/10/10 19:24:57 INFO SendingConnection: Initiating connection to [eb176/10.1.69.176:33296]
14/10/10 19:24:57 INFO SendingConnection: Initiating connection to [eb175/10.1.69.175:32903]
14/10/10 19:24:57 INFO SendingConnection: Connected to [eb175/10.1.69.175:32903], 1 messages pending
14/10/10 19:24:57 INFO SendingConnection: Connected to [eb176/10.1.69.176:33296], 1 messages pending
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb175:32903 (size: 2.0 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on eb176:33296 (size: 2.0 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb176:33296 (size: 12.3 KB, free: 530.3 MB)
14/10/10 19:24:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on eb175:32903 (size: 12.3 KB, free: 530.3 MB)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 1697 ms on eb175 (1/2)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1715 ms on eb176 (2/2)
14/10/10 19:24:58 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/10/10 19:24:58 INFO DAGScheduler: Stage 1 (map at WordCount.scala:26) finished in 3.593 s
14/10/10 19:24:58 INFO DAGScheduler: looking for newly runnable stages
14/10/10 19:24:58 INFO DAGScheduler: running: Set()
14/10/10 19:24:58 INFO DAGScheduler: waiting: Set(Stage 0)
14/10/10 19:24:58 INFO DAGScheduler: failed: Set()
14/10/10 19:24:58 INFO DAGScheduler: Missing parents for Stage 0: List()
14/10/10 19:24:58 INFO DAGScheduler: Submitting Stage 0 (ShuffledRDD[4] at reduceByKey at WordCount.scala:26), which is now runnable
14/10/10 19:24:58 INFO MemoryStore: ensureFreeSpace(2096) called with curMem=181820, maxMem=278302556
14/10/10 19:24:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.0 KB, free 265.2 MB)
14/10/10 19:24:58 INFO MemoryStore: ensureFreeSpace(1338) called with curMem=183916, maxMem=278302556
14/10/10 19:24:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1338.0 B, free 265.2 MB)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb174:41110 (size: 1338.0 B, free: 265.4 MB)
14/10/10 19:24:58 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
14/10/10 19:24:58 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[4] at reduceByKey at WordCount.scala:26)
14/10/10 19:24:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/10 19:24:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, eb175, PROCESS_LOCAL, 1008 bytes)
14/10/10 19:24:58 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, eb176, PROCESS_LOCAL, 1008 bytes)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb175:32903 (size: 1338.0 B, free: 530.3 MB)
14/10/10 19:24:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on eb176:33296 (size: 1338.0 B, free: 530.3 MB)
14/10/10 19:24:58 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:59119
14/10/10 19:24:58 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 144 bytes
14/10/10 19:24:58 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:39028
14/10/10 19:24:58 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 109 ms on eb176 (1/2)
14/10/10 19:24:58 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 120 ms on eb175 (2/2)
14/10/10 19:24:58 INFO DAGScheduler: Stage 0 (collect at WordCount.scala:26) finished in 0.123 s
14/10/10 19:24:58 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/10 19:24:58 INFO SparkContext: Job finished: collect at WordCount.scala:26, took 3.815637915 s
(scala,1)
(Function2,1)
(JavaSparkContext,1)
(JavaRDD,1)
(Tuple2,1)
(,1)
(org,7)
(apache,7)
(JavaPairRDD,1)
(java,7)
(function,4)
(api,7)
(Function,1)
(PairFunction,1)
(spark,7)
(FlatMapFunction,1)
(import,8)
14/10/10 19:24:58 INFO SparkUI: Stopped Spark web UI at http://eb174:4040
14/10/10 19:24:58 INFO DAGScheduler: Stopping DAGScheduler
14/10/10 19:24:58 INFO SparkDeploySchedulerBackend: Shutting down all executors
14/10/10 19:24:58 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:58 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:58 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(eb176,33296) not found
14/10/10 19:24:58 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb175,32903)
14/10/10 19:24:58 INFO ConnectionManager: Key not valid ? [email protected]
14/10/10 19:24:58 INFO ConnectionManager: key already cancelled ? [email protected]
java.nio.channels.CancelledKeyException
        at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:310)
        at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/10/10 19:24:59 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/10/10 19:24:59 INFO ConnectionManager: Selector thread was interrupted!
14/10/10 19:24:59 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:59 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(eb176,33296) not found
14/10/10 19:24:59 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(eb176,33296)
14/10/10 19:24:59 WARN ConnectionManager: All connections not cleaned up
14/10/10 19:24:59 INFO ConnectionManager: ConnectionManager stopped
14/10/10 19:24:59 INFO MemoryStore: MemoryStore cleared
14/10/10 19:24:59 INFO BlockManager: BlockManager stopped
14/10/10 19:24:59 INFO BlockManagerMaster: BlockManagerMaster stopped
14/10/10 19:24:59 INFO SparkContext: Successfully stopped SparkContext
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/10/10 19:24:59 INFO Remoting: Remoting shut down
14/10/10 19:24:59 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
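The (word,count) pairs printed near the end of the log (import 8, org/apache/spark/api/java 7, JavaSparkContext, JavaPairRDD, FlatMapFunction, PairFunction, ...) look like a word count over the import block of the Java version of the same program, which was apparently used as the input file. The ERROR and WARN ConnectionManager lines appear only after "Job finished", while the SparkContext is shutting down, so the run itself completed successfully. The Java listing is also not preserved in this excerpt; a sketch of an equivalent Spark 1.x Java-API implementation, built from the classes named in the output (the class name JavaWordCount and the anonymous-class style are assumptions), could look like this:

```java
import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import java.util.Arrays;
import java.util.List;

public class JavaWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("JavaWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split every line into words.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String line) {
                return Arrays.asList(line.split(" "));
            }
        });

        // Pair each word with an initial count of 1.
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String word) {
                return new Tuple2<String, Integer>(word, 1);
            }
        });

        // Sum the counts for each word.
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) {
                return a + b;
            }
        });

        // Collect and print in the same (word,count) format seen in the log above.
        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<String, Integer> tuple : output) {
            System.out.println("(" + tuple._1() + "," + tuple._2() + ")");
        }

        sc.stop();
    }
}
```

Both versions implement the same flatMap → pair → reduceByKey → collect pipeline; the Scala version is simply more compact because the function literals replace the anonymous Function classes that the Java 7-era API requires.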
