Summary of errors when submitting Spark code from a local IDEA to a remote cluster (Part 2)
阿新 • • 發佈:2018-12-26
Once the code could be submitted to the Spark cluster and run normally, the following error appeared:
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at scala.collection.SeqViewLike$AbstractTransformed.<init>(SeqViewLike.scala:43)
at scala.collection.SeqViewLike$$anon$4.<init>(SeqViewLike.scala:79)
at scala.collection.SeqViewLike$class.newFlatMapped(SeqViewLike.scala:79)
at scala.collection.SeqLike$$anon$2.newFlatMapped(SeqLike.scala:635)
at scala.collection.SeqLike$$anon$2.newFlatMapped(SeqLike.scala:635)
at scala.collection.TraversableViewLike$class.flatMap(TraversableViewLike.scala:160)
at scala.collection.SeqLike$$anon$2.flatMap(SeqLike.scala:635)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:58)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:48)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:46)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:53)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:53)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:56)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:153)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:829)
at p.JavaSparkPi.main(JavaSparkPi.java:30)
Exception in thread "Thread-3" java.lang.OutOfMemoryError: PermGen space
Exception in thread "Thread-30" java.lang.OutOfMemoryError: PermGen space
Exception in thread "Thread-33" java.lang.OutOfMemoryError: PermGen space
Besides the problem above, the following error also shows up. My first reaction on seeing it was: out of memory.
Job aborted due to stage failure: Total size of serialized results of 34 tasks (1033.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
2018-12-19 10:42:51,599 WARN [shuffle-client-0] server.TransportChannelHandler : Exception in connection from /10.8.30.108:50610
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
2018-12-19 10:42:51,610 INFO [dispatcher-event-loop-1] yarn.ApplicationMaster$AMEndpoint : Driver terminated or disconnected! Shutting down. tc-20024:50610
2018-12-19 10:42:51,614 INFO [dispatcher-event-loop-1] yarn.ApplicationMaster : Final app status: SUCCEEDED, exitCode: 0
2018-12-19 10:42:51,623 INFO [Thread-3] yarn.ApplicationMaster : Unregistering ApplicationMaster with SUCCEEDED
2018-12-19 10:42:51,637 INFO [Thread-3] impl.AMRMClientImpl : Waiting for application to be successfully unregistered.
2018-12-19 10:42:51,743 INFO [Thread-3] yarn.ApplicationMaster : Deleting staging directory .sparkStaging/application_1545188975663_0002
2018-12-19 10:42:51,745 INFO [Thread-3] util.ShutdownHookManager : Shutdown hook called
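All of these signs point to the program running out of memory. So why does it run out of memory? Because when we call collect on the result set, the entire result is gathered as one large collection on the driver side, which here is our IDEA process. If the client does not have enough memory, it overflows.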
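You can increase the memory: the driver heap, the spark.driver.maxResultSize limit named in the error above, or, for the PermGen error on Java 7 and earlier, -XX:MaxPermSize. But this treats the symptom rather than the cause. You do not know how large the data will be later on, so it is hard to say how much driver memory is enough. When processing data, we should therefore put as little pressure on the driver as possible. You can use the foreachPartition method so that all of the data is processed on the executor side.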
See this article for how to do it: https://segmentfault.com/a/1190000005365244?utm_source=tag-newest
One thing to note here: all of the data is printed row by row on the executor side, not in our console.
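As a rough illustration of the foreachPartition idea, here is a minimal sketch of the per-partition logic in plain Java, so that it compiles without Spark on the classpath. The names PartitionWriter, BatchSink and writePartition are hypothetical and do not come from the original job. The Spark 1.x wiring is shown only as a comment and assumes a DataFrame called df.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Spark: it holds the logic that would run
// inside foreachPartition, i.e. on an executor rather than on the driver.
class PartitionWriter {

    /** Hypothetical sink; a real job might wrap a JDBC batch or an HDFS writer. */
    public interface BatchSink<T> {
        void write(List<T> batch);
    }

    /**
     * Drains one partition's iterator in fixed-size batches, so at most
     * batchSize rows are buffered at a time. Returns the number of rows written.
     */
    public static <T> long writePartition(Iterator<T> rows, int batchSize, BatchSink<T> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<T>(batchSize);
        long written = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.write(batch);
                written += batch.size();
                batch = new ArrayList<T>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Flush the final, possibly smaller, batch.
            sink.write(batch);
            written += batch.size();
        }
        return written;
    }

    // Assumed Spark 1.x wiring (not compiled here; needs spark-core and spark-sql):
    //   df.javaRDD().foreachPartition(new VoidFunction<Iterator<Row>>() {
    //       public void call(Iterator<Row> rows) throws Exception {
    //           // Create the sink inside call() so it is never serialized from the driver.
    //           PartitionWriter.writePartition(rows, 1000, createSink());  // createSink() is hypothetical
    //       }
    //   });

    public static void main(String[] args) {
        // Stand-in for one partition; on the cluster Spark supplies the iterator.
        Iterator<String> partition = Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
        long n = writePartition(partition, 2, new BatchSink<String>() {
            @Override
            public void write(List<String> batch) {
                System.out.println("batch: " + batch);
            }
        });
        System.out.println("rows written: " + n);
    }
}
```

Because each executor only walks its own partition iterator, no full result set is ever shipped back to the driver, unlike with collect. Anonymous classes are used instead of lambdas since the PermGen error suggests a Java 7 runtime. When the same logic runs on a cluster, anything printed inside the sink goes to the executor's stdout, not the IDEA console.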