
Work Pitfall Notes: 3. Resolving an es-hadoop Plugin Exception in Spark


1. The Es-Hadoop exception:

org.elasticsearch.hadoop.EsHadoopException: Could not write all entries [615/300864] (maybe ES was overloaded?). Bailing out...
    at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:235)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:186)
    at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:149)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:49)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The exception indicates that Elasticsearch was overloaded by the bulk writes. Possible remedies are as follows (for reference):

a. Increase the ES bulk-write retry count and retry wait time via es.batch.write.retry.count and es.batch.write.retry.wait (see the Scala sketch after this list).

b. Reduce the number of Hadoop or Spark tasks/jobs writing to the cluster concurrently.

c. Reduce the number/size of documents per bulk flush. Note that this is a per-task setting, so a job with 100 tasks will push 100 × that many documents to the cluster at once.
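Below is a minimal Scala sketch showing where these es-hadoop options can be set when writing an RDD with saveToEs. The node address, index name, and the specific values chosen are hypothetical; tune them to your own cluster, and check the es-hadoop documentation for the defaults of your version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object EsOverloadTuningDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-hadoop-retry-demo")
      .set("es.nodes", "es-node-1:9200")          // hypothetical ES address
      // a. more retries and a longer wait between bulk-write retries
      .set("es.batch.write.retry.count", "10")
      .set("es.batch.write.retry.wait", "60s")
      // c. smaller bulk flushes per task, so each request is lighter
      .set("es.batch.size.entries", "500")
      .set("es.batch.size.bytes", "1mb")

    val sc = new SparkContext(conf)

    // b. fewer writer partitions -> fewer concurrent bulk requests against ES
    val docs = sc.makeRDD(Seq(
      Map("id" -> 1, "msg" -> "hello"),
      Map("id" -> 2, "msg" -> "world")
    )).coalesce(2)

    // "demo_index/doc" is a placeholder index/type resource
    docs.saveToEs("demo_index/doc")

    sc.stop()
  }
}
```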
