Work Pitfall Notes: 3. Resolving an es-hadoop Plugin Exception in Spark
By 阿新 • Published: 2018-08-29
1. The Es-Hadoop exception:
org.elasticsearch.hadoop.EsHadoopException: Could not write all entries [615/300864] (maybe ES was overloaded?). Bailing out...
    at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:235)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:186)
    at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:149)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:49)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
The exception indicates that the Elasticsearch cluster is overloaded: it rejected part of a bulk write (615 of 300,864 entries failed) and the connector gave up after exhausting its retries. Possible remedies are as follows (for reference):
a. Increase the ES retry count and the wait time between retries: es.batch.write.retry.count and es.batch.write.retry.wait
b. Reduce the number of Hadoop or Spark tasks or jobs writing to ES concurrently
c. Reduce the number and size of documents per bulk request. Note that these settings apply per task, so a job with 100 tasks will put 100x that many documents in flight against the cluster at once.
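Remedies (a) and (c) can both be set on the SparkConf before writing. A minimal sketch is below; the ES endpoint, index name, sample documents, and the specific values chosen (10 retries, 60s wait, 1000-entry/1mb batches) are all assumptions for illustration, not values from the original post. Tune them against your cluster's actual bulk-rejection behavior.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._   // adds saveToEs to RDDs

// Hypothetical example: endpoint, index, and data are placeholders.
val conf = new SparkConf()
  .setAppName("es-write-example")
  .set("es.nodes", "localhost:9200")        // assumed ES endpoint
  .set("es.batch.size.entries", "1000")     // cap documents per bulk request
  .set("es.batch.size.bytes", "1mb")        // cap bytes per bulk request
  .set("es.batch.write.retry.count", "10")  // retry rejected bulk entries up to 10 times
  .set("es.batch.write.retry.wait", "60s")  // wait 60s between retries

val sc = new SparkContext(conf)
val docs = Seq(Map("title" -> "doc1"), Map("title" -> "doc2"))
sc.makeRDD(docs).saveToEs("my_index/doc")   // "my_index/doc" is a placeholder index/type
```

Lowering es.batch.size.entries / es.batch.size.bytes shrinks each bulk request, while raising the retry count and wait gives an overloaded cluster time to drain its bulk queue instead of bailing out.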