
Hive error: java.io.IOException: Could not find status of job:job_1470047186803_131111

Environment: Hadoop 2.7.2 + Hive 1.2.1

Recently, some Hive jobs on the cluster began failing with the following error:

java.io.IOException: Could not find status of job:job_1470047186803_131111
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Ended Job = job_1470047186803_131111 with exception 'java.io.IOException(Could not find status of job:job_1470047186803_131111)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
The symptom: the job typically fails right after reaching 100% progress, and no record of it can be found in the JobHistory Server. In some cases rerunning the same job succeeds, which made the problem look quite mysterious.

Checking the logs of the failed job's ApplicationMaster revealed the cause:

2016-08-04 14:27:59,012 INFO [Thread-68] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of job_1470047186803_131111-1470292057380-ide-create++table+temp.tem...%28%27%E5%B7%B2%E5%8F%96%E6%B6%88%27%2C%27%E6%8B%92%E6%94%B6%E5%85%A5%E5%BA%93%27%2C%27%E9%A9%B3%E5%9B%9E%27%29%28Stage-1470292073175-1-0-SUCCEEDED-root.data_platform-1470292063756.jhist_tmp in directory /tmp/hadoop-yarn/staging/history/done_intermediate/ide is exceeded: limit=255 length=258
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:911)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:976)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:838)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addFile(FSDirectory.java:426)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2575)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2450)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

Background: when a MapReduce job launched by Hive finishes, two files are generated for each job ID according to a fixed naming rule, a *.jhist file and a *.conf file. These two files record all of the job's execution information, and they must be written into the directory monitored by the JobHistory Server.
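The error above comes from the NameNode's per-component name limit (dfs.namenode.fs-limits.max-component-length, 255 by default in Hadoop 2.x), enforced when the *.jhist file is created. A minimal Python sketch of that check, assuming the limit is measured in UTF-8 bytes as in FSDirectory.verifyMaxComponentLength:

```python
# Simplified analogue of HDFS FSDirectory.verifyMaxComponentLength.
# Assumption: the limit counts UTF-8 bytes, which matches how HDFS
# stores path components internally.
MAX_COMPONENT_LENGTH = 255  # dfs.namenode.fs-limits.max-component-length default

def verify_component(name: str, limit: int = MAX_COMPONENT_LENGTH) -> None:
    length = len(name.encode("utf-8"))
    if 0 < limit < length:
        # Mirrors the PathComponentTooLongException message in the log above.
        raise IOError(
            f"The maximum path component name limit of {name} "
            f"is exceeded: limit={limit} length={length}"
        )
```

With the 258-character name from the log, this check fails exactly as reported (limit=255 length=258).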

The history file name for this job:  job_1470047186803_131111-1470292057380-ide-create++table+temp.tem...%28%27%E5%B7%B2%E5%8F%96%E6%B6%88%27%2C%27%E6%8B%92%E6%94%B6%E5%85%A5%E5%BA%93%27%2C%27%E9%A9%B3%E5%9B%9E%27%29%28Stage-1470292073175-1-0-SUCCEEDED-root.data_platform-1470292063756.jhist_tmp

The part that pushes the name over the limit is: create++table+temp.tem...%28%27%E5%B7%B2%E5%8F%96%E6%B6%88%27%2C%27%E6%8B%92%E6%94%B6%E5%85%A5%E5%BA%93%27%2C%27%E9%A9%B3%E5%9B%9E%27%29%28Stage-1470292073175-1-. This fragment comes from Hive's job name, which by default is built by taking a snippet from the beginning and end of the HQL statement. If the SQL starts or ends with a Chinese comment, the comment is included in the snippet and then URL-encoded, which makes the history file name extremely long, exceeding the maximum path component length the NameNode allows. The history file therefore cannot be written to the JobHistory Server's directory, Hive cannot retrieve the job's status from the JobHistory Server, and it fails with the error shown at the top.
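The expansion is easy to quantify: a Chinese character is 3 bytes in UTF-8, and percent-encoding turns every byte into three characters ("%XX"), so each character grows ninefold. A quick illustration, using a comment string recovered from the encoded fragment in the log above:

```python
from urllib.parse import quote

# One Chinese character is 3 bytes in UTF-8; percent-encoding expands
# each byte to "%XX", so the character becomes 9 characters long.
comment = "已取消"                # 3 characters, decoded from the log above
encoded = quote(comment)          # "%E5%B7%B2%E5%8F%96%E6%B6%88"
assert len(comment) == 3 and len(encoded) == 27   # a 9x expansion
```

A comment of just a few dozen Chinese characters is therefore enough to push the history file name past the 255-byte limit.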

A simple workaround is to cap the length of the job name Hive generates:

          set  hive.jobname.length=10;
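Besides setting it per session, the same property can be made the cluster default in hive-site.xml (a sketch; verify the property against your Hive version, and note it only shortens the HQL snippet embedded in the job name, not the rest of the history file name):

```xml
<!-- hive-site.xml: cap the HQL snippet Hive embeds in the MapReduce job name -->
<property>
  <name>hive.jobname.length</name>
  <value>10</value>
</property>
```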