Hadoop: both NameNodes are in standby
A spark-submit run failed, so I checked the HA status: both NameNodes were in standby, and the JournalNode on one of the machines had died.
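The symptom can be confirmed from the shell. The sketch below assumes the NameNode service IDs are nn1 and nn2 (match them to dfs.ha.namenodes.* in your hdfs-site.xml); the both_standby helper is hypothetical, added only to make the check explicit:

```shell
# Hypothetical helper: true only when both reported HA states are "standby".
both_standby() {
  # $1/$2: states as reported by `hdfs haadmin -getServiceState nn1|nn2`
  [ "$1" = "standby" ] && [ "$2" = "standby" ]
}

# On the cluster you would feed it the real states:
#   s1=$(hdfs haadmin -getServiceState nn1)
#   s2=$(hdfs haadmin -getServiceState nn2)
#   both_standby "$s1" "$s2" && echo "no active NameNode"
```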
1. Troubleshooting
Running jps shows the JournalNode process is missing.
The JournalNode log contains the following:
2019-03-12 09:35:27,306 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8485 caught an exception
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2767)
at org.apache.hadoop.ipc.Server.access$2200(Server.java:139)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1121)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1193)
at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2134)
at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1261)
at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:644)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2268)
2019-03-12 16:42:25,616 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: RECEIVED SIGNAL 15: SIGTERM
2019-03-12 16:42:25,619 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down JournalNode at testing/10.0.0.149
************************************************************/
2. Solution
2.1 Start the JournalNode
# sbin/hadoop-daemon.sh start journalnode
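After restarting the daemon it is worth verifying that the process actually came back before moving on. A minimal check greps the jps output for the JournalNode process name; the check_journalnode function is a hypothetical wrapper, written to take the jps output as an argument:

```shell
# Hypothetical helper: report whether jps output contains a JournalNode process.
check_journalnode() {
  # $1: output of `jps` (passed in as a string)
  echo "$1" | grep -q 'JournalNode' && echo "JournalNode up" || echo "JournalNode missing"
}

# On the node itself you would run:
#   check_journalnode "$(jps)"
```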
2.2 Fail over to elect an active NameNode
# hdfs haadmin -failover --forceactive nn1 nn2
If this does not work, manually force one NameNode to active.
2.3 Force a NameNode to active
# hdfs haadmin -transitionToActive --forcemanual nn1
Don't forget the --forcemanual flag; without it the command is refused:
# hdfs haadmin -transitionToActive nn1
Automatic failover is enabled for NameNode at testing/10.0.0.149:9000
Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the --forcemanual flag.
If this still does not work, restart the whole Hadoop cluster.
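As a last resort, the restart can be done with the stock sbin/stop-dfs.sh and sbin/start-dfs.sh scripts, which in an HA setup also handle the JournalNodes and ZKFC daemons. This is a sketch assuming $HADOOP_HOME points at your installation; it only prints the steps, and the actual script invocations are left commented out so nothing is executed by accident:

```shell
# Hypothetical wrapper: print the full-restart steps for an HDFS HA cluster.
restart_hdfs() {
  echo "stopping HDFS: $HADOOP_HOME/sbin/stop-dfs.sh"
  # "$HADOOP_HOME/sbin/stop-dfs.sh"    # uncomment on the cluster
  echo "starting HDFS: $HADOOP_HOME/sbin/start-dfs.sh"
  # "$HADOOP_HOME/sbin/start-dfs.sh"   # uncomment on the cluster
}

# restart_hdfs   # run on the cluster
```

After the restart, re-check the HA state with `hdfs haadmin -getServiceState` to confirm one NameNode came up active.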