kafka連zk報錯:Unable to read additional data from server sessionid 0x0...
問題描述:
主機資訊:
IP | hostname |
---|---|
10.0.0.10 | host10 |
10.0.0.12 | host12 |
10.0.0.13 | host13 |
在這三臺主機上部署一套zookeeper&kafka叢集環境的時候,zk叢集程序和埠都起來了。然後在啟動kafka的時候,報錯了,提示連不上zk。
因為該環境要求必須開啟防火牆,所以想到應該是因為2181埠沒有開放,所以kafka去連zk,提示連不上。於是在修改了防火牆,打開了2181埠和kafka的9092埠:
[[email protected] ~]# firewall-cmd --list-ports
2181/tcp 9092/tcp
[[email protected] ~]#
然後重新啟動kafka,依舊報錯,報錯資訊如下:
[[email protected] ~]# /usr/local/kafka_2.11-0.11.0.0/bin/kafka-server-start.sh /usr/local/kafka_2.11-0.11.0.0/config/server.properties [2018-06-15 11:33:25,244] INFO KafkaConfig values: advertised.host.name = null ...... 省略若干資訊 [2018-06-15 11:33:26,693] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:28,219] INFO Opening socket connection to server 10.0.0.12/10.0.0.12181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:28,221] INFO Socket connection established to 10.0.0.12/10.0.0.12181, initiating session (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:28,222] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:28,910] INFO Opening socket connection to server 10.0.0.13/10.0.0.13181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:28,911] INFO Socket connection established to 10.0.0.13/10.0.0.13181, initiating session (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:28,913] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:29,901] INFO Opening socket connection to server 10.0.0.10/10.0.0.10181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:29,902] INFO Socket connection established to 10.0.0.10/10.0.0.10181, initiating session (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:29,905] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:31,133] INFO Opening socket connection to server 10.0.0.12/10.0.0.12181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:31,135] INFO Socket connection established to 10.0.0.12/10.0.0.12181, initiating session (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:31,136] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:31,379] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread) [2018-06-15 11:33:31,553] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper) [2018-06-15 11:33:31,555] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server '10.0.0.10:2181,10.0.0.12:2181,10.0.0.13:2181' with timeout of 6000 ms at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1233) at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157) at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131) at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:103) at kafka.utils.ZkUtils$.apply(ZkUtils.scala:85) at kafka.server.KafkaServer.initZk(KafkaServer.scala:338) at kafka.server.KafkaServer.startup(KafkaServer.scala:191) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38) at kafka.Kafka$.main(Kafka.scala:65) at kafka.Kafka.main(Kafka.scala) [2018-06-15 11:33:31,555] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn) [2018-06-15 11:33:31,562] INFO shutting down (kafka.server.KafkaServer) [2018-06-15 11:33:31,568] INFO shut down completed (kafka.server.KafkaServer) [2018-06-15 11:33:31,568] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable) [2018-06-15 11:33:31,571] INFO shutting down (kafka.server.KafkaServer)
檢視kafka的配置,並未發現任何異常。於是檢查zk自身是否OK。但是程序、埠都一切正常。
使用./zkCli.sh去檢查zk是否正常啟動,出現了報錯:
[[email protected] bin]# ./zkCli.sh
Connecting to localhost:2181
2018-06-15 14:44:05,215 [myid:] - INFO [main:[email protected]] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
...... #省略若干資訊
2018-06-15 14:44:05,225 [myid:] - INFO [main: [email protected]] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 [email protected]
Welcome to ZooKeeper!
2018-06-15 14:44:05,255 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2018-06-15 14:44:05,334 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2018-06-15 14:44:05,348 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
[zk: localhost:2181(CONNECTING) 0] 2018-06-15 14:44:05,564 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-06-15 14:44:05,565 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-06-15 14:44:05,573 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2018-06-15 14:44:07,626 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-06-15 14:44:07,627 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2018-06-15 14:44:07,629 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2018-06-15 14:44:07,929 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-06-15 14:44:07,930 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2018-06-15 14:44:07,932 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2018-06-15 14:44:09,329 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-06-15 14:44:09,330 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2018-06-15 14:44:09,332 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2018-06-15 14:44:09,464 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
分析:
查看了網上很多的文章,有以下幾種說法:
1、zk的myid檔案和配置檔案server.xx的編號對不上:
2、zk只啟動了一個節點,其他節點沒有起來
排查:
1、檢查myid編號和配置檔案,是OK的,能對上:
[[email protected] ~]# cat /data/zookeeper/myid
1
[[email protected] ~]#
[[email protected] ~]# vim /usr/local/zookeeper-3.4.10/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
...... #省略若干
server.1=10.0.0.10:2888:3888 #server.1和myid檔案的1是能對上的
server.2=10.0.0.12:2888:3888
server.3=10.0.0.13:2888:3888
2、zookeeper的配置檔案zoo.cfg裡面配置了三個server,實際上其他兩個節點沒有起來,因此根據zookeeper的選舉演算法,當整個叢集超過半數機器宕機,zookeeper會認為叢集處於不可用狀態。
檢查了三個節點的服務,都是起來了的。
試著將其中一臺zk節點10.0.0.10,修改zoo.cfg,註釋掉配置的其他兩個server,只留本機的server配置:
[[email protected] ~]# vim /usr/local/zookeeper-3.4.10/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
...... #省略若干
server.1=10.0.0.10:2888:3888 #註釋掉server.2和server.3,只剩下本機節點
#server.2=10.0.0.12:2888:3888
#server.3=10.0.0.13:2888:3888
然後使用zkCli.sh去檢測zk狀態,就是OK的:
[[email protected] ~]# cd /usr/local/zookeeper-3.4.10/bin/
[[email protected] bin]# ./zkCli.sh
Connecting to localhost:2181
2018-06-15 15:58:10,586 [myid:] - INFO [main:[email protected]] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
...... #省略若干
2018-06-15 15:58:10,597 [myid:] - INFO [main:[email protected]] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 [email protected]
Welcome to ZooKeeper!
2018-06-15 15:58:10,627 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2018-06-15 15:58:10,708 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
[zk: localhost:2181(CONNECTING) 0] 2018-06-15 15:58:10,733 [myid:] - INFO [main-SendThread(localhost:2181):[email protected]] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x164023cac190003, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
於是想到是防火牆只開了zookeeper的2181埠。zk有三個埠:
3個埠的作用
1、2181:對cline端提供服務
2、3888:選舉leader使用
3、2888:叢集內機器通訊使用(Leader監聽此埠)
因為防火牆的作用,三個節點之間2888和3888埠都是互相不通的,因此叢集之前沒有辦法通訊,可能就會以為是剩下兩個節點都掛掉了。於是就出現了上面的報錯
解決:
1、重新修改防火牆,開啟2888和3888埠
[[email protected] ~]# firewall-cmd --list-ports
3888/tcp 2181/tcp 2888/tcp 9092/tcp
[[email protected] ~]#
2、為保證萬一,把整個zk叢集全部按照順序重啟了一遍
3、三個幾點依次都用zkCli.sh去連了一下,狀態都是OK的
4、重新啟動kafka,就OK啦