zookeeper各种报错、原因及解决方法汇总(持续更新)
[root@iZ23nn1p4mjZ zookeeper-3.4.10]# bin/zkCli.sh -server localhost:2181,localhost:2182,localhost:2183 Connecting to localhost:2181,localhost:2182,localhost:2183 log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Welcome to ZooKeeper! JLine support is enabled [zk: localhost:2181,localhost:2182,localhost:2183(CONNECTING) 0] create -s /[sPacktPub] Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:706) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:599) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:371) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:331) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:290) [root@iZ23nn1p4mjZ zookeeper-3.4.10]# ls /
上述问题近发生在单机三节点模式时。
ZooKeeper自定义数据日志目录
ZooKeeper在启动时会将zookeeper.out输出到当前目录,不仅不友好,有时候可能会因为目录权限问题引发一些不必要的麻烦。
解决方案:设置ZOO_LOG_DIR环境变量,也可以改在shell中写,如下:
if [ $ZOO_LOG_DIR ] ; then export ZOO_LOG_DIR=/usr/local/zookeeper/logs fi if [ -z $ZOO_LOG_DIR ];then export ZOO_LOG_DIR=/usr/local/zookeeper/logs fi
conf/log4j.properties中定义:
zookeeper.log.dir=/usr/local/zookeeper/logs zookeeper.tracelog.dir=/usr/local/zookeeper/logs
有时候在网络变化、抖动或者服务器负载高的情况下会出现下列异常:
14:18:06.812 [main] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (127.0.0.1:2181) and timeout (5000) / elapsed (59220) org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
重启应用后仍然如此,重启zk后便自动恢复。
最近有个奇怪的现象{20200420更新},只要连接非本地,客户端IDEA中总是第一次连接的时候超时,第二次就成功了,每个客户端均如此。如下:
19:51:53.245 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=xx.xx.xx.xx:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@310d05c5 19:51:58.334 [main] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (xx.xx.xx.xx:2181) and timeout (5000) / elapsed (5104) org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225) [curator-client-2.13.0.jar:?] at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) [curator-client-2.13.0.jar:?] at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117) [curator-client-2.13.0.jar:?] at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489) [curator-framework-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740) [curator-framework-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723) [curator-framework-2.13.0.jar:?] at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) [curator-client-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720) [curator-framework-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484) [curator-framework-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474) [curator-framework-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:454) [curator-framework-2.13.0.jar:?] at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) [curator-framework-2.13.0.jar:?] at com.alibaba.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient.createPersistent(CuratorZookeeperClient.java:96) [classes/:1.0.17]
连接超时时间太短,加大即可,有需求可有偿解决。
curator KeeperErrorCode = Unimplemented异常,如下:
org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /benchmark/test-key-1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:103) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1525) at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1189) at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1166) at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
ZooKeeper 3.5.x
- Curator 4.0 has a hard dependency on ZooKeeper 3.5.x
- If you are using ZooKeeper 3.5.x there's nothing additional to do - just use Curator 4.0
Curator 4.0 supports ZooKeeper 3.4.x ensembles in a soft-compatibility mode. To use this mode you must exclude ZooKeeper when adding Curator to your dependency management tool.
Curator 4.0默认不支持zk 3.4,必须兼容性处理,如下:
<dependency> <groupId>org.apache.zookeeper</groupId> <artifactId>zookeeper</artifactId> <version>3.4.12</version> </dependency> <dependency> <groupId>org.apache.curator</groupId> <artifactId>curator-recipes</artifactId> <version>4.0.0</version> <!-- 4.0.0原生不兼容zk 3.4, 必须进行兼容性处理 --> <exclusions> <exclusion> <groupId>org.apache.zookeeper</groupId> <artifactId>zookeeper</artifactId> </exclusion> </exclusions> </dependency>
见http://curator.apache.org/zk-compatibility.html
其他zk自身及zk与dubbo相关的异常可参见:https://www.cnblogs.com/zhjh256/p/11679639.html
2019-11-04 12:05:17,165 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30003, likely client has closed socket 2019-11-04 12:05:17,166 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22804 which had sessionid 0x1001e6b86a30003 2019-11-04 12:05:17,240 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30000, likely client has closed socket 2019-11-04 12:05:17,240 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22665 which had sessionid 0x1001e6b86a30000 2019-11-04 12:05:17,241 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30001, likely client has closed socket 2019-11-04 12:05:17,241 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22672 which had sessionid 0x1001e6b86a30001 2019-11-04 12:05:17,241 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30002, likely client has closed socket 2019-11-04 12:05:17,242 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22673 which had sessionid 0x1001e6b86a30002 2019-11-04 12:05:47,455 [myid:] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30002, timeout of 40000ms exceeded 2019-11-04 12:05:47,456 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30002 2019-11-04 12:05:51,456 [myid:] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30003, timeout of 40000ms exceeded 2019-11-04 12:05:51,456 [myid:] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30000, timeout of 40000ms exceeded 2019-11-04 12:05:51,456 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30003 2019-11-04 12:05:51,457 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30000 2019-11-04 12:05:53,456 [myid:] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30001, timeout of 40000ms exceeded 2019-11-04 12:05:53,457 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30001
该异常看起来和https://www.cnblogs.com/zhyg/p/11089736.html所述很像。
zookeeper服务器日志报连接数超出:
2020-04-16 17:16:11,135 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,241 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,371 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,381 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,463 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,527 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,553 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,580 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,609 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,686 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,796 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,818 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60 2020-04-16 17:16:11,853 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
该原因大概率是客户端有连接泄露,可通过netstat -ano | grep 2181分析,如下:
[root@hs-10-20-37-72 logs]# lsof -i:2181 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 2706 root 24u IPv6 18738 0t0 TCP *:eforward (LISTEN) java 2706 root 25u IPv6 3786862 0t0 TCP 10.20.25.225:eforward->10.20.25.225:63464 (ESTABLISHED) java 2706 root 26u IPv6 4748980 0t0 TCP 10.20.25.225:eforward->10.20.25.225:59063 (ESTABLISHED) java 2706 root 27u IPv6 6781917 0t0 TCP 10.20.25.225:eforward->10.20.25.225:51577 (ESTABLISHED) java 2706 root 28u IPv6 6781920 0t0 TCP 10.20.25.225:eforward->10.20.25.225:51578 (ESTABLISHED) java 2706 root 30u IPv6 6784242 0t0 TCP 10.20.25.225:eforward->10.20.25.225:51880 (ESTABLISHED) java 2706 root 31u IPv6 6784604 0t0 TCP 10.20.25.225:eforward->10.20.25.225:51989 (ESTABLISHED) java 2706 root 32u IPv6 6784610 0t0 TCP 10.20.25.225:eforward->10.20.25.225:51990 (ESTABLISHED) java 2706 root 33u IPv6 6788885 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52069 (ESTABLISHED) java 2706 root 34u IPv6 4412727 0t0 TCP 10.20.25.225:eforward->10.20.25.225:12441 (ESTABLISHED) java 2706 root 35u IPv6 6789241 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52127 (ESTABLISHED) java 2706 root 36u IPv6 6789263 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52130 (ESTABLISHED) java 2706 root 37u IPv6 6789329 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52140 (ESTABLISHED) java 2706 root 38u IPv6 6789422 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52149 (ESTABLISHED) java 2706 root 39u IPv6 6789572 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52166 (ESTABLISHED) java 2706 root 40u IPv6 6789693 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52183 (ESTABLISHED) java 2706 root 41u IPv6 6789807 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52200 (ESTABLISHED) java 2706 root 42u IPv6 6789981 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52220 (ESTABLISHED) java 2706 root 43u IPv6 6790066 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52239 (ESTABLISHED) java 2706 root 44u IPv6 6790173 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52250 (ESTABLISHED) java 2706 root 45u IPv6 6790238 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52259 (ESTABLISHED) java 2706 root 46u IPv6 6790287 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52272 (ESTABLISHED) java 2706 root 47u IPv6 6790356 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52285 (ESTABLISHED) java 2706 root 48u IPv6 6790447 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52297 (ESTABLISHED) java 2706 root 49u IPv6 6790485 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52307 (ESTABLISHED) java 2706 root 50u IPv6 6790597 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52337 (ESTABLISHED) java 2706 root 51u IPv6 6790638 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52343 (ESTABLISHED) java 2706 root 52u IPv6 6790677 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52356 (ESTABLISHED) java 2706 root 53u IPv6 6790726 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52368 (ESTABLISHED) java 2706 root 54u IPv6 6790842 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52387 (ESTABLISHED) java 2706 root 55u IPv6 6790871 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52395 (ESTABLISHED) java 2706 root 56u IPv6 6790883 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52402 (ESTABLISHED) java 2706 root 57u IPv6 6790992 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52415 (ESTABLISHED) java 2706 root 58u IPv6 6791084 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52433 (ESTABLISHED) java 2706 root 59u IPv6 6791118 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52449 (ESTABLISHED) java 2706 root 60u IPv6 6791187 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52460 (ESTABLISHED) java 2706 root 61u IPv6 6791270 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52479 (ESTABLISHED) java 2706 root 62u IPv6 6791334 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52488 (ESTABLISHED) java 2706 root 63u IPv6 6791427 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52501 (ESTABLISHED) java 2706 root 64u IPv6 6791470 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52518 (ESTABLISHED) java 2706 root 65u IPv6 6791542 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52531 (ESTABLISHED) java 2706 root 66u IPv6 6791631 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52545 (ESTABLISHED) java 2706 root 67u IPv6 6791669 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52555 (ESTABLISHED) java 2706 root 68u IPv6 6791719 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52566 (ESTABLISHED) java 2706 root 69u IPv6 6791797 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52581 (ESTABLISHED) java 2706 root 70u IPv6 6791847 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52592 (ESTABLISHED) java 2706 root 71u IPv6 6791883 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52602 (ESTABLISHED) java 2706 root 72u IPv6 6791931 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52615 (ESTABLISHED) java 2706 root 73u IPv6 6792017 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52627 (ESTABLISHED) java 2706 root 74u IPv6 6792056 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52641 (ESTABLISHED) java 2706 root 75u IPv6 6792090 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52648 (ESTABLISHED) java 2706 root 76u IPv6 6792199 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52661 (ESTABLISHED) java 2706 root 77u IPv6 6792283 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52681 (ESTABLISHED) java 2706 root 78u IPv6 6792299 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52688 (ESTABLISHED) java 2706 root 79u IPv6 6792345 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52698 (ESTABLISHED) java 2706 root 80u IPv6 6792413 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52709 (ESTABLISHED) java 2706 root 81u IPv6 6792495 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52727 (ESTABLISHED) java 2706 root 82u IPv6 6792563 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52736 (ESTABLISHED) java 2706 root 83u IPv6 6792625 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52745 (ESTABLISHED) java 2706 root 84u IPv6 6792674 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52758 (ESTABLISHED) java 2706 root 85u IPv6 6792755 0t0 TCP 10.20.25.225:eforward->10.20.25.225:52775 (ESTABLISHED) java 2706 root 86u IPv6 6290356 0t0 TCP 10.20.25.225:eforward->192.168.166.145:34608 (ESTABLISHED) java 2871 root 176u IPv6 4748979 0t0 TCP 10.20.25.225:59063->10.20.25.225:eforward (ESTABLISHED)
找到对应的进程和服务器。然后分析泄露,通常大概率是因为自己手写zk相关连接和监控、而不是使用客户端库如curator所致。
21:12:03.568 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.20.25.225/10.20.25.225:2181. Will not attempt to authenticate using SASL (unknown error) 21:12:03.568 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.20.25.225/10.20.25.225:2181, initiating session 21:12:03.576 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.20.25.225/10.20.25.225:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_191] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_191] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191] at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_191] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191] at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar!/:3.4.8--1] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar!/:3.4.8--1] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar!/:3.4.8--1] 21:12:03.729 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.20.25.225/10.20.25.225:2181. Will not attempt to authenticate using SASL (unknown error) 21:12:03.730 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.20.25.225/10.20.25.225:2181, initiating session 21:12:03.730 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.20.25.225/10.20.25.225:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_191] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_191] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191] at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_191] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191] at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar!/:3.4.8--1] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar!/:3.4.8--1] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar!/:3.4.8--1]
Caused by: org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout
at com.dangdang.ddframe.job.reg.zookeeper.ZookeeperRegistryCenter.init(ZookeeperRegistryCenter.java:102) ~[elastic-job-common-core-2.1.5.jar:?]
检查网络是否正常,ping/telnet看下,基本上就这一个原因。
执行[root@192.168.1.1 bin]# echo conf | nc localhost 2181报下列输出
This ZooKeeper instance is not currently serving requests
说明zookeeper还活着,但不提供服务了。此时要查看zk的日志,dataLogDir指定的目录,一般里面可以看到为什么zk不提供服务了。如下:
2020-08-17 00:13:20,050 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:20,252 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:20,654 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:21,463 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:23,065 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:26,271 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:32,679 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state) 2020-08-17 00:13:40,040 [myid:0] - WARN [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@998] - Unexpected exception java.lang.InterruptedException: Timeout while waiting for epoch to be acked by quorum at org.apache.zookeeper.server.quorum.Leader.waitForEpochAck(Leader.java:936) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:415) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:995) 2020-08-17 00:13:40,040 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@520] - Shutting down 2020-08-17 00:13:40,040 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@526] - Shutdown called java.lang.Exception: shutdown Leader! reason: Forcing shutdown at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:526) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1001) 2020-08-17 00:13:40,041 [myid:0] - INFO [LearnerCnxAcceptor-/10.20.30.17:2182:Leader$LearnerCnxAcceptor@340] - exception while shutting down acceptor: java.net.SocketException: Socket closed 2020-08-17 00:13:40,041 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@910] - LOOKING 2020-08-17 00:13:40,042 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New election. My id = 0, proposed zxid=0x103c00000496 2020-08-17 00:13:40,042 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 0 (n.leader), 0x103c00000496 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 0 (n.sid), 0x103c (n.peerEpoch) LOOKING (my state) 2020-08-17 00:13:40,043 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@584] - Cannot open channel to 1 at election address/*原因和解决方法可见https://blog.csdn.net/everl_1/article/details/52253513*/ /10.20.30.13:2183 java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435) at java.lang.Thread.run(Thread.java:748) 2020-08-17 00:13:40,043 [myid:0] - INFO [WorkerSender[myid=0]:QuorumPeer$QuorumServer@185] - Resolved hostname: 10.20.30.13 to address: /10.20.30.13 2020-08-17 00:13:40,045 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 0 (n.leader), 0x103c00000496 (n.zxid), 0x17af2 (n.round), LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LOOKING (my state) 2020-08-17 00:13:40,245 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@992] - LEADING 2020-08-17 00:13:40,263 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@174] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datad ir /home/component/zookeeper/log/version-2 snapdir /home/component/zookeeper/data/version-2 2020-08-17 00:13:40,264 [myid:0] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@380] - LEADING - LEADER ELECTION TOOK - 222 2020-08-17 00:13:41,271 [myid:0] - INFO [LearnerHandler-/10.20.30.18:11030:LearnerHandler@346] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1efeac 2020-08-17 00:13:41,307 [myid:0] - ERROR [LearnerHandler-/10.20.30.18:11030:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:373) 2020-08-17 00:13:41,308 [myid:0] - WARN [LearnerHandler-/10.20.30.18:11030:LearnerHandler@661] - ******* GOODBYE /10.20.30.18:11030 ********
所以,此处根本原因是集群的其他节点连不上,导致节点异常退出。但是socket没有关闭,这样是为了在其他节点恢复后,集群整体自动恢复。
解决Zookeeper报错:conf is not executed because it is not in the whitelist的解决办法
[root@hs-10-20-37-72 ~]# echo wchs | nc 127.0.0.1 2181
2 connections watching 48 paths
Total watches:48
[root@hs-10-20-37-72 ~]# echo wchc | nc 127.0.0.1 2181
wchc is not executed because it is not in the whitelist.
[root@hs-10-20-37-72 ~]# echo wchp | nc 127.0.0.1 2181
wchp is not executed because it is not in the whitelist.
解决方法:cd /usr/local/zookeeper/conf下,用vim打开zoo.cfg,在最后添加:
4lw.commands.whitelist=*
重启zk。
ZooKeeper 支持某些特定的四字命令字母与其的交互。它们大多是查询命令,用来获取 ZooKeeper 服务的当前状态及相关信息。用户在客户端可以通过 telnet 或 nc 向 ZooKeeper 提交相应的命令。 ZooKeeper 常用四字命令见下表 1 所示:
表 1 : ZooKeeper 四字命令
ZooKeeper 四字命令 |
功能描述 |
conf |
输出相关服务配置的详细信息。 |
cons |
列出所有连接到服务器的客户端的完全的连接 / 会话的详细信息。包括“接受 / 发送”的包数量、会话 id 、操作延迟、最后的操作执行等等信息。 |
dump |
列出未经处理的会话和临时节点。 |
envi |
输出关于服务环境的详细信息(区别于 conf 命令)。 |
reqs |
列出未经处理的请求 |
ruok |
测试服务是否处于正确状态。如果确实如此,那么服务返回“ imok ”,否则不做任何相应。 |
stat |
输出关于性能和连接的客户端的列表。 |
wchs |
列出服务器 watch 的详细信息。 |
wchc |
通过 session 列出服务器 watch 的详细信息,它的输出是一个与 watch 相关的会话的列表。 |
wchp |
通过路径列出服务器 watch 的详细信息。它输出一个与 session 相关的路径。 |