zookeeper各种报错、原因及解决方法汇总(持续更新)

[root@iZ23nn1p4mjZ zookeeper-3.4.10]# bin/zkCli.sh -server localhost:2181,localhost:2182,localhost:2183
Connecting to localhost:2181,localhost:2182,localhost:2183
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181,localhost:2182,localhost:2183(CONNECTING) 0] create -s /[sPacktPub]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:706)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:599)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:371)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:331)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:290)
[root@iZ23nn1p4mjZ zookeeper-3.4.10]# ls /

  上述问题近发生在单机三节点模式时。

ZooKeeper自定义数据日志目录

  ZooKeeper在启动时会将zookeeper.out输出到当前目录,不仅不友好,有时候可能会因为目录权限问题引发一些不必要的麻烦。

  解决方案:设置ZOO_LOG_DIR环境变量,也可以改在shell中写,如下:

if [ $ZOO_LOG_DIR ] ; then
export ZOO_LOG_DIR=/usr/local/zookeeper/logs
fi

if [ -z $ZOO_LOG_DIR ];then
export ZOO_LOG_DIR=/usr/local/zookeeper/logs
fi

  conf/log4j.properties中定义:

zookeeper.log.dir=/usr/local/zookeeper/logs
zookeeper.tracelog.dir=/usr/local/zookeeper/logs

  有时候在网络变化、抖动或者服务器负载高的情况下会出现下列异常:

14:18:06.812 [main] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (127.0.0.1:2181) and timeout (5000) / elapsed (59220)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

  重启应用后仍然如此,重启zk后便自动恢复。

  最近有个奇怪的现象{20200420更新},只要连接非本地,客户端IDEA中总是第一次连接的时候超时,第二次就成功了,每个客户端均如此。如下:

19:51:53.245 [main] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=xx.xx.xx.xx:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@310d05c5
19:51:58.334 [main] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (xx.xx.xx.xx:2181) and timeout (5000) / elapsed (5104)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225) [curator-client-2.13.0.jar:?]
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) [curator-client-2.13.0.jar:?]
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117) [curator-client-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) [curator-client-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:454) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) [curator-framework-2.13.0.jar:?]
	at com.alibaba.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient.createPersistent(CuratorZookeeperClient.java:96) [classes/:1.0.17]

  连接超时时间太短,加大即可,有需求可有偿解决。

 

curator KeeperErrorCode = Unimplemented异常,如下:

org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /benchmark/test-key-1
 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:103)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1525)
    at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1189)
    at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1166)
    at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)

 

ZooKeeper 3.5.x

  • Curator 4.0 has a hard dependency on ZooKeeper 3.5.x
  • If you are using ZooKeeper 3.5.x there's nothing additional to do - just use Curator 4.0

ZooKeeper 3.4.x

Curator 4.0 supports ZooKeeper 3.4.x ensembles in a soft-compatibility mode. To use this mode you must exclude ZooKeeper when adding Curator to your dependency management tool.

Curator 4.0默认不支持zk 3.4,必须兼容性处理,如下:

        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.4.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-recipes</artifactId>
            <version>4.0.0</version>
            <!-- 4.0.0原生不兼容zk 3.4, 必须进行兼容性处理 -->
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

  见http://curator.apache.org/zk-compatibility.html

  其他zk自身及zk与dubbo相关的异常可参见:https://www.cnblogs.com/zhjh256/p/11679639.html

2019-11-04 12:05:17,165 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30003, likely client has closed socket
2019-11-04 12:05:17,166 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22804 which had sessionid 0x1001e6b86a30003
2019-11-04 12:05:17,240 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30000, likely client has closed socket
2019-11-04 12:05:17,240 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22665 which had sessionid 0x1001e6b86a30000
2019-11-04 12:05:17,241 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30001, likely client has closed socket
2019-11-04 12:05:17,241 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22672 which had sessionid 0x1001e6b86a30001
2019-11-04 12:05:17,241 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30002, likely client has closed socket
2019-11-04 12:05:17,242 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22673 which had sessionid 0x1001e6b86a30002
2019-11-04 12:05:47,455 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30002, timeout of 40000ms exceeded
2019-11-04 12:05:47,456 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30002
2019-11-04 12:05:51,456 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30003, timeout of 40000ms exceeded
2019-11-04 12:05:51,456 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30000, timeout of 40000ms exceeded
2019-11-04 12:05:51,456 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30003
2019-11-04 12:05:51,457 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30000
2019-11-04 12:05:53,456 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30001, timeout of 40000ms exceeded
2019-11-04 12:05:53,457 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30001

  该异常看起来和https://www.cnblogs.com/zhyg/p/11089736.html所述很像。

  zookeeper服务器日志报连接数超出:

2020-04-16 17:16:11,135 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,241 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,371 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,381 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,463 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,527 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,553 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,580 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,609 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,686 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,796 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,818 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,853 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60

  该原因大概率是客户端有连接泄露,可通过netstat -ano | grep 2181分析,如下:

[root@hs-10-20-37-72 logs]# lsof -i:2181
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
java     2706 root   24u  IPv6   18738      0t0  TCP *:eforward (LISTEN)
java     2706 root   25u  IPv6 3786862      0t0  TCP 10.20.25.225:eforward->10.20.25.225:63464 (ESTABLISHED)
java     2706 root   26u  IPv6 4748980      0t0  TCP 10.20.25.225:eforward->10.20.25.225:59063 (ESTABLISHED)
java     2706 root   27u  IPv6 6781917      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51577 (ESTABLISHED)
java     2706 root   28u  IPv6 6781920      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51578 (ESTABLISHED)
java     2706 root   30u  IPv6 6784242      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51880 (ESTABLISHED)
java     2706 root   31u  IPv6 6784604      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51989 (ESTABLISHED)
java     2706 root   32u  IPv6 6784610      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51990 (ESTABLISHED)
java     2706 root   33u  IPv6 6788885      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52069 (ESTABLISHED)
java     2706 root   34u  IPv6 4412727      0t0  TCP 10.20.25.225:eforward->10.20.25.225:12441 (ESTABLISHED)
java     2706 root   35u  IPv6 6789241      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52127 (ESTABLISHED)
java     2706 root   36u  IPv6 6789263      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52130 (ESTABLISHED)
java     2706 root   37u  IPv6 6789329      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52140 (ESTABLISHED)
java     2706 root   38u  IPv6 6789422      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52149 (ESTABLISHED)
java     2706 root   39u  IPv6 6789572      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52166 (ESTABLISHED)
java     2706 root   40u  IPv6 6789693      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52183 (ESTABLISHED)
java     2706 root   41u  IPv6 6789807      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52200 (ESTABLISHED)
java     2706 root   42u  IPv6 6789981      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52220 (ESTABLISHED)
java     2706 root   43u  IPv6 6790066      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52239 (ESTABLISHED)
java     2706 root   44u  IPv6 6790173      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52250 (ESTABLISHED)
java     2706 root   45u  IPv6 6790238      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52259 (ESTABLISHED)
java     2706 root   46u  IPv6 6790287      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52272 (ESTABLISHED)
java     2706 root   47u  IPv6 6790356      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52285 (ESTABLISHED)
java     2706 root   48u  IPv6 6790447      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52297 (ESTABLISHED)
java     2706 root   49u  IPv6 6790485      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52307 (ESTABLISHED)
java     2706 root   50u  IPv6 6790597      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52337 (ESTABLISHED)
java     2706 root   51u  IPv6 6790638      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52343 (ESTABLISHED)
java     2706 root   52u  IPv6 6790677      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52356 (ESTABLISHED)
java     2706 root   53u  IPv6 6790726      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52368 (ESTABLISHED)
java     2706 root   54u  IPv6 6790842      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52387 (ESTABLISHED)
java     2706 root   55u  IPv6 6790871      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52395 (ESTABLISHED)
java     2706 root   56u  IPv6 6790883      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52402 (ESTABLISHED)
java     2706 root   57u  IPv6 6790992      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52415 (ESTABLISHED)
java     2706 root   58u  IPv6 6791084      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52433 (ESTABLISHED)
java     2706 root   59u  IPv6 6791118      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52449 (ESTABLISHED)
java     2706 root   60u  IPv6 6791187      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52460 (ESTABLISHED)
java     2706 root   61u  IPv6 6791270      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52479 (ESTABLISHED)
java     2706 root   62u  IPv6 6791334      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52488 (ESTABLISHED)
java     2706 root   63u  IPv6 6791427      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52501 (ESTABLISHED)
java     2706 root   64u  IPv6 6791470      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52518 (ESTABLISHED)
java     2706 root   65u  IPv6 6791542      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52531 (ESTABLISHED)
java     2706 root   66u  IPv6 6791631      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52545 (ESTABLISHED)
java     2706 root   67u  IPv6 6791669      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52555 (ESTABLISHED)
java     2706 root   68u  IPv6 6791719      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52566 (ESTABLISHED)
java     2706 root   69u  IPv6 6791797      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52581 (ESTABLISHED)
java     2706 root   70u  IPv6 6791847      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52592 (ESTABLISHED)
java     2706 root   71u  IPv6 6791883      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52602 (ESTABLISHED)
java     2706 root   72u  IPv6 6791931      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52615 (ESTABLISHED)
java     2706 root   73u  IPv6 6792017      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52627 (ESTABLISHED)
java     2706 root   74u  IPv6 6792056      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52641 (ESTABLISHED)
java     2706 root   75u  IPv6 6792090      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52648 (ESTABLISHED)
java     2706 root   76u  IPv6 6792199      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52661 (ESTABLISHED)
java     2706 root   77u  IPv6 6792283      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52681 (ESTABLISHED)
java     2706 root   78u  IPv6 6792299      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52688 (ESTABLISHED)
java     2706 root   79u  IPv6 6792345      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52698 (ESTABLISHED)
java     2706 root   80u  IPv6 6792413      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52709 (ESTABLISHED)
java     2706 root   81u  IPv6 6792495      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52727 (ESTABLISHED)
java     2706 root   82u  IPv6 6792563      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52736 (ESTABLISHED)
java     2706 root   83u  IPv6 6792625      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52745 (ESTABLISHED)
java     2706 root   84u  IPv6 6792674      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52758 (ESTABLISHED)
java     2706 root   85u  IPv6 6792755      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52775 (ESTABLISHED)
java     2706 root   86u  IPv6 6290356      0t0  TCP 10.20.25.225:eforward->192.168.166.145:34608 (ESTABLISHED)
java     2871 root  176u  IPv6 4748979      0t0  TCP 10.20.25.225:59063->10.20.25.225:eforward (ESTABLISHED)

  找到对应的进程和服务器。然后分析泄露,通常大概率是因为自己手写zk相关连接和监控、而不是使用客户端库如curator所致。

21:12:03.568 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.20.25.225/10.20.25.225:2181. Will not attempt to authenticate using SASL (unknown error)
21:12:03.568 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 10.20.25.225/10.20.25.225:2181, initiating session
21:12:03.576 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.20.25.225/10.20.25.225:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_191]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_191]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar!/:3.4.8--1]
21:12:03.729 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.20.25.225/10.20.25.225:2181. Will not attempt to authenticate using SASL (unknown error)
21:12:03.730 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 10.20.25.225/10.20.25.225:2181, initiating session
21:12:03.730 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.20.25.225/10.20.25.225:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_191]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_191]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar!/:3.4.8--1]

Caused by: org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout

at com.dangdang.ddframe.job.reg.zookeeper.ZookeeperRegistryCenter.init(ZookeeperRegistryCenter.java:102) ~[elastic-job-common-core-2.1.5.jar:?]

检查网络是否正常,ping/telnet看下,基本上就这一个原因。

执行[root@192.168.1.1 bin]# echo conf | nc localhost 2181报下列输出

   This ZooKeeper instance is not currently serving requests

说明zookeeper还活着,但不提供服务了。此时要查看zk的日志,dataLogDir指定的目录,一般里面可以看到为什么zk不提供服务了。如下:

2020-08-17 00:13:20,050 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:20,252 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:20,654 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:21,463 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:23,065 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:26,271 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:32,679 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:40,040 [myid:0] - WARN  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@998] - Unexpected exception
java.lang.InterruptedException: Timeout while waiting for epoch to be acked by quorum
    at org.apache.zookeeper.server.quorum.Leader.waitForEpochAck(Leader.java:936)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:415)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:995)
2020-08-17 00:13:40,040 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@520] - Shutting down
2020-08-17 00:13:40,040 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@526] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Forcing shutdown
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:526)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1001)
2020-08-17 00:13:40,041 [myid:0] - INFO  [LearnerCnxAcceptor-/10.20.30.17:2182:Leader$LearnerCnxAcceptor@340] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2020-08-17 00:13:40,041 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@910] - LOOKING
2020-08-17 00:13:40,042 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New election. My id =  0, proposed zxid=0x103c00000496
2020-08-17 00:13:40,042 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 0 (n.leader), 0x103c00000496 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 0 (n.sid), 0x103c (n.peerEpoch) LOOKING (my state)
2020-08-17 00:13:40,043 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@584] - Cannot open channel to 1 at election address/*原因和解决方法可见https://blog.csdn.net/everl_1/article/details/52253513*/ /10.20.30.13:2183
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
    at java.lang.Thread.run(Thread.java:748)
2020-08-17 00:13:40,043 [myid:0] - INFO  [WorkerSender[myid=0]:QuorumPeer$QuorumServer@185] - Resolved hostname: 10.20.30.13 to address: /10.20.30.13
2020-08-17 00:13:40,045 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 0 (n.leader), 0x103c00000496 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LOOKING (my state)
2020-08-17 00:13:40,245 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@992] - LEADING
2020-08-17 00:13:40,263 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@174] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datad
ir /home/component/zookeeper/log/version-2 snapdir /home/component/zookeeper/data/version-2
2020-08-17 00:13:40,264 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@380] - LEADING - LEADER ELECTION TOOK - 222
2020-08-17 00:13:41,271 [myid:0] - INFO  [LearnerHandler-/10.20.30.18:11030:LearnerHandler@346] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1efeac
2020-08-17 00:13:41,307 [myid:0] - ERROR [LearnerHandler-/10.20.30.18:11030:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:373)
2020-08-17 00:13:41,308 [myid:0] - WARN  [LearnerHandler-/10.20.30.18:11030:LearnerHandler@661] - ******* GOODBYE /10.20.30.18:11030 ********

所以,此处根本原因是集群的其他节点连不上,导致节点异常退出。但是socket没有关闭,这样是为了在其他节点恢复后,集群整体自动恢复。

解决Zookeeper报错:conf is not executed because it is not in the whitelist的解决办法

[root@hs-10-20-37-72 ~]# echo wchs | nc 127.0.0.1 2181
2 connections watching 48 paths
Total watches:48
[root@hs-10-20-37-72 ~]# echo wchc | nc 127.0.0.1 2181
wchc is not executed because it is not in the whitelist.
[root@hs-10-20-37-72 ~]# echo wchp | nc 127.0.0.1 2181
wchp is not executed because it is not in the whitelist.

解决方法:cd /usr/local/zookeeper/conf下,用vim打开zoo.cfg,在最后添加:

4lw.commands.whitelist=*

重启zk。

ZooKeeper 支持某些特定的四字命令字母与其的交互。它们大多是查询命令,用来获取 ZooKeeper 服务的当前状态及相关信息。用户在客户端可以通过 telnet 或 nc 向 ZooKeeper 提交相应的命令。 ZooKeeper 常用四字命令见下表 1 所示:

表 1 : ZooKeeper 四字命令

ZooKeeper 四字命令

功能描述

conf

输出相关服务配置的详细信息。

cons

列出所有连接到服务器的客户端的完全的连接 / 会话的详细信息。包括“接受 / 发送”的包数量、会话 id 、操作延迟、最后的操作执行等等信息。

dump

列出未经处理的会话和临时节点。

envi

输出关于服务环境的详细信息(区别于 conf 命令)。

reqs

列出未经处理的请求

ruok

测试服务是否处于正确状态。如果确实如此,那么服务返回“ imok ”,否则不做任何相应。

stat

输出关于性能和连接的客户端的列表。

wchs

列出服务器 watch 的详细信息。

wchc

通过 session 列出服务器 watch 的详细信息,它的输出是一个与 watch 相关的会话的列表。

wchp

通过路径列出服务器 watch 的详细信息。它输出一个与 session 相关的路径。

posted @ 2018-10-09 11:43  zhjh256  阅读(34023)  评论(0编辑  收藏  举报