ZooKeeper Pseudo-Cluster Problem and Solution

First, the configuration file:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
server.0=47.94.204.115:2888:3888
server.1=47.94.192.253:2888:3888
server.2=47.94.199.37:2888:3888

Next, the startup log was full of exceptions like this one:

2017-07-05 23:40:14,814 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@588] - Cannot open channel to 1 at election address /47.94.192.253:3888
java.net.ConnectException: 拒绝连接 (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
    at java.lang.Thread.run(Thread.java:745)

After that, the cluster simply would not come up.

Now for the solution, which took a few twists and turns.

First I logged in to server 47.94.192.253 and ran netstat -nalp | grep java, which showed the following ports:

[Screenshot: netstat output]

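Port 2181 is the port ZooKeeper clients connect to, so process 32143 is the ZooKeeper server that was just started. It is also listening on port 37271, yet ZooKeeper was never configured with that port; the configured ports are 2888 and 3888. Normally a server, including one acting as a follower, listens on 3888, which is used for leader-election communication. Why this was not the case was unclear at this point. Each time I restarted the process, the extra port number changed. So the immediate problem was identified: the server process was not listening on the configured port 3888, only on a random port, so the other servers could not reach it. That is why the exception above appeared.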
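The same check can be made from any other node without reading netstat output. The following is a minimal Python sketch, not part of the original troubleshooting; the function name can_connect and the 2-second timeout are assumptions chosen for illustration. It reports whether a TCP connection to a given host and port succeeds, which is the condition that fails with "Connection refused" in the log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False
```

For example, can_connect("47.94.192.253", 3888) would be expected to return False while the election port is not being listened on, and True once the server binds it correctly.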

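To fix this, I first had to find out why the process was listening on a random port. I opened the log file again and found this exception near the top: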

2017-07-05 23:40:14,695 [myid:] - INFO [main:QuorumPeerConfig@134] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
2017-07-05 23:40:14,713 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.192.253 to address: /47.94.192.253
2017-07-05 23:40:14,713 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.204.115 to address: /47.94.204.115
2017-07-05 23:40:14,714 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.199.37 to address: /47.94.199.37
2017-07-05 23:40:14,714 [myid:] - INFO [main:QuorumPeerConfig@396] - Defaulting to majority quorums
2017-07-05 23:40:14,721 [myid:0] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-07-05 23:40:14,725 [myid:0] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-07-05 23:40:14,725 [myid:0] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-07-05 23:40:14,741 [myid:0] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2017-07-05 23:40:14,751 [myid:0] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-07-05 23:40:14,776 [myid:0] - INFO [main:QuorumPeer@1134] - minSessionTimeout set to -1
2017-07-05 23:40:14,776 [myid:0] - INFO [main:QuorumPeer@1145] - maxSessionTimeout set to -1
2017-07-05 23:40:14,777 [myid:0] - INFO [main:QuorumPeer@1419] - QuorumPeer communication is not secured!
2017-07-05 23:40:14,778 [myid:0] - INFO [main:QuorumPeer@1448] - quorum.cnxn.threads.size set to 20
2017-07-05 23:40:14,793 [myid:0] - INFO [ListenerThread:QuorumCnxManager$Listener@739] - My election bind port: /47.94.204.115:3888
2017-07-05 23:40:14,794 [myid:0] - ERROR [/47.94.204.115:3888:QuorumCnxManager$Listener@763] - Exception while listening
java.net.BindException: 无法指定被请求的地址 (Bind failed)
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:742)
2017-07-05 23:40:14,807 [myid:0] - INFO [QuorumPeer[myid=0]/0.0.0.0:2181:QuorumPeer@865] - LOOKING
2017-07-05 23:40:14,808 [myid:0] - INFO [QuorumPeer[myid=0]/0.0.0.0:2181:FastLeaderElection@818] - New election. My id = 0, proposed zxid=0x2
2017-07-05 23:40:14,810 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@600] - Notification: 1 (message format version), 0 (n.leader), 0x2 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2017-07-05 23:40:14,814 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@588] - Cannot open channel to 1 at election address /47.94.192.253:3888
java.net.ConnectException: 拒绝连接 (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
    at java.lang.Thread.run(Thread.java:745)

Before the connection errors there is a bind exception. This exception usually has one of two common causes:

1. The port is already in use.

2. The IP address does not belong to any network interface on the local machine.

As seen a moment ago, port 3888 was not in use, so the cause had to be the second one.
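The two causes can also be told apart programmatically, because the operating system reports a different error code for each. The following is a minimal Python sketch meant to be run on the affected server; the function name classify_bind and its return strings are hypothetical and only serve the illustration. The localized message 无法指定被请求的地址 in the log corresponds to the system error "Cannot assign requested address" (EADDRNOTAVAIL).

```python
import errno
import socket


def classify_bind(host: str, port: int) -> str:
    """Try to bind a TCP socket to host:port and classify the outcome.

    Returns "ok", "port_in_use", "address_not_local", or "other:<errno name>".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            # Cause 1: another socket already holds this address and port.
            return "port_in_use"
        if exc.errno == errno.EADDRNOTAVAIL:
            # Cause 2: no local network interface carries this IP address.
            return "address_not_local"
        return "other:" + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Always release the probe socket so the check leaves no side effects.
        sock.close()
```

Calling classify_bind with the address from the server's own server.N line and port 3888 (while ZooKeeper is stopped) would be expected to return "address_not_local" in the situation described here, whereas binding to "0.0.0.0" does not depend on any particular interface address.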

Running ifconfig gave the following result:

[Screenshot: ifconfig output]

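Sure enough, it was the second cause: no network interface with this address exists on the machine. Some readers may ask why it is possible to log in over SSH using this IP address. The reason is simple: this is a cloud server. Cloud servers are virtualized, and the interface that holds the public address belongs to the provider's physical gateway, so the virtual machine itself has no interface with that address.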

At this point the real cause was found. The fix is to make the server process listen on 0.0.0.0, that is, on all network interfaces.

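How can that be done? I looked through the official site but did not find documentation for such a setting, so I pulled down the ZooKeeper source code and went to line 742 of QuorumCnxManager.java: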

[Screenshot: QuorumCnxManager.java around line 742]

Just above that line there is a listenOnAllIPs parameter; if it is true, the problem is solved. Tracing it upward led to QuorumPeerConfig.java:

[Screenshot: QuorumPeerConfig.java]

Now it is obvious: the configuration file accepts a quorumListenOnAllIPs parameter, which should be set to true.

[Screenshot]

With that, the problem was solved.

[Screenshot]

The server now listens on port 3888. After adding this setting on every node, the problem was resolved.

[Screenshot]
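Since the setting has to be added on every node, applying it by script may be less error-prone than editing each zoo.cfg by hand. The following is a minimal Python sketch under that assumption; the helper name ensure_listen_on_all_ips is hypothetical, and it works on the text of the configuration file rather than on any ZooKeeper API. It produces a configuration in which quorumListenOnAllIPs=true appears exactly once.

```python
def ensure_listen_on_all_ips(cfg_text: str) -> str:
    """Return cfg_text with quorumListenOnAllIPs=true set exactly once."""
    key = "quorumListenOnAllIPs"
    # Drop any existing assignment of the key (for example "=false"),
    # keeping every other line, including comments, unchanged.
    kept = [
        line for line in cfg_text.splitlines()
        if line.split("=", 1)[0].strip() != key
    ]
    kept.append(f"{key}=true")
    return "\n".join(kept) + "\n"
```

The function is idempotent, so running it twice over the same file leaves the result unchanged. The server still has to be restarted after the file is rewritten for the new setting to take effect.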

posted @ 2020-12-20 22:51  墨染念颖