ZooKeeper timeout problems
Problem 1:
2020-03-01 16:04:06,085 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@694] - Established session 0x10635fe2a6368f1 with negotiated timeout 120000 for client /10.62.3.14:55222
2020-03-01 16:06:12,006 [myid:1] - WARN [SyncThread:1:FileTxnLog@338] - fsync-ing the write ahead log in SyncThread:1 took 5073ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2020-03-01 16:06:13,906 [myid:1] - WARN [SyncThread:1:FileTxnLog@338] - fsync-ing the write ahead log in SyncThread:1 took 1123ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
Analysis: the ZK server is taking too long to fsync the write-ahead log (5073 ms and 1123 ms in the warnings above).
Solution:
1. Add the following to zoo.cfg:
forceSync=no
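forceSync is enabled by default: when ZK receives data it immediately syncs the current state to the on-disk log file and replies only after the sync has completed, so a slow disk delays every response. With the option turned off, clients get a fast response. The ZK source code that flushes the log:
[Figure: ZK log-flushing source code]
Turning forceSync off carries a latent risk. The log is still flushed (log.flush() runs first), but the operating system writes to its cache first to make disk writes more efficient. If the machine fails, some ZK state may never have reached the disk, leaving ZK with inconsistent state before and after the failure.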
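2. Store ZooKeeper's transaction log and its data files separately rather than on the same disk, for example by pointing dataLogDir in zoo.cfg at a dedicated disk.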
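The difference between flushing and forcing a sync can be illustrated with a small probe. This is only a Python sketch of the idea, not ZooKeeper's Java code: `timed_append` and the probe file name are hypothetical, and the temporary directory is a stand-in for the disk that actually holds the ZK transaction log.

```python
import os
import tempfile
import time


def timed_append(path, payload, force_sync):
    """Append payload to path and return the elapsed time in milliseconds.

    flush() only hands the bytes to the operating system's cache;
    os.fsync() additionally blocks until the OS reports them written to disk.
    Skipping that second step is what forceSync=no amounts to.
    """
    start = time.perf_counter()
    with open(path, "ab") as log:
        log.write(payload)
        log.flush()                 # always done: program buffer -> OS cache
        if force_sync:
            os.fsync(log.fileno())  # only when syncing is forced: OS cache -> disk
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Assumption: replace this temporary directory with one on the ZK log disk.
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "fsync_probe.log")
    record = b"x" * 4096
    for force_sync in (False, True):
        worst = max(timed_append(path, record, force_sync) for _ in range(20))
        print(f"force_sync={force_sync}: worst of 20 appends = {worst:.2f} ms")
```

If the forced appends are consistently slow on the log disk while the unforced ones are fast, the disk itself is the likely bottleneck, which is the case where moving the transaction log to its own disk tends to help more than disabling forceSync.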
Problem 2:
2020-03-01 16:27:16,786 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@694] - Established session 0x3075e0e93860151 with negotiated timeout 120000 for client /10.62.3.2:60124
2020-03-01 16:34:52,706 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x3075e0e93860135, likely client has closed socket
2020-03-01 16:34:52,706 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1056] - Closed socket connection for client /10.62.3.2:50244 which had sessionid 0x3075e0e93860135
2020-03-01 16:35:48,351 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x10635fe2a636912, likely client has closed socket
2020-03-01 16:35:48,351 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1056] - Closed socket connection for client /10.62.3.14:60822 which had sessionid 0x10635fe2a636912
2020-03-01 16:35:58,226 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x10635fe2a636914, likely client has closed socket
2020-03-01 16:35:58,226 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1056] - Closed socket connection for client /10.62.3.14:60856 which had sessionid 0x10635fe2a636914
2020-03-01 16:36:04,902 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x10635fe2a636910, likely client has closed socket
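Analysis: the timeout configured for the client's connection to ZooKeeper is too short.
The log above shows that the session timeout has already been set to 120 s, which should be fine for an HBase cluster. Even so, some regionserver machines still failed while flushing the memstore. For now, change the tickTime parameter in zoo.cfg and keep observing.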
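tickTime matters here because the server clamps the session timeout a client asks for: by default the allowed range is 2 × tickTime to 20 × tickTime, unless minSessionTimeout and maxSessionTimeout are set explicitly. A minimal sketch of that rule, where the function name and its parameters are illustrative and not a ZooKeeper API:

```python
def negotiate_session_timeout(requested_ms, tick_time_ms,
                              min_session_timeout_ms=None,
                              max_session_timeout_ms=None):
    """Return the timeout the server would grant for a requested value.

    When the bounds are not given, fall back to the documented defaults
    of 2 * tickTime (minimum) and 20 * tickTime (maximum).
    """
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    # Clamp the client's request into [lower, upper].
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # With the default tickTime of 2000 ms a 120 s request is cut to 40 s.
    print(negotiate_session_timeout(120000, 2000))   # 40000
    # A tickTime of 6000 ms raises the ceiling to 120 s.
    print(negotiate_session_timeout(120000, 6000))   # 120000
```

Since the log reports a negotiated timeout of 120000, the server in question presumably already allows at least that much, either through a larger tickTime or an explicit maxSessionTimeout. Raising tickTime also lengthens every other interval derived from it, so it is worth changing in small steps.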