Atlas log reports "Client session timed out, have not heard from server in 34640ms for sessionid 0xff821bd396610617"
I. Error log
2022-07-21 09:12:42,274 WARN - [main-SendThread(hadoop01:2181):] ~ Client session timed out, have not heard from server in 34640ms for sessionid 0xff821bd396610617 (ClientCnxn$SendThread:1190)
2022-07-21 09:12:42,819 WARN - [main-SendThread(hadoop01:2181):] ~ Unable to reconnect to ZooKeeper service, session 0xff821bd396610617 has expired (ClientCnxn$SendThread:1380)
2022-07-21 09:12:42,819 WARN - [main-EventThread:] ~ Session expired event received (ConnectionState:372)
2022-07-21 09:12:43,323 WARN - [NotificationHookConsumer thread-0:] ~ Exception in NotificationHookConsumer (NotificationHookConsumer$HookConsumer:550)
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:813)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:696)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1418)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1379)
    at org.apache.atlas.kafka.AtlasKafkaConsumer.commit(AtlasKafkaConsumer.java:106)
    at org.apache.atlas.notification.NotificationHookConsumer$HookConsumer.commit(NotificationHookConsumer.java:909)
    at org.apache.atlas.notification.NotificationHookConsumer$HookConsumer.handleMessage(NotificationHookConsumer.java:819)
    at org.apache.atlas.notification.NotificationHookConsumer$HookConsumer.doWork(NotificationHookConsumer.java:544)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
II. Root cause analysis
1. For the following errors:
Client session timed out, have not heard from server in 34640ms for sessionid 0xff821bd396610617 (ClientCnxn$SendThread:1190)
2022-07-21 09:12:42,819 WARN - [main-SendThread(hadoop01:2181):] ~ Unable to reconnect to ZooKeeper service, session 0xff821bd396610617 has expired (ClientCnxn$SendThread:1380)
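I had written a shell script that calls Atlas's data export API via curl to export data from Atlas. Because the data volume was large, the export took a long time. The minimum session timeout configured on the ZooKeeper side was one and a half hours, and once a session had been in use for longer than that it expired, which left the exported data incomplete. (Strictly speaking, ZooKeeper expires a session when it receives no heartbeat from the client within the negotiated timeout; the 34640ms in the log is how long the client went without hearing from the server.)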
2. For the Kafka error:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
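The log message is already clear: this typically means the poll loop is spending too much time processing messages. Put plainly, processing one batch took longer than max.poll.interval.ms, so the group rebalanced, the partitions were assigned to another member, and the offset commit was rejected.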
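To make that timing constraint concrete, here is a minimal Python sketch of the arithmetic behind the error. The per-record processing time is a hypothetical figure chosen for illustration, not something measured from Atlas, and the two constants are Kafka's stock consumer defaults, which an Atlas deployment may override.

```python
def batch_exceeds_poll_interval(max_poll_records, per_record_ms, max_poll_interval_ms):
    """Return True if processing a full batch would outlast max.poll.interval.ms."""
    worst_case_batch_ms = max_poll_records * per_record_ms
    return worst_case_batch_ms > max_poll_interval_ms


def max_safe_records(per_record_ms, max_poll_interval_ms):
    """Largest batch whose worst-case processing time still fits in the interval."""
    return max_poll_interval_ms // per_record_ms


# Kafka consumer defaults: max.poll.interval.ms=300000 (5 minutes), max.poll.records=500.
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000
DEFAULT_MAX_POLL_RECORDS = 500

# Hypothetical assumption: each hook message takes 1 second to process.
PER_RECORD_MS = 1_000

print(batch_exceeds_poll_interval(
    DEFAULT_MAX_POLL_RECORDS, PER_RECORD_MS, DEFAULT_MAX_POLL_INTERVAL_MS))
print(max_safe_records(PER_RECORD_MS, DEFAULT_MAX_POLL_INTERVAL_MS))
```

Under these assumed numbers a full batch would need 500 seconds against a 300-second limit, which is the situation the exception describes. Either lever named in the log message works: a larger max.poll.interval.ms or a smaller max.poll.records.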
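III. Solution
1. Change ZooKeeper's minSessionTimeout and maxSessionTimeout parameters (both in milliseconds) as follows:
cd <ZooKeeper installation directory>
vim conf/zoo.cfg
Set the following:
# Minimum session timeout, set to 1 day
minSessionTimeout=86400000
# Maximum session timeout, set to 2 days
maxSessionTimeout=172800000
Restart ZooKeeper afterwards.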
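As a rough illustration of how these two settings interact with clients, the ZooKeeper server clamps the timeout a client requests into the range [minSessionTimeout, maxSessionTimeout]. The Python sketch below models only that clamping rule; the function names are made up for this example, and the 30000 ms request is a hypothetical client value.

```python
def days_to_ms(days):
    """Convert whole days to milliseconds."""
    return days * 24 * 60 * 60 * 1000


def negotiate_session_timeout(requested_ms, min_ms, max_ms):
    """Clamp a client's requested session timeout into [min_ms, max_ms]."""
    return max(min_ms, min(requested_ms, max_ms))


MIN_SESSION_TIMEOUT_MS = days_to_ms(1)  # value used for minSessionTimeout here
MAX_SESSION_TIMEOUT_MS = days_to_ms(2)  # value used for maxSessionTimeout here

print(MIN_SESSION_TIMEOUT_MS, MAX_SESSION_TIMEOUT_MS)

# Hypothetical client asking for 30 seconds: it is raised to the 1-day minimum.
print(negotiate_session_timeout(30_000, MIN_SESSION_TIMEOUT_MS, MAX_SESSION_TIMEOUT_MS))

# A client asking for 86200000 ms is also raised, since that is below the minimum.
print(negotiate_session_timeout(86_200_000, MIN_SESSION_TIMEOUT_MS, MAX_SESSION_TIMEOUT_MS))
```

One consequence worth keeping in mind: with a 1-day minimum, every client of this ZooKeeper ensemble gets a session timeout of at least one day, so a client that dies without closing its session will not be detected as gone until then.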
2. Increase Kafka's batch.size and linger.ms parameters, then click Save. There is no need to restart the Kafka cluster.
3. Edit atlas-application.properties and adjust the following parameters:
atlas.kafka.zookeeper.session.timeout.ms=86200000
atlas.kafka.zookeeper.connection.timeout.ms=86200000
atlas.kafka.zookeeper.sync.time.ms=10000
atlas.kafka.auto.commit.interval.ms=180000
atlas.kafka.batch.size=10000
atlas.kafka.enable.auto.commit=true
atlas.kafka.poll.timeout.ms=86200000
atlas.kafka.group.min.session.timeout.ms=86200000
atlas.kafka.group.max.session.timeout.ms=172000000
atlas.notification.consumer.retry.interval=1000
atlas.notification.hook.retry.interval=3000
atlas.kafka.max.poll.interval.ms=86200000
atlas.kafka.max.poll.records=10000
Restart Atlas afterwards.
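Because all of these values are raw millisecond counts, it is easy to misjudge their magnitude. The following Python sketch is one way to sanity-check them by printing each `.ms` property in hours. The parser is a deliberately simple stand-in written for this example (it handles only `key=value` lines and `#` comments), and the sample text is a subset of the properties above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def ms_to_hours(ms):
    """Convert milliseconds to hours."""
    return ms / 3_600_000


# Subset of the properties listed above, used as sample input.
SAMPLE = """
atlas.kafka.zookeeper.session.timeout.ms=86200000
atlas.kafka.auto.commit.interval.ms=180000
atlas.kafka.max.poll.interval.ms=86200000
atlas.kafka.max.poll.records=10000
"""

props = parse_properties(SAMPLE)
for key, value in props.items():
    if key.endswith(".ms"):
        print(f"{key} = {value} ms (~{ms_to_hours(int(value)):.2f} h)")
```

Run against the real file, a check like this makes it obvious that 86200000 ms is just under 24 hours, slightly below the 1-day minSessionTimeout configured in ZooKeeper.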
IV. Reference: https://blog.csdn.net/u012206617/article/details/125186818