A Typical KingbaseES Cluster Case: Analyzing a Failover Triggered by a Primary Node Fault

Case description:
In a production environment, a system fault on the primary node of a KingbaseES V8R6 cluster triggered a primary/standby failover. The switchover itself succeeded, but the customer needed to know what had triggered it. This case walks through the log analysis that identified the cause of the failover.
Applicable versions:
KingbaseES V8R6

Cluster node information:

 x.x.x.43   original primary
 x.x.x.41   original standby

Operating system:
Kylin Server 10

Troubleshooting approach:

      1. Analyze the standby's hamgr.log to establish the failover time point and how the switchover proceeded.
      2. Analyze the primary's sys_log to check the state of the database before the failover and look for anomalies.
      3. Analyze the primary's kbha.log for disk checks, gateway (ping) checks, and the status of the kbha and repmgrd processes.
      4. Analyze the primary's system messages log for kernel status information.
      5. Correlate the fault information from these logs around the same time point to determine the specific cause of the primary node fault (a helper sketch for this step follows the list).
      6. Once the specific cause is known, propose a corresponding solution.
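
To correlate events across the different logs around the failover time (step 5), a small helper can pull every entry within a time window out of each file. The following is a minimal Python sketch only; the log paths, the timestamp format, and the 5-minute window are assumptions and should be adjusted to the actual deployment (/var/log/messages uses a different syslog format and is easier to inspect separately).

#!/usr/bin/env python3
# Minimal sketch: collect log lines from several cluster logs around one
# time point so that hamgr.log, kbha.log and sys_log events can be compared
# side by side.  Paths and the timestamp format are assumptions.
import re
from datetime import datetime, timedelta

LOG_FILES = [                              # hypothetical locations
    "/home/kingbase/cluster/log/hamgr.log",
    "/home/kingbase/cluster/log/kbha.log",
    "/data/kingbase/data/sys_log/kingbase.log",
]

TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def lines_around(path, center, window_minutes=5):
    """Yield lines whose timestamp lies within +/- window_minutes of center."""
    lo = center - timedelta(minutes=window_minutes)
    hi = center + timedelta(minutes=window_minutes)
    with open(path, errors="replace") as f:
        for line in f:
            m = TS_RE.search(line)
            if not m:
                continue
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            if lo <= ts <= hi:
                yield line.rstrip()

if __name__ == "__main__":
    failover_time = datetime(2024, 5, 8, 10, 29, 15)
    for path in LOG_FILES:
        print(f"===== {path} =====")
        try:
            for line in lines_around(path, failover_time):
                print(line)
        except OSError as exc:
            print(f"(skipped: {exc})")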

1. Identify and analyze the failover time point (standby hamgr.log)

As shown below, the standby's hamgr.log shows that at around '2024-05-08 10:29:15' the repmgrd process on the standby detected that the primary database service was abnormal and started PQping reconnection attempts. After all 6 attempts (the configured threshold) failed, a failover was triggered and the standby was promoted to primary.

[2024-05-08 10:29:15] [INFO] checking state of node 3, 1 of 6 attempts
[2024-05-08 10:29:25] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=10.31.1.43 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2024-05-08 10:29:25] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
........
[2024-05-08 10:30:55] [INFO] checking state of node 3, 6 of 6 attempts
[2024-05-08 10:31:05] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=10.31.1.43 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2024-05-08 10:31:05] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2024-05-08 10:31:05] [WARNING] unable to reconnect to node 3 after 6 attempts
......
NOTICE: promoting standby to primary
DETAIL: promoting server "ybywdb01" (ID: 1) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: SET synchronous TO "async" on primary host 
NOTICE: STANDBY PROMOTE successful
DETAIL: server "ybywdb01" (ID: 1) was successfully promoted to primary
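
The behaviour above corresponds to repmgrd probing the primary, retrying up to the configured number of attempts, and only then initiating failover. The sketch below imitates that retry-then-decide loop using psycopg2 as the client library; the DSN, the attempt count of 6, and the 10-second interval mirror the log, but this is only an illustration, not the repmgr implementation.

#!/usr/bin/env python3
# Sketch of a PQping-style health probe with a retry threshold, modelled on
# the repmgrd behaviour shown above (6 attempts, then failover).  psycopg2
# and the DSN below are assumptions, not the actual repmgr code.
import time
import psycopg2

DSN = "user=esrep dbname=esrep host=10.31.1.43 port=54321 connect_timeout=10"
RECONNECT_ATTEMPTS = 6      # mirrors the repmgr reconnect_attempts setting
RECONNECT_INTERVAL = 10     # seconds, mirrors reconnect_interval

def primary_reachable(dsn: str) -> bool:
    """Return True if a connection to the primary can be opened."""
    try:
        conn = psycopg2.connect(dsn)
        conn.close()
        return True
    except psycopg2.OperationalError:
        return False

def monitor_primary() -> None:
    for attempt in range(1, RECONNECT_ATTEMPTS + 1):
        print(f"checking state of primary, {attempt} of {RECONNECT_ATTEMPTS} attempts")
        if primary_reachable(DSN):
            print("primary is reachable, no action needed")
            return
        print("WARNING: unable to ping primary (no response)")
        time.sleep(RECONNECT_INTERVAL)
    print(f"unable to reconnect to primary after {RECONNECT_ATTEMPTS} attempts; "
          "this is the point at which repmgrd would initiate failover")

if __name__ == "__main__":
    monitor_primary()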

2. Check the primary's sys_log for the cause of the database fault
The database sys_log can reveal abnormal conditions on the primary, such as slow queries, lock waits, OOM kills, or core dumps from code faults, which increase database load or bring processes down and thereby trigger a failover.
As shown below, the primary's sys_log contains no slow queries, lock waits, OOM kills, or core dumps before the failover. It does, however, contain the message "using stale statistics instead of current ones because stats collector is not responding"; compared against previous cases, this message is associated with heavy disk I/O pressure.

2024-05-08 10:28:20 CST [2513598]: [1-1] user=,db=,client=LOG:  using stale statistics instead of current ones because stats collector is not responding
2024-05-08 10:28:24 CST [3073927]: [220-1] user=,db=,client=LOG:  using stale statistics instead of current ones because stats collector is not responding
2024-05-08 10:28:34 CST [3073927]: [221-1] user=,db=,client=LOG:  using stale statistics instead of current ones because stats collector is not responding
2024-05-08 10:30:39 CST [2501733]: [11-1] user=utic_rulesuser,db=utic_rules,client=10.31.1.14LOG:  duration: 149136.067 ms  parse <unnamed>: SELECT aa.APPLICATION_ID,......
......
2024-05-08 10:31:17 CST [25218]: [251-1] user=,db=,client=LOG:  received fast shutdown request
2024-05-08 10:31:19 CST [25218]: [252-1] user=,db=,client=LOG:  aborting any active transactions
2024-05-08 10:31:19 CST [2492906]: [1-1] user=utic_rulesuser,db=utic_rules,client=10.31.1.13FATAL:  terminating connection due to unexpected kingbase exit
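
When checking sys_log for this kind of evidence, a simple pattern scan saves time. A minimal sketch follows; the default log path and the pattern list are assumptions and can be extended (for example with OOM or core-dump patterns).

#!/usr/bin/env python3
# Sketch: scan a KingbaseES sys_log file for the symptoms discussed above.
# The path and the pattern list are assumptions; extend them as needed.
import re
import sys

PATTERNS = {
    "stats collector stalled": re.compile(r"using stale statistics"),
    "long statement (>60s)":   re.compile(r"duration: (\d+)\.\d+ ms"),
    "shutdown / abort":        re.compile(r"fast shutdown request|aborting any active transactions"),
    "unexpected exit":         re.compile(r"terminating connection due to unexpected"),
}

def scan(path: str) -> None:
    with open(path, errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            for label, pattern in PATTERNS.items():
                m = pattern.search(line)
                if not m:
                    continue
                # For "duration" hits, only report statements slower than 60s.
                if label.startswith("long statement") and int(m.group(1)) < 60_000:
                    continue
                print(f"{lineno}: [{label}] {line.strip()}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else "/data/kingbase/data/sys_log/kingbase.log")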

3. Check the primary's kbha.log for disk-check information

The cluster uses the kbha process to test read/write access to the database data directory, and kbha.log records the results of these disk checks.
As shown below, in the primary's kbha.log a disk read/write check round started at "2024-05-08 10:28:30". By default the check runs 6 rounds, but only the first round appears in the log, and within it the write to the status file is never reported as completed, indicating that the disk read/write check was abnormal.

[2024-05-08 10:28:30] [INFO] [thread 3571119872] begin to check the mount_point_list 1 / 6
[2024-05-08 10:28:30] [INFO] [thread 3571119872] check directory "/data/kingbase/data"
[2024-05-08 10:28:30] [INFO] [thread 3571119872] create file "/data/kingbase/data/rw_status_file_20240508102830" in directory "/data/kingbase/data" ..
[2024-05-08 10:28:32] [NOTICE] PING 10.31.1.200 (10.31.1.200) 56(84) bytes of data.
--- 10.31.1.200 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.712/0.731/0.750/0.019 ms
[2024-05-08 10:28:33] [INFO] [thread 3571119872] create file "/data/kingbase/data/rw_status_file_20240508102830" in directory "/data/kingbase/data" .. OK
[2024-05-08 10:28:33] [INFO] [thread 3571119872] write file "/data/kingbase/data/rw_status_file_20240508102830" in directory "/data/kingbase/data" ..
[2024-05-08 10:28:36] [NOTICE] PING 10.31.1.200 (10.31.1.200) 56(84) bytes of data.
[2024-05-08 10:29:07] [WARNING] unable to ping "user=esrep dbname=esrep port=54321 host=10.31.1.43 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
[2024-05-08 10:29:07] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2024-05-08 10:29:10] [NOTICE] PING 10.31.1.200 (10.31.1.200) 56(84) bytes of data.
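
The kbha check above creates a status file in the data directory and then writes to it; in this incident the write step never reports completion. The sketch below reproduces that idea as a stand-alone probe with an explicit timeout, so a stalled write is reported rather than hanging the caller. The data directory path, file name pattern, and timeout value are assumptions; it is not the kbha implementation.

#!/usr/bin/env python3
# Sketch: probe read/write health of the data directory with a timeout,
# similar in spirit to the kbha mount_point_list check shown above.
# The directory, file name pattern and timeout are assumptions.
import os
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

DATA_DIR = "/data/kingbase/data"    # assumed data directory
TIMEOUT_SECONDS = 10                # assumed per-round timeout

def write_and_read(path: str) -> bool:
    """Create, write, fsync, read back and remove a small status file."""
    payload = b"rw_status_check\n"
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())        # force the write through to the disk
    with open(path, "rb") as f:
        ok = f.read() == payload
    os.unlink(path)
    return ok

def check_directory(directory: str) -> None:
    path = os.path.join(directory, "rw_status_file_" + time.strftime("%Y%m%d%H%M%S"))
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(write_and_read, path)
    try:
        ok = future.result(timeout=TIMEOUT_SECONDS)
        print(f"check directory {directory!r}: {'OK' if ok else 'DATA MISMATCH'}")
    except TimeoutError:
        # A write that never completes is exactly the symptom seen in kbha.log.
        print(f"check directory {directory!r}: timed out after {TIMEOUT_SECONDS}s "
              "(possible I/O stall)")
    except OSError as exc:
        print(f"check directory {directory!r}: failed ({exc})")
    finally:
        pool.shutdown(wait=False)   # do not block on a worker stuck in D state

if __name__ == "__main__":
    check_directory(DATA_DIR)

The os.fsync call matters here: without it a small write can be satisfied from the page cache, and the probe would report OK even while the underlying device is stalled.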

4. Check the primary's system messages log for system status

The system messages log records the kernel's status and error information; reading it shows what state the system was in before the failover.
As shown below, starting at "May 8 10:31:14" the messages log records multiple kingbase processes blocked in the "D" state for more than 120 seconds, which matches the time of the primary fault and the primary/standby failover.

May  8 10:31:14 ybywdb03 kernel: [6432582.020700] INFO: task kingbase:2514485 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.020709]       Not tainted 4.19.90-23.8.v2101.ky10.x86_64 #1
May  8 10:31:14 ybywdb03 kernel: [6432582.020712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  8 10:31:14 ybywdb03 kernel: [6432582.020715] kingbase        D    0 2514485  25218 0x00000080
May  8 10:31:14 ybywdb03 kernel: [6432582.020719] Call Trace:
May  8 10:31:14 ybywdb03 kernel: [6432582.020737]  ? __schedule+0x296/0x910
May  8 10:31:14 ybywdb03 kernel: [6432582.020740]  schedule+0x28/0x80
May  8 10:31:14 ybywdb03 kernel: [6432582.020743]  schedule_timeout+0x1ee/0x3a0
May  8 10:31:14 ybywdb03 kernel: [6432582.020777]  ? xfs_buf_find.isra.28+0x1c4/0x5c0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020780]  __down+0xa7/0x100
May  8 10:31:14 ybywdb03 kernel: [6432582.020815]  ? kmem_realloc+0x70/0xe0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020845]  ? xfs_buf_find.isra.28+0x1e6/0x5c0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020851]  ? down+0x3b/0x50
May  8 10:31:14 ybywdb03 kernel: [6432582.020853]  down+0x3b/0x50
May  8 10:31:14 ybywdb03 kernel: [6432582.020879]  xfs_buf_lock+0x33/0xf0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020907]  xfs_buf_find.isra.28+0x1e6/0x5c0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020935]  xfs_buf_get_map+0x40/0x2a0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020960]  xfs_buf_read_map+0x28/0x170 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.020995]  xfs_trans_read_buf_map+0xac/0x2e0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.021023]  xfs_read_agi+0x95/0x140 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.021053]  xfs_iunlink+0x55/0x140 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.021083]  ? xfs_trans_ichgtime+0x20/0x50 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.021113]  xfs_remove+0x201/0x2d0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.021145]  xfs_vn_unlink+0x55/0xa0 [xfs]
May  8 10:31:14 ybywdb03 kernel: [6432582.021152]  ? may_delete+0x79/0x180
May  8 10:31:14 ybywdb03 kernel: [6432582.021153]  vfs_unlink+0x106/0x1a0
May  8 10:31:14 ybywdb03 kernel: [6432582.021157]  do_unlinkat+0x2ba/0x320
May  8 10:31:14 ybywdb03 kernel: [6432582.021164]  do_syscall_64+0x5b/0x1d0
May  8 10:31:14 ybywdb03 kernel: [6432582.021166]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May  8 10:31:14 ybywdb03 kernel: [6432582.021169] RIP: 0033:0x7f24410edff7
May  8 10:31:14 ybywdb03 kernel: [6432582.021179] Code: Bad RIP value.
May  8 10:31:14 ybywdb03 kernel: [6432582.021180] RSP: 002b:00007ffe394fc208 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
May  8 10:31:14 ybywdb03 kernel: [6432582.021183] RAX: ffffffffffffffda RBX: 00007ffe394fc230 RCX: 00007f24410edff7
May  8 10:31:14 ybywdb03 kernel: [6432582.021184] RDX: 0000000000000001 RSI: 0000000000000014 RDI: 00007ffe394fc230
May  8 10:31:14 ybywdb03 kernel: [6432582.021185] RBP: 0000000000000014 R08: 0000000000000000 R09: 0000000000000001
May  8 10:31:14 ybywdb03 kernel: [6432582.021186] R10: 000000000000000f R11: 0000000000000202 R12: 0000000000000000
May  8 10:31:14 ybywdb03 kernel: [6432582.021186] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
May  8 10:31:14 ybywdb03 kernel: [6432582.021190] INFO: task kingbase:2516527 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.021197] kingbase        D    0 2516527  25218 0x00000080
.......
May  8 10:31:14 ybywdb03 kernel: [6432582.021343] INFO: task kingbase:2516528 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.021348] kingbase        D    0 2516528  25218 0x00000080
......
May  8 10:31:14 ybywdb03 kernel: [6432582.021485] INFO: task kingbase:2516529 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.021490] kingbase        D    0 2516529  25218 0x00000080
.......
May  8 10:31:14 ybywdb03 kernel: [6432582.021621] INFO: task kingbase:2516530 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.021626] kingbase        D    0 2516530  25218 0x00000080
.......
May  8 10:31:14 ybywdb03 kernel: [6432582.021758] INFO: task kingbase:2516531 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.021763] kingbase        D    0 2516531  25218 0x00000084
......
May  8 10:31:14 ybywdb03 kernel: [6432582.021889] INFO: task kingbase:2516532 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.021893] kingbase        D    0 2516532  25218 0x00000084
.......
May  8 10:31:14 ybywdb03 kernel: [6432582.022019] INFO: task kingbase:2516534 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.022025] kingbase        D    0 2516534  25218 0x00000084
......
May  8 10:31:14 ybywdb03 kernel: [6432582.022147] INFO: task kingbase:2516535 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.022152] kingbase        D    0 2516535  25218 0x00000084
.......
May  8 10:31:14 ybywdb03 kernel: [6432582.022277] INFO: task kingbase:2516536 blocked for more than 120 seconds.
May  8 10:31:14 ybywdb03 kernel: [6432582.022281] kingbase        D    0 2516536  25218 0x00000084

As shown above, the blocked kingbase processes are in the "D" (uninterruptible sleep) state. Processes stuck in the D state are usually waiting for I/O under heavy I/O pressure.
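
At the time of such an incident, the D-state processes can also be listed directly from /proc to confirm the picture. A minimal sketch, assuming a standard Linux /proc layout:

#!/usr/bin/env python3
# Sketch: list processes currently in the uninterruptible-sleep ('D') state
# by reading /proc/<pid>/stat on Linux.
import os

def d_state_processes():
    """Yield (pid, comm) pairs for processes whose state field is 'D'."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue                     # process exited while scanning
        # /proc/<pid>/stat: "pid (comm) state ..." -- comm may contain spaces,
        # so split on the closing parenthesis instead of whitespace.
        comm = stat[stat.index("(") + 1: stat.rindex(")")]
        state = stat[stat.rindex(")") + 2]
        if state == "D":
            yield int(entry), comm

if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(f"{pid}\t{comm}\tD (uninterruptible sleep, usually waiting on I/O)")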

5. Consult Linux system documentation to pinpoint the system fault

The Red Hat knowledge-base article "xfs_buf deadlock between inode deletion and block allocation" describes a fault whose error log is essentially the same as the messages seen on this system.
Document link: https://access.redhat.com/solutions/6984334

As shown below, the error log in the article closely matches the messages log on the primary:

RHEL8: xfs_buf deadlock between inode deletion and block allocation
Issue
The task is blocked with below logs:
[6553607.856923] INFO: task irisdb:1499256 blocked for more than 120 seconds.
[6553607.856958]       Tainted: G           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[6553607.856980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[6553607.857001] task:irisdb   state:D stack:  0 pid:1499256 ppid: 1 flags:0x10004080
[6553607.857004] Call Trace:
[6553607.857007]  __schedule+0x2d1/0x830
[6553607.857016]  schedule+0x35/0xa0
[6553607.857018]  schedule_timeout+0x274/0x300
[6553607.857023]  ? dma_map_sg_attrs+0x30/0x50
[6553607.857027]  ? scsi_dma_map+0x35/0x40
[6553607.857033]  __down+0x9a/0xf0
[6553607.857037]  ? sd_init_command+0x1e9/0xc80 [sd_mod]
[6553607.857042]  ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[6553607.857104]  down+0x3b/0x50
[6553607.857108]  xfs_buf_lock+0x33/0xf0 [xfs]
[6553607.857147]  xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[6553607.857189]  xfs_buf_get_map+0x4c/0x320 [xfs]
[6553607.857230]  xfs_buf_read_map+0x53/0x310 [xfs]
[6553607.857267]  ? xfs_read_agf+0x8e/0x120 [xfs]
[6553607.857295]  xfs_trans_read_buf_map+0x1cb/0x360 [xfs]
[6553607.857343]  ? xfs_read_agf+0x8e/0x120 [xfs]
[6553607.857371]  xfs_read_agf+0x8e/0x120 [xfs]
[6553607.857399]  xfs_alloc_read_agf+0x3e/0x1e0 [xfs]
[6553607.857427]  xfs_alloc_fix_freelist+0x234/0x480 [xfs]
.......
[6553607.857973] RIP: 0033:0x7fab3945e60b
[6553607.857978] Code: Unable to access opcode bytes at RIP 0x7fab3945e5e1.
[6553607.857979] RSP: 002b:00007ffeb1429a58 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[6553607.857981] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fab3945e60b
[6553607.857982] RDX: 00007ffeb1429a95 RSI: 00007ffeb1428e8a RDI: 00007ffeb1429a60
[6553607.857983] RBP: 00007fab3b0c2690 R08: 00007ffeb1428e14 R09: 0000000000000000
[6553607.857985] R10: ff00ff00ff00ff00 R11: 0000000000000246 R12: 0000000000000034
[6553607.857986] R13: 0000000000000068 R14: ffffffffffffffff R15: 0000000000000003

Explanation of the error:
xfs_buf deadlock between inode deletion and block allocation
This message indicates a deadlock in the XFS filesystem: the deletion of an inode raced against a concurrent block allocation, and neither operation could proceed.
In XFS, inodes store a file's metadata and blocks store its data. Deleting a file normally involves two steps: removing the inode and then freeing the related blocks. Without proper synchronization between these two steps, a deadlock can occur.

Suggested remedies:

  1. Update the system firmware and kernel to the latest versions to pick up fixes for known deadlock issues.
  2. If the problem occurs under high I/O load, try to reduce the load, for example by throttling filesystem throughput or adjusting I/O priorities.
  3. If it occurs under highly concurrent operations, reduce the concurrency to avoid contention.
  4. If possible, schedule deletion operations during low-load periods to reduce the likelihood of the deadlock.
  5. If it occurs in a specific application, review the application's file-handling logic and avoid mixing different kinds of file operations at the same time.
  6. If the problem persists, consider contacting the XFS community or professional technical support.

6. Summary

  1. The standby's hamgr.log shows that at around '2024-05-08 10:29:15' the repmgrd process on the standby detected that the primary database service was abnormal and began PQping reconnection attempts.
  2. The primary's sys_log shows the message "using stale statistics instead of current ones because stats collector is not responding" at around "2024-05-08 10:28:20"; this message is generally caused by blocked disk I/O.
  3. The primary's kbha.log shows a disk read/write check starting at "2024-05-08 10:28:30"; of the 6 rounds run by default, only the first is recorded and its write never finishes, indicating an abnormal disk read/write check.
  4. Starting at "May 8 10:31:14", the messages log records multiple kingbase processes blocked in the "D" state for more than 120 seconds, which matches the time of the primary fault and the failover.
  5. According to the Red Hat documentation, the system fault is suspected to be related to an XFS bug that triggers an XFS deadlock and blocks I/O.

In summary, the primary node failure was caused by a system-level fault that left kingbase processes blocked in the "D" state. Once those processes were blocked, the standby's repmgrd checks of the primary failed; after the PQping failures exceeded the threshold, a failover was triggered. The system fault is suspected to be related to the XFS bug described above, which needs further confirmation by the system administrators.
