KingbaseES Cluster Start/Stop Series 02 -- Cluster Fails to Start After the kingbase User's Password Expires

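Case description:
In a general-purpose server environment, the KingbaseES V8R6 cluster uses ssh to establish mutual trust between nodes. After the kingbase OS user's password expired, ssh trust between the nodes broke and the cluster failed to start.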

Applicable version:
KingbaseES V8R6

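Troubleshooting approach:

  1. Run 'sh -x sys_monitor.sh start' and read the trace to find the exact statements the script uses to start the database service.
  2. Reproduce the problem to identify the root cause.
  3. Provide the corresponding solution.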

I. Symptom

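As shown below, starting the cluster prompts for the kingbase user's OS password, then reports that the password has expired, and the cluster fails to start: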

[kingbase@node202 bin]$ ./sys_monitor.sh start
2024-04-24 14:40:17 Ready to start all DB ...
2024-04-24 14:40:17 begin to start DB on "[192.168.1.202]".
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
2024-04-24 14:40:53 execute to start DB on "[192.168.1.202]" failed.
2024-04-24 14:40:53 Start DB on localhost(192.168.1.202) failed, will do nothing and exit.

II. Problem Analysis

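1. Analyzing the cluster startup
As shown below, tracing the startup with 'sh -x sys_monitor.sh start' shows that the kingbase OS user connects to each node over ssh (through sys_securecmd) to start the database service. Once the OS password has expired, ssh trust fails and the database cannot be started: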

 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_securecmd -o StrictHostKeyChecking=no -o ConnectTimeout=10 
-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22 -o ServerAliveInterval=2 -o ServerAliveCountMax=3 
-l kingbase -T 192.168.1.201 
'cat /home/kingbase/cluster/R6C8/HAC8/kingbase/data/kingbase.pid 2>/dev/null|head -n 1'
kingbase@192.168.1.201's password:
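The trace is long; the calls that matter are the sys_securecmd lines. A minimal sketch for pulling them out of a saved trace, assuming the output of 'sh -x ./sys_monitor.sh start' was redirected to a file (for example /tmp/sys_monitor.trace, a hypothetical path), that each call sits on one line, and that the target host follows -T as in the lines above. The function name list_remote_calls is hypothetical; it prints the node and the command run there.

```shell
#!/usr/bin/env bash
# Sketch: list remote calls made through sys_securecmd in a saved `sh -x` trace.
# Assumption: each call is on one line and the host comes right after "-T".
list_remote_calls() {
    local trace=$1
    awk '/sys_securecmd / {
        host = ""
        # Find the host argument that follows "-T"
        for (i = 1; i < NF; i++) if ($i == "-T") { host = $(i + 1); break }
        # Everything after "-T <host> " is the command run on the node
        cmd = $0
        sub(/.* -T [^ ]+ /, "", cmd)
        print host "\t" cmd
    }' "$trace"
}

# Example:
#   sh -x ./sys_monitor.sh start > /tmp/sys_monitor.trace 2>&1
#   list_remote_calls /tmp/sys_monitor.trace
```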

III. Reproducing the Problem
1. Change the kingbase user's password validity period

# By default, the kingbase user's maximum password age is 99999 days
[root@node201 ~]# chage -l kingbase
Last password change                                    : Aug 25, 2023
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 7

# Set the kingbase user's maximum password age to 1 day
[root@node201 ~]# chage -M 1 kingbase
[root@node201 ~]# chage -l kingbase
Last password change                                    : Aug 25, 2023
Password expires                                        : Aug 26, 2023
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 1
Number of days of warning before password expires       : 7
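With the maximum age set to 1 day, the password is already past its expiry date. A small sketch that turns the 'Password expires' field into a day count, assuming English chage output (LC_ALL=C) and GNU date, which parses dates like 'Aug 26, 2023'; the function name pw_days_left is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: days left before the password expires, parsed from `chage -l` output.
# Assumptions: English output (LC_ALL=C) and GNU date for date parsing.
# Usage: LC_ALL=C chage -l kingbase | pw_days_left [YYYY-MM-DD]
pw_days_left() {
    local today=${1:-$(date -u +%F)}
    local expires exp_s today_s
    # Value after "Password expires ... : ", e.g. "Aug 26, 2023" or "never"
    expires=$(sed -n 's/^Password expires[[:space:]]*: //p')
    if [ -z "$expires" ]; then
        echo "unknown"
        return 1
    fi
    if [ "$expires" = "never" ]; then
        echo "never"
        return 0
    fi
    # Compare both dates at 00:00 UTC so the difference is whole days
    exp_s=$(date -u -d "$expires" +%s) || return 1
    today_s=$(date -u -d "$today" +%s) || return 1
    # A negative number means the password expired that many days ago
    echo $(( (exp_s - today_s) / 86400 ))
}
```

Run as root (as in the chage commands above) on each node to see how close every node is to losing ssh trust.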

2. Change the system time

[root@node201 ~]# date
Tue Oct 17 13:58:27 CST 2023

[root@node201 ~]# date 101513582023
Sun Oct 15 13:58:00 CST 2023
[root@node201 ~]# date
Sun Oct 15 13:58:01 CST 2023
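The argument 101513582023 uses the MMDDhhmmCCYY form accepted by date (month 10, day 15, 13:58, year 2023). A hypothetical helper, to_date_arg, sketches how to build that argument from a readable timestamp with GNU date instead of typing the digits by hand:

```shell
#!/usr/bin/env bash
# Sketch: build the MMDDhhmmCCYY argument accepted by `date` when setting the clock.
# Assumption: GNU date (for -d); the time is interpreted in the local time zone.
to_date_arg() {
    date -d "$1" +%m%d%H%M%Y
}

# Example (as root):
#   date "$(to_date_arg '2023-10-15 13:58')"
```

Setting the clock needs root and affects every process on the host, so restore the correct time once the test is done.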

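3. Test ssh trust
As shown below, once the kingbase OS user's password has expired, ssh no longer logs in directly and instead demands a password change: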

[kingbase@node201 bin]$ ssh node201
You are required to change your password immediately (password aged)
Last login: Tue Oct 17 14:04:22 2023
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for user kingbase.
Changing password for kingbase.
(current) UNIX password:
New password:
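An interactive ssh session stops at the password-change prompt, which is exactly what a script cannot answer. A minimal sketch of a non-interactive check, using the standard OpenSSH option BatchMode=yes so that ssh fails instead of prompting; the function name check_ssh_trust and its ok/expired/failed labels are hypothetical, and the 'expired' case simply matches the warning text seen in the logs here:

```shell
#!/usr/bin/env bash
# Sketch: non-interactive check of ssh trust for the kingbase user.
# BatchMode=yes (standard OpenSSH option) makes ssh fail instead of prompting.
# Prints ok / expired / failed; the return status is ssh's own.
check_ssh_trust() {
    local host=$1 user=${2:-kingbase}
    local out rc
    out=$(ssh -o BatchMode=yes -o ConnectTimeout=10 -l "$user" "$host" true 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif printf '%s\n' "$out" | grep -qi 'password has expired'; then
        # Same warning as in the cluster start log
        echo "expired"
    else
        echo "failed"
    fi
    return "$rc"
}

# Example:
#   for h in 192.168.1.201 192.168.1.202; do echo "$h: $(check_ssh_trust "$h")"; done
```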

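4. Test cluster startup
As shown below, when the kingbase OS user connects to a node over ssh to start the database service, the connection fails and the database service cannot start: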

[kingbase@node202 bin]$ ./sys_monitor.sh start
2024-04-24 14:40:17 Ready to start all DB ...
2024-04-24 14:40:17 begin to start DB on "[192.168.1.202]".
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
2024-04-24 14:40:53 execute to start DB on "[192.168.1.202]" failed.
2024-04-24 14:40:53 Start DB on localhost(192.168.1.202) failed, will do nothing and exit.

Corresponding statements in the startup trace ('sh -x' output):

++ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_securecmd -o StrictHostKeyChecking=no -o ConnectTimeout=10 -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22 -o ServerAliveInterval=2 -o ServerAliveCountMax=3 -l kingbase -T 192.168.1.202 'ping 192.168.1.1 -c 3 -w 3 | grep received | awk '\''{print $4}'\'''
kingbase@192.168.1.202's password:   
WARNING: Your password has expired.             ### the kingbase user's password has expired
Password change required but no TTY available.
++ '[' 1 -ne 0 ']'
++ return 1
+ ping_result=
+ '[' 1 -ne 0 ']'
+ return 1
+ '[' 1 -eq 0 ']'
+ '[' 0 -eq 1 ']'
+ (( i++ ))
+ (( i<=3 ))
+ '[' 0 -eq 0 ']'
+ return 1
+ '[' 1 -ne 0 ']'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2024-04-26 14:55:33 Failed to ping trusted_servers on host 192.168.1.1, will do nothing and exit.'
2024-04-26 14:55:33 Failed to ping trusted_servers on host 192.168.1.1, will do nothing and exit.
+ exit 1
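The trace also explains the wording of the final error: the trusted_servers check ('ping 192.168.1.1 -c 3 -w 3 | grep received | awk ...') is itself executed on the node through sys_securecmd, so when ssh is refused ping_result stays empty and the script exits as if 192.168.1.1 were unreachable. A simplified sketch of that decision, assuming iputils-style ping summary output; it mirrors the pipeline in the trace but is not the actual sys_monitor.sh code, and both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: the trusted_servers check seen in the trace, applied to ping output.
# Assumption: iputils ping summary, e.g.
#   "3 packets transmitted, 3 received, 0% packet loss, time 2003ms"
received_count() {
    # Same pipeline as in the trace: 4th field of the "received" line
    grep received | awk '{print $4}'
}

trusted_server_ok() {
    local n
    n=$(received_count)
    # Empty output (e.g. ssh was refused) or 0 replies both count as failure
    [ -n "$n" ] && [ "$n" -gt 0 ]
}

# Example:
#   ping 192.168.1.1 -c 3 -w 3 | trusted_server_ok && echo reachable
```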

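IV. Resolution
After extending the kingbase user's OS password validity period (for example with chage -M, as used above) or changing the password (for example with passwd), the cluster starts normally.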

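V. Summary
In a general-purpose server environment, if password expiry is configured for the kingbase and root OS users, the passwords must be changed before they expire so that ssh trust keeps working. For production environments that require OS password aging, consider using the securecmdd tool instead of ssh for mutual-trust communication between cluster nodes.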
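To change passwords before they expire, the expiry dates can be checked ahead of time. A minimal monitoring sketch, assuming root access to /etc/shadow: per shadow(5), field 3 is the last password change in days since 1970-01-01 and field 5 is the maximum age, so the password expires on day field3 + field5. The function name check_pw_expiry, the 7-day warning window (the warning period chage -l shows above) and treating 99999 as no expiry (as chage -l displays it above) are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch: warn before the kingbase/root passwords expire (run as root).
# Shadow fields: 1=user, 3=last change (days since 1970-01-01), 5=max age in days.
# Usage: check_pw_expiry [shadow_file] [today_in_epoch_days] [warn_days]
check_pw_expiry() {
    local shadow=${1:-/etc/shadow}
    local today=${2:-$(( $(date -u +%s) / 86400 ))}
    local warn=${3:-7}
    awk -F: -v today="$today" -v warn="$warn" '
        $1 == "kingbase" || $1 == "root" {
            # Aging disabled, or max age 99999 as in the default above: no expiry
            if ($3 == "" || $5 == "" || $5 + 0 >= 99999) { print $1 ": never"; next }
            # Last change of 0 forces a password change at next login
            if ($3 == "0") { print $1 ": must change password"; next }
            left = $3 + $5 - today
            if (left < 0)
                print $1 ": EXPIRED " (-left) " day(s) ago"
            else if (left <= warn)
                print $1 ": expires in " left " day(s)"
            else
                print $1 ": ok (" left " days left)"
        }' "$shadow"
}
```

It could be scheduled (for example from cron) on every node; how the warning is delivered is left open.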

posted @ 2024-03-29 18:36  KINGBASE研究院