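KingbaseES V8R6 Cluster Operations Case: Cluster Startup Fails After the kingbase User's Password Expires
Case description:
In a general-purpose server environment, a KingbaseES V8R6 cluster uses ssh to establish trust between nodes. Once the operating-system password of the kingbase user expires, ssh trust between the nodes breaks and the cluster fails to start.
Applicable version:
KingbaseES V8R6
Troubleshooting approach:
- Run 'sh -x sys_monitor.sh start' and analyze the script trace to find the exact commands that start the database service.
- Reproduce the problem to identify the root cause of the failure.
- Provide a solution for the problem.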
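I. Symptom
As shown below, starting the cluster prompts for the kingbase user's system password and then reports that the password has expired; the cluster fails to start: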
[kingbase@node202 bin]$ ./sys_monitor.sh start
2024-04-24 14:40:17 Ready to start all DB ...
2024-04-24 14:40:17 begin to start DB on "[192.168.1.202]".
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
2024-04-24 14:40:53 execute to start DB on "[192.168.1.202]" failed.
2024-04-24 14:40:53 Start DB on localhost(192.168.1.202) failed, will do nothing and exit.
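II. Analysis
1. Analyze the cluster startup process
As shown below, tracing the startup with 'sh -x sys_monitor.sh start' shows that the kingbase system user connects to each node over ssh during startup to start the database service. Once the system password has expired, ssh trust fails and the database cannot be started: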
/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_securecmd -o StrictHostKeyChecking=no -o ConnectTimeout=10
-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22 -o ServerAliveInterval=2 -o ServerAliveCountMax=3
-l kingbase -T 192.168.1.201
'cat /home/kingbase/cluster/R6C8/HAC8/kingbase/data/kingbase.pid 2>/dev/null|head -n 1'
kingbase@192.168.1.201's password:
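The trace produced by 'sh -x' is long, so it can help to filter out only the remote commands. The following is a minimal sketch, assuming the trace has been saved to a file; the function name extract_remote_cmds and the path /tmp/sys_monitor.trace are illustrative and not part of KingbaseES.

```shell
# Print the commands that a `sh -x` trace shows being run through
# sys_securecmd. Reads the trace from stdin.
extract_remote_cmds() {
    # Trace lines start with one or more '+' characters followed by a space.
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -E '^\++ .*sys_securecmd ' || true
}

# Example (not run here): capture the trace, then filter it.
#   sh -x ./sys_monitor.sh start > /tmp/sys_monitor.trace 2>&1
#   extract_remote_cmds < /tmp/sys_monitor.trace
```

Each printed line is a command that has to work without a password prompt for the cluster to start.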
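III. Reproducing the problem
1. Change the password validity period of the kingbase user
# By default, the kingbase user's password is valid for 99999 days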
[root@node201 ~]# chage -l kingbase
Last password change : Aug 25, 2023
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
# Change the kingbase user's maximum password age to 1 day
[root@node201 ~]# chage -M 1 kingbase
[root@node201 ~]# chage -l kingbase
Last password change : Aug 25, 2023
Password expires : Aug 26, 2023
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 1
Number of days of warning before password expires : 7
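For scripted checks, the "Password expires" value can be pulled out of the 'chage -l' output shown above. This is a minimal sketch; the function name password_expiry_field is illustrative, and it assumes the English field labels shown here (running chage with LANG=C is one way to get them).

```shell
# Read `chage -l` output on stdin and print the "Password expires" value,
# for example "never" or "Aug 26, 2023".
password_expiry_field() {
    awk -F: '/^Password expires/ {
        value = $2
        gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim padding around the value
        print value
    }'
}

# Example (not run here):
#   LANG=C chage -l kingbase | password_expiry_field
```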
2. Change the system time
[root@node201 ~]# date
Tue Oct 17 13:58:27 CST 2023
[root@node201 ~]# date 101513582023
Sun Oct 15 13:58:00 CST 2023
[root@node201 ~]# date
Sun Oct 15 13:58:01 CST 2023
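3. Test ssh trust
As shown below, once the kingbase system user's password has expired, the ssh login can no longer complete without interaction; it demands an immediate password change: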
[kingbase@node201 bin]$ ssh node201
You are required to change your password immediately (password aged)
Last login: Tue Oct 17 14:04:22 2023
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for user kingbase.
Changing password for kingbase.
(current) UNIX password:
New password:
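The same test can be made non-interactive, so that a broken trust relationship produces a failure status instead of a prompt. The sketch below uses the standard OpenSSH option BatchMode=yes, which disables password prompting. The function name check_ssh_trust and the SSH_BIN override are illustrative assumptions, and the sketch uses plain ssh rather than sys_securecmd.

```shell
# Return 0 if a non-interactive SSH login to the host works for the user.
# SSH_BIN is a hypothetical override so the check can use another client.
check_ssh_trust() {
    host=$1
    user=${2:-kingbase}
    # BatchMode=yes makes ssh fail instead of asking for a password.
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 \
        -o StrictHostKeyChecking=no -l "$user" -T "$host" true
}

# Example (not run here): check every node before starting the cluster.
#   for node in 192.168.1.201 192.168.1.202; do
#       check_ssh_trust "$node" || echo "ssh trust to $node is broken"
#   done
```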
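4. Test cluster startup
As shown below, when the kingbase system user connects to the node over ssh to start the database service, the connection fails and the database service does not start: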
[kingbase@node202 bin]$ ./sys_monitor.sh start
2024-04-24 14:40:17 Ready to start all DB ...
2024-04-24 14:40:17 begin to start DB on "[192.168.1.202]".
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
kingbase@192.168.1.202's password:
WARNING: Your password has expired.
Password change required but no TTY available.
2024-04-24 14:40:53 execute to start DB on "[192.168.1.202]" failed.
2024-04-24 14:40:53 Start DB on localhost(192.168.1.202) failed, will do nothing and exit.
Commands executed during database startup ('sh -x' trace output):
++ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_securecmd -o StrictHostKeyChecking=no -o ConnectTimeout=10 -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22 -o ServerAliveInterval=2 -o ServerAliveCountMax=3 -l kingbase -T 192.168.1.202 'ping 192.168.1.1 -c 3 -w 3 | grep received | awk '\''{print $4}'\'''
kingbase@192.168.1.202's password:
WARNING: Your password has expired. ### the kingbase user's password has expired
Password change required but no TTY available.
++ '[' 1 -ne 0 ']'
++ return 1
+ ping_result=
+ '[' 1 -ne 0 ']'
+ return 1
+ '[' 1 -eq 0 ']'
+ '[' 0 -eq 1 ']'
+ (( i++ ))
+ (( i<=3 ))
+ '[' 0 -eq 0 ']'
+ return 1
+ '[' 1 -ne 0 ']'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2024-04-26 14:55:33 Failed to ping trusted_servers on host 192.168.1.1, will do nothing and exit.'
2024-04-26 14:55:33 Failed to ping trusted_servers on host 192.168.1.1, will do nothing and exit.
+ exit 1
IV. Solution
After the system password validity period of the kingbase user was modified, the cluster started normally.
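The exact commands used for the fix are not shown in the original case. One plausible way to do it, mirroring the 'chage -M' call from the reproduction step, is sketched below. The function name set_max_password_age and the CHAGE_BIN override are illustrative, and the value 99999 is simply the default shown earlier.

```shell
# Set the maximum password age (in days) for a system user. Run as root.
# CHAGE_BIN is a hypothetical override, e.g. `echo` for a dry run.
set_max_password_age() {
    user=$1
    days=$2
    "${CHAGE_BIN:-chage}" -M "$days" "$user"
}

# Example (as root, on every cluster node; not run here):
#   set_max_password_age kingbase 99999
#   chage -l kingbase        # confirm "Password expires" is in the future
```

Changing the password itself with passwd also resets the last-change date, which is an alternative when site policy does not allow a long validity period.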
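V. Summary
In a general-purpose server environment, if password expiry is configured for the system users kingbase and root, the passwords must be changed before they expire so that ssh trust keeps working. For production environments that require system-user password protection, consider using the securecmdd tool instead of ssh for trusted communication between cluster nodes.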
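To change the password before it expires, it helps to know how many days are left. The following is a minimal sketch, assuming GNU date (which accepts strings such as "Aug 26, 2023" with -d). The function name days_until_expiry and the 7-day threshold in the usage note are illustrative.

```shell
# Print the whole days from now_epoch until the expiry date (negative if
# already past). Returns non-zero if the date cannot be parsed, which is
# the case for the value "never".
days_until_expiry() {
    expiry=$1
    now_epoch=${2:-$(date +%s)}
    expiry_epoch=$(date -d "$expiry" +%s 2>/dev/null) || return 1
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Example (not run here): warn when fewer than 7 days remain.
#   left=$(days_until_expiry "Aug 26, 2023") && [ "$left" -lt 7 ] &&
#       echo "kingbase password expires in $left day(s)"
```

The expiry string would typically come from the "Password expires" line of 'chage -l kingbase', and the check could be run periodically on each node.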