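19c RAC: ASM Password Authentication Prevents CRSD from Starting After the OCR Disk Group Is Replaced
Preface
On a 19.19 RAC cluster, the ASM disk group holding the OCR was changed from +GRID to +DG_GRID, and the original +GRID disk group was then force-dropped. As a result, the cluster could no longer start.
Procedure
1. During startup, the CSS service came up normally, but the CRS service could not start. At that point, the alertasm2.log on node 2 reported the following errors.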
2023-06-23T17:44:33.667188+08:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_13944.trc:
ORA-17503: ksfdopn:2 Failed to open file +GRID/orapwasm
ORA-15001: diskgroup "GRID" does not exist or is not mounted
ORA-06512: at line 4
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 679
ORA-06512: at line 2
2023-06-23T17:44:34.129085+08:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_13944.trc:
ORA-17503: ksfdopn:2 Failed to open file +GRID/orapwasm
ORA-15001: diskgroup "GRID" does not exist or is not mounted
ORA-06512: at line 4
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 679
ORA-06512: at line 2
ORA-01017: invalid username/password; logon denied
2023-06-23T17:44:34.490668+08:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_13944.trc:
ORA-17503: ksfdopn:2 Failed to open file +GRID/orapwasm
ORA-15001: diskgroup "GRID" does not exist or is not mounted
ORA-06512: at line 4
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 679
ORA-06512: at line 2
^C
[grid@19crac2 trace]$
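The log shows that cluster startup needs to locate the ASM instance's password file. That file used to live in the +GRID disk group, and that disk group has already been dropped.
2. Create a new password file for the ASM instance and update the related information in the OCR.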
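When the alert log is long, it can help to reduce it to the distinct files and disk groups that ASM could not reach. The following is a minimal sketch, not part of the original procedure: the function names `missing_pwfiles` and `missing_diskgroups` are hypothetical, and the patterns assume the ORA-17503 and ORA-15001 message wording shown in the log excerpt for this step.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; both read alert-log text on stdin.

# Print each distinct file named in "Failed to open file +..." messages.
missing_pwfiles() {
  grep -o 'Failed to open file [+][^ ]*' | awk '{print $5}' | sort -u
}

# Print each distinct disk group named in ORA-15001 messages.
missing_diskgroups() {
  sed -n 's/.*ORA-15001: diskgroup "\([^"]*\)" does not exist or is not mounted.*/\1/p' | sort -u
}
```

Feed either function the alert log on stdin, for example `missing_pwfiles < alertasm2.log`. Seeing only +GRID paths in the output supports the conclusion that the password file is the one thing still pointing at the dropped disk group.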
[root@19crac2 ~]# srvctl config asm
ASM home: <CRS home>
Password file: +grid/orapwASM
Backup of Password file: +grid/orapwASM_backup
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[root@19crac2 ~]#
[root@19crac2 ~]# orapwd file='+DG_GRID/orapwASM' entries=5 password=welcome1
[root@19crac2 ~]# srvctl modify asm -pwfile +DG_GRID/orapwASM
[root@19crac2 ~]# srvctl modify asm -pwfilebackup +DG_GRID/orapwASM_backup
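After the `srvctl modify asm` calls, it is worth confirming that the registered password file no longer points at the dropped disk group. The sketch below is an illustration only: `pwfile_diskgroup` is a hypothetical helper, and it assumes the `Password file: +<diskgroup>/<name>` line format seen in the `srvctl config asm` output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `srvctl config asm` output on stdin and print the
# disk group from the "Password file:" line, upper-cased for easy comparison.
# The "Backup of Password file:" line is skipped by the ^ anchor.
pwfile_diskgroup() {
  sed -n 's/^Password file: *+\([^/]*\)\/.*/\1/p' | tr '[:lower:]' '[:upper:]'
}
```

A possible use is `srvctl config asm | pwfile_diskgroup`, which in this incident would be expected to print DG_GRID once the change has been registered.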
3. Try restarting the cluster again. The CRSD service still cannot start, and crsd.trc shows the following errors.
2023-06-24 06:50:11.232*:kgfn.c@6088: kgfnGetBeqData: kgfnTgtInit failed, inst=NULL flags=0x6000
2023-06-24 06:50:11.235 : CLSNS:3425988352: clsns_SetTraceLevel:trace level set to 1.
2023-06-24 06:50:11.363 : OCRRAW:3425988352: kgfnConnect2: kgfnGetBeqData failed
2023-06-24 06:50:11.363*:kgfn.c@5268: kgfnConnect2: kgfnGetBeqData failed
2023-06-24 06:50:11.423 : OCRRAW:3425988352: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.192)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2023-06-24 06:50:11.423*:kgfn.c@7122: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.192)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
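The `cstr=` entries in the trace show which ASM listener address crsd is trying to reach. As a rough sketch (the helper name `asm_endpoints` is hypothetical, and the pattern assumes `(HOST=...)` is immediately followed by `(PORT=...)` as in the trace above), the distinct endpoints can be pulled out like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read crsd.trc text on stdin and print each distinct
# host:port pair found in the connect descriptors.
asm_endpoints() {
  grep -o '(HOST=[^)]*)(PORT=[^)]*)' |
    sed 's/(HOST=\([^)]*\))(PORT=\([^)]*\))/\1:\2/' |
    sort -u
}
```

Running `asm_endpoints < crsd.trc` gives a short list that can be compared against the ASM listener configuration when checking for a network mismatch.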
4. A search of MOS turned up Grid Infrastructure (GI) startup fails because crsd fails to start in a flex asm environment (Doc ID 2392762.1). The note lists three possible causes for this failure: (1) the SQLNET.AUTHENTICATION_SERVICES parameter in sqlnet.ora is set to none; (2) the ASM password does not match; (3) the ASM listener's network segment does not match.
In this incident, the second cause applied. The fix follows the method in How to Recreate Shared ASM Password File in 19c Grid Infrastructure (GI) (Doc ID 2717306.1).
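The first of the three causes is quick to rule out by inspecting sqlnet.ora. The following is a minimal sketch under stated assumptions: `auth_services_is_none` is a hypothetical helper, the file path is supplied by the caller (commonly under the Grid home's network/admin directory, which is an assumption here and not stated in the MOS note), and the check only handles a single-line setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the sqlnet.ora file given as $1 sets
# SQLNET.AUTHENTICATION_SERVICES to none (with or without parentheses,
# case-insensitive). Commented-out lines do not match because of the ^ anchor.
auth_services_is_none() {
  grep -Eiq '^[[:space:]]*SQLNET\.AUTHENTICATION_SERVICES[[:space:]]*=[[:space:]]*\(?[[:space:]]*none[[:space:]]*\)?[[:space:]]*$' "$1"
}
```

For example, `auth_services_is_none /path/to/sqlnet.ora && echo "set to none"`. If the function does not match, attention can move on to the password and listener causes.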
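5. A new password file had already been created and registered while handling this failure, so why was an ASM password mismatch still reported? The main reason is that, starting with 19c RAC, the way to rebuild the ASM password file differs from earlier releases. Starting with 19.8, asmcmd has a new feature that lets users run asmcmd credverify and asmcmd credfix to check and repair the ASM password credentials.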
GI_HOME/bin/asmcmd --nocp credverify
GI_HOME/bin/asmcmd --nocp credfix
6. After the ASM password mismatch was fixed, the GI cluster restarted successfully.