[Case] Oracle 11g RAC: after restarting node 2 (rac2), RAC could not provide service normally
Oracle: 11.2.0.4
Linux: RHEL 6.8
Nodes (2): rac1, rac2
Hostnames: ze02db01, ze02db02
Post-mortem:
On node 2 (ze02db02), stop the instance on rac2 and start it again:
/u01/app/11.2.0/grid/bin/srvctl stop instance -d orcl -n ze02db02
/u01/app/11.2.0/grid/bin/srvctl start instance -d orcl -n ze02db02
At this point, the CRS status of the database, checked from node 1 (ze02db01), was abnormal:
ora.orcl.db
1 ONLINE ONLINE ze02db01 Open
2 ONLINE ONLINE ze02db02 starting ...
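To spot an instance stuck like this without reading the table by eye, a small filter over the status output can help. This is a sketch under assumptions, not part of the original troubleshooting: not_open_instances is a hypothetical name, and it assumes the column layout shown above (number, target, state, server, state details), as printed by crsctl stat res ora.orcl.db -t.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print instance rows of ora.orcl.db that are not
# "ONLINE ... Open". Assumes the layout shown in this case:
#   <number> <target> <state> <server> <state details>
# e.g. "2 ONLINE ONLINE ze02db02 starting ...". Other versions may differ.
not_open_instances() {
  # Only rows whose first field is an instance number are considered;
  # the resource-name line and separator lines are skipped.
  awk '$1 ~ /^[0-9]+$/ && !($3 == "ONLINE" && $5 == "Open") { print }'
}
```

Usage, for example: /u01/app/11.2.0/grid/bin/crsctl stat res ora.orcl.db -t | not_open_instances. Empty output means every listed instance is ONLINE and Open.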
I then started the instance manually on node 2 (ze02db02):
$ sqlplus / as sysdba
SQL> startup
The database CRS status was still abnormal, so I tried restarting node 2's instance from node 1:
/u01/app/11.2.0/grid/bin/srvctl stop instance -d orcl -n ze02db02
At this point, RAC could no longer provide service to clients.
[/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:29.432: [/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:29.636: [/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:29.839: [/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 12:12:30.043: [/u01/app/11.2.0/grid/bin/orarootagent.bin(9878)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsregistrymount" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/ze02db01/agent/crsd/orarootagent_root/orarootagent_root.log"
2020-07-17 18:09:01.504: [crsd(9762)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.orcl.db'. Details at (:CRSPE00111:) {2:9343:228} in /u01/app/11.2.0/grid/log/ze02db01/crsd/crsd.log.
2020-07-17 18:50:45.700: [UiServer][2427864832]{1:2032:53986} Sending message to PE. ctx= 0x7f270000b850, Client PID: 12554
2020-07-17 18:50:45.700: [ CRSPE][2429966080]{1:2032:53986} Cmd : 0x7f270c113cf0 : flags: FORCE_TAG | HOST_TAG | QUEUE_TAG
2020-07-17 18:50:45.700: [ CRSPE][2429966080]{1:2032:53986} Processing PE command id=119569. Description: [Start Resource : 0x7f270c113cf0]
2020-07-17 18:50:45.702: [ CRSPE][2429966080]{1:2032:53986} Filtering duplicate ops: server [ze02db02] state [ONLINE]
2020-07-17 18:50:45.702: [ CRSPE][2429966080]{1:2032:53986} Op 0x7f270c00db10 has 16 WOs
2020-07-17 18:50:45.702: [ CRSPE][2429966080]{1:2032:53986} ICE has queued an operation. Details: Operation [START of [ora.orcl.db 2 1] on [ze02db02] : local=0, unplanned=00x7f270c00db10] cannot run cause it needs W lock for: WO for Placement Path RI:[ora.orcl.db 2 1] server [ze02db02] target states [ONLINE INTERMEDIATE ], locked by op [START of [ora.orcl.db 2 1] on [ze02db02] : local=0, unplanned=00x7f270c0df540]. Owner: CRS-2682: It is locked by 'grid' for command 'Start Resource' issued from 'ze02db02'
2020-07-17 18:50:49.490: [ CRSPE][2429966080]{2:9343:273} Processing PE command id=323. Description: [Stat Resource : 0x7f270c00d8a0]
2020-07-17 18:50:51.506: [ CRSPE][2429966080]{2:9343:274} Processing PE command id=324. Description: [Stat Resource : 0x7f270c145e00]
2020-07-17 18:50:52.410: [ CRSPE][2429966080]{2:9343:275} Processing PE command id=325. Description: [Stat Resource : 0x7f270c145e00]
2020-07-17 18:50:53.070: [ CRSPE][2429966080]{2:9343:276} Processing PE command id=326. Description: [Stat Resource : 0x7f270c145e00]
2020-07-17 18:51:35.517: [UiServer][2425763584] CS(0x7f270400a270)set Properties ( grid,0x7f273c0dac90)
2020-07-17 18:51:35.527: [UiServer][2427864832]{1:2032:53987} Sending message to PE. ctx= 0x7f270000ac30, Client PID: 9882
2020-07-17 18:51:35.528: [ CRSPE][2429966080]{1:2032:53987} Processing PE command id=119570. Description: [Stat Resource : 0x7f270c00d8a0]
2020-07-17 18:51:35.528: [ CRSPE][2429966080]{1:2032:53987} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == ze02db01))
2020-07-17 18:51:35.529: [UiServer][2427864832]{1:2032:53987} Done for ctx=0x7f270000ac30
The key entry is the queued operation: the new START of ora.orcl.db instance 2 on ze02db02 cannot run because an earlier START of the same resource, issued by 'grid' from ze02db02, still holds the lock (CRS-2682).
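To see which operation is holding a CRS lock, the CRS-2682 messages can be pulled out of crsd.log. A minimal sketch, assuming the message text matches the excerpt above; find_crs_locks is a hypothetical name, and the default log path is the node 1 crsd.log named in the CRS-2757 message of this case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list distinct CRS-2682 lock owners found in a crsd.log.
# Assumes the message wording seen in this case:
#   CRS-2682: It is locked by '<user>' for command '<cmd>' issued from '<node>'
find_crs_locks() {
  local log=${1:-/u01/app/11.2.0/grid/log/ze02db01/crsd/crsd.log}
  # -o prints only the matching part of each line; sort -u collapses repeats.
  grep -o "CRS-2682: It is locked by '[^']*' for command '[^']*' issued from '[^']*'" "$log" | sort -u
}
```

Each output line names the user, the command and the node that issued the locking operation, which here points back at ze02db02.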
At this point, I went into SQL*Plus on node 2 and shut the instance down:
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jul 20 16:13:46 2020
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SYS@orcl2> shutdown immediate
The RAC resources on node 1 could then provide service normally.
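What we finally found:
At the OS level, multipath -ll showed that the multipath software could not read the shared disks. Multipathing was restored with: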
service multipathd restart
start_udev
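Before touching the cluster stack again, it is worth confirming that every shared LUN has healthy paths. A rough sketch, assuming that a broken path shows the words failed or faulty in multipath -ll output and that empty output means no multipath maps were built; multipath_healthy is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `multipath -ll` output on stdin and report whether
# the multipath maps look usable. Assumption: failed paths contain "failed"
# or "faulty"; no output at all means no maps exist.
multipath_healthy() {
  local out
  out=$(cat)
  if [ -z "$out" ]; then
    echo "no multipath maps"
    return 1
  fi
  if printf '%s\n' "$out" | grep -Eq 'failed|faulty'; then
    echo "failed paths found"
    return 1
  fi
  echo "ok"
}
```

On RHEL 6, run as root, for example: multipath -ll | multipath_healthy || { service multipathd restart; start_udev; }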
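Multipathing was back to normal, and the crs, instance, nodeapps and listener on node 2 were restarted. The RAC CRS status was still abnormal, and the lock problem appeared again.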
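The node-level restart described above roughly corresponds to the sequence below. It is a sketch, not the exact commands used in this case: the instance name orcl2 is inferred from the SYS@orcl2 prompt, restart_node_stack and DRY_RUN are hypothetical, srvctl is assumed to run as the grid/oracle owner, and crsctl stop/start crs must be run as root on ze02db02 itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of restarting the node 2 stack from the top down.
# Assumptions: GRID_HOME as in this case; instance orcl2; DRY_RUN=1 only
# prints the commands instead of executing them.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}

# Print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# restart_node_stack DB_NAME INSTANCE_NAME NODE_NAME
restart_node_stack() {
  local db=$1 inst=$2 node=$3 bin="$GRID_HOME/bin"
  run "$bin/srvctl" stop instance -d "$db" -i "$inst" -o immediate
  run "$bin/srvctl" stop listener -n "$node"
  run "$bin/srvctl" stop nodeapps -n "$node"
  run "$bin/crsctl" stop crs     # root only, on the local node
  run "$bin/crsctl" start crs    # root only; resources start per their policies
  run "$bin/crsctl" check crs    # confirm the daemons are back
}
```

DRY_RUN=1 restart_node_stack orcl orcl2 ze02db02 prints the six commands without running them. In this case the lock survived this kind of restart, and only rebooting the server cleared it.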
In the end, rebooting the node 2 server brought RAC back to normal.