Production Database Outage Caused by an Oracle RAC Heartbeat Failure
1. Problem Description
Environment:
Node | SID | db_name | Software version | Notes |
---|---|---|---|---|
172.16.2.22 | hdls1 | HDLS | 11.2.0.4 | RAC node |
172.16.2.23 | hdls2 | HDLS | 11.2.0.4 | RAC node |
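Cause:
The heartbeat (private interconnect) network between the two nodes failed, causing a RAC split-brain. The Oracle instance processes running on the nodes were terminated and the database service went down.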
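2. Timeline
2.1 16:45: Fault reported, handling started
Both Oracle instances were found to have stopped, and the database could not be connected to.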
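2.2 17:25: Node 23 restored
Node 23 was restored first so that business jobs could run normally, while the fault on node 22 was investigated. We then waited for the jobs to finish.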
- After a forced reboot of node 22, the database service on node 23 returned to normal:
reboot -f
- Check the status of the database services on node 23:
crs_stat -t
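When checking node 23 after the reboot, the `crs_stat -t` output can be scanned for resources whose State does not match their Target. The sketch below assumes the classic five-column layout (Name, Type, Target, State, Host). The sample output, its resource names, and the `find_unhealthy` helper are hypothetical and are not taken from this incident. Note that `crs_stat` is deprecated in 11.2 in favor of `crsctl stat res -t`, whose multi-line output this parser does not handle.

```python
from typing import List, Tuple

# Hypothetical `crs_stat -t` output; resource names and hosts are illustrative only.
SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.LISTENER.lsnr ora....er.type ONLINE    ONLINE    hdls02
ora.hdls.db    ora....se.type ONLINE    OFFLINE
ora.hdls02.vip ora....t1.type ONLINE    ONLINE    hdls02
"""


def find_unhealthy(output: str) -> List[Tuple[str, str, str]]:
    """Return (name, target, state) for every resource whose State differs from its Target."""
    problems = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row, the dashed separator (a single field) and blank lines.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        name, _type, target, state = fields[:4]
        if state != target:
            problems.append((name, target, state))
    return problems


if __name__ == "__main__":
    for name, target, state in find_unhealthy(SAMPLE):
        print(f"{name}: target={target} state={state}")
```

For the sample, this prints `ora.hdls.db: target=ONLINE state=OFFLINE`. Comparing State against Target, rather than just looking for OFFLINE, avoids flagging resources that an administrator deliberately stopped.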
2.3 Analysis of node 22
1. EVMD log
2022-09-06 22:37:17.970: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created remote interface for node 'hdls02', haName 'fe0a-b4a2-f838-ac00', inf 'udp://11.0.0.23:19879'
2022-09-06 22:37:17.970: [GIPCHGEN][3844073216]gipchaWorkerAttachInterface: Interface attached inf 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0x6 }
2022-09-06 22:37:17.970: [GIPCXCPT][3844073216]gipchaLowerRecv: message from unrecognized node 'udp://11.0.0.23:19879', hdr 0x7f21c002bf68 { len 80, seq 0, type gipchaHdrTypeAck (3), lastSeq 1, lastAck 0, minAck 2, flags 0x1, srcLuid 24d64699-7050de6f, dstLuid 6678805d-500d8712, msgId 1 }, ret gipcretFail (1)
2022-09-06 22:37:17.970: [GIPCHALO][3844073216]gipchaLowerCallback: EXCEPTION[ ret gipcretFail (1) ] error while processing req 0x7f21e51fbe60 { type gipcreqtypeRecv, endp 0000000000001950, ret gipcretSuccess, local 'udp://11.0.0.22:18417', peer 'udp://11.0.0.23:19879', buf 0x7f21c002bf68, len 10240, olen 80 }, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }
2022-09-06 22:37:17.971: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:17.971: [GIPCHALO][3844073216]gipchaLowerSend: deffering startup of hdr 0x7f21c001f2d8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-
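In the excerpt above, node hdls01 (interconnect address 11.0.0.22) receives GIPC traffic from 11.0.0.23 but rejects it as coming from an unrecognized node (`gipcretFail`), and CRSCCL logs `No connection to peer` and retries the send. When going through a longer log, a small filter can pull out lines like these. This is a sketch only: the line layout is inferred from this excerpt rather than from Oracle documentation, and the marker list, `parse_line`, and `suspicious` are hypothetical names chosen for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the excerpt: "<timestamp>: [<COMPONENT>][<thread id>]<message>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<comp>\w+)\]\[(?P<tid>\d+)\](?P<msg>.*)$"
)

# Assumed markers of interconnect trouble, picked from the excerpt for illustration.
SUSPECT_MARKERS = ("gipcretFail", "EXCEPTION", "No connection to peer", "unrecognized node")


class LogEntry(NamedTuple):
    ts: str
    comp: str
    tid: str
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return LogEntry(m["ts"], m["comp"], m["tid"], m["msg"].strip())


def suspicious(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only parsed entries whose message contains one of the markers."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and any(k in e.msg for k in SUSPECT_MARKERS)]


# Shortened copies of lines from the excerpt above.
SAMPLE = [
    "2022-09-06 22:37:17.970: [GIPCHGEN][3844073216]gipchaWorkerAttachInterface: Interface attached inf 0x7f21c4021fa0",
    "2022-09-06 22:37:17.970: [GIPCXCPT][3844073216]gipchaLowerRecv: message from unrecognized node 'udp://11.0.0.23:19879', ret gipcretFail (1)",
    "2022-09-06 22:37:17.971: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9",
]

if __name__ == "__main__":
    for e in suspicious(SAMPLE):
        print(e.ts, e.comp, e.msg)
```

On the sample, the GIPCXCPT and CRSCCL lines are reported and the routine GIPCHGEN interface-attach line is skipped. The `\s*` inside the component brackets is there because some tags, such as `[ CRSCCL]`, are padded with a leading space.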