Oracle19C ADG主备切换问题处理
--原文:
https://www.modb.pro/db/40938?xzs=
大家好!最近遇到一起ADG主备切换后实时应用开启失败的故障,导致故障的原因很简单,关键的还是分析思路,具体情况请听笔者一一道来。
环境:
操作系统:LINUX
数据库版本:19.7
是否RAC:是
当前主库:一中心,切换后一中心变成备库
当前备库:二中心,切换后一中心变成主库
我们同时做了十多套ADG的切换测试,只有这一套库切换出现异常。主库第一次从一中心切换到二中心后,一中心切换到STANDBY角色并startup后,打开实时应用显示成功执行:
SQL>startup
ORACLEinstance started.
TotalSystem Global Area 3.8762E+11 bytes
FixedSize 30951712 bytes
VariableSize 1.1254E+11 bytes
DatabaseBuffers 2.7488E+11 bytes
RedoBuffers 170373120 bytes
Databasemounted.
Databaseopened.
SQL>
SQL>alter database recover managed standby database using current logfiledisconnect from session;
Databasealtered.
SQL>
SQL>
SQL>set line 300 pagesize 5000
SQL>selectname,db_unique_name,open_mode,database_role,SWITCHOVER_STATUS,protection_mode,flashback_onfrom gv$database;
NAME DB_UNIQUE_NAME OPEN_MODE DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE FLASHBACK_ON
--------------------------------------- -------------------- ------------------------------------ -------------------- ------------------
RACDB RACDB READ ONLY WITH APPLY PHYSICAL STANDBYNOT ALLOWED MAXIMUM PERFORMANCE NO
RACDB RACDB READ ONLY WITH APPLY PHYSICAL STANDBYNOT ALLOWED MAXIMUM PERFORMANCE NO
此时显示备库正常。
1、查看新备库dbalert日志确认切换成功:
2020-09-01T15:43:16.093670+08:00
NET (PID:25084): Switchover complete. Database shutdown required
USER(ospid: 25084): terminating the instance
2020-09-01T15:43:16.764327+08:00
ORA-1092: opitsk aborting process
2020-09-01T15:43:17.112672+08:00
Instanceterminated by USER, pid = 25084
2020-09-01T15:43:17.208529+08:00
ORA-1092: opitsk aborting process
TMI:dbsdrv switchover to target END 2020-09-01 15:43:17.239806
Completed:alter database switchover to racdbstd
Shuttingdown ORACLE instance (abort) (OS id: 25084)
Shutdownis initiated by sqlplus@racdb1 (TNS V1-V3).
Licensehigh water mark = 42
2020-09-01T15:43:17.272253+08:00
Warning:2 processes are still attacheded to shmid 1802268:
(size:1019904 bytes, creator pid: 143048, last attach/detach pid: 25084)
Instanceshutdown complete (OS id: 25084)
2、然后同事在新主库创建数据文件测试是否同步至备库时,查看alert日志发现实时应用失败,于是决定使用重启大法,重新打开新备库时报错:
racdb1:/home/oracle(racdb1)$sqlplus/ as sysdba
SQL*Plus:Release 19.0.0.0.0 - Production on Tue Sep 1 17:35:35 2020
Version19.7.0.0.0
Copyright(c) 1982, 2020, Oracle. All rights reserved.
Connectedto an idle instance.
SQL>startup
ORACLEinstance started.
TotalSystem Global Area 3.8762E+11 bytes
FixedSize 30951712 bytes
VariableSize 1.1254E+11 bytes
DatabaseBuffers 2.7488E+11 bytes
RedoBuffers 170373120 bytes
Databasemounted.
ORA-10458:standby database requires recovery
ORA-01196:file 1 is inconsistent due to a failed media recovery session
ORA-01110:data file 1: '+DATADG1/RACDB/DATAFILE/system.260.1048091377'
3、此时笔者介入,通过了解情况发现是在新主库添加数据文件进行测试之后出现的异常。于是查看添加数据文件时间点的dbalert日志发现控制文件自动快照备份失败,数据文件同步失败导致实时应用终止。
2020-09-01T16:17:26.492303+08:00
Controlfile backup creation failed:
failureto open backup target file/oracle/app/oracle/product/19.0.0/db/dbs/snapcf_racdb2.f.
2020-09-01T16:17:26.493708+08:00
Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_92022.trc:
ORA-27037:unable to obtain file status
Linux-x86_64Error: 2: No such file or directory
Additionalinformation: 7
2020-09-01T16:22:05.961404+08:00
Controlautobackup written to DISK device
handle'/oracle/app/oracle/product/19.0.0/db/dbs/c-2886113901-20200901-00'
2020-09-01T16:24:57.733483+08:00
PR00(PID:92022): MRP0: Background Media Recovery terminated with error1193
2020-09-01T16:24:57.733624+08:00
Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_92022.trc:
ORA-01193:file 26 is not the same file seen at start of recovery
ORA-01110:data file 26:'+DATADG1/RACDB/AC86ED673411EB76E0534C36E60A63A8/DATAFILE/tbs_daoshu_data.334.1048267329'
2020-09-01T16:24:57.735016+08:00
....(PID:113683): Managed Standby Recovery not using Real Time Apply
2020-09-01T16:24:57.920384+08:00
Recoveryinterrupted!
4、继续查看再次重启时的dbalert日志,发现报ORA-600错误:
2020-09-01T16:49:10.804524+08:00
Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_43860.trc (incident=1153647) (PDBNAME=CDB$ROOT):
ORA-00600:internal error code, arguments: [krdrsb_broadcast_influx_scn_1],[15254371943880], [15254373230828], [], [], [], [], [], [], [], [],[]
(3):Incidentdetails in:/oraclelog/diag/rdbms/racdb/racdb1/incident/incdir_1153647/racdb1_pr00_43860_i1153647.trc
(3):UseADRCI or Support Workbench to package the incident.
SeeNote 411.1 at My Oracle Support for error and packaging details.
2020-09-01T16:49:12.527738+08:00
Errorswith log+ARCHIVEDG/RACDB/ARCHIVELOG/2020_09_01/thread_2_seq_55.414.1049990539
PR00(PID:43860): MRP0: Background Media Recovery terminated with error600
2020-09-01T16:49:12.527955+08:00
Errorsin file/oraclelog/diag/rdbms/racdb/racdb1/trace/racdb1_pr00_43860.trc:
ORA-00600:internal error code, arguments: [krdrsb_broadcast_influx_scn_1],[15254371943880], [15254373230828], [], [], [], [], [], [], [], [],[]
2020-09-01T16:49:12.529640+08:00
....(PID:113683): Managed Standby Recovery not using Real Time Apply
2020-09-01T16:49:12.659315+08:00
Recoveryinterrupted!
5、查看RMAN备份策略发现控制文件自动快照备份是本地目录RMAN>show all;using target database control file instead ofrecovery catalogRMAN configuration parameters for database withdb_unique_name RACDBSTD are:CONFIGURE RETENTION POLICY TOREDUNDANCY 1; # defaultCONFIGURE BACKUP OPTIMIZATION OFF; #defaultCONFIGURE DEFAULT DEVICE TYPE TO DISK; #defaultCONFIGURE CONTROLFILE AUTOBACKUP ON; # defaultCONFIGURECONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; #defaultCONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TOBACKUPSET; # defaultCONFIGURE DATAFILE BACKUP COPIES FOR DEVICETYPE DISK TO 1; # defaultCONFIGURE ARCHIVELOG BACKUP COPIES FORDEVICE TYPE DISK TO 1; # defaultCONFIGURE MAXSETSIZE TOUNLIMITED; # defaultCONFIGURE ENCRYPTION FOR DATABASE OFF; #defaultCONFIGURE ENCRYPTION ALGORITHM 'AES128'; #defaultCONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # defaultCONFIGURE RMANOUTPUT TO KEEP FOR 7 DAYS; # defaultCONFIGURE ARCHIVELOGDELETION POLICY TO APPLIED ON ALL STANDBY;CONFIGURESNAPSHOT CONTROLFILE NAME TO'/oracle/app/oracle/product/19.0.0/db/dbs/snapcf_racdb1.f'; #default
原因分析:
是由于切换过程中控制文件有更新,恰好此时oracle触发了一次控制文件自动快照备份,由于快照备份路径设置在本地目录,自动备份失败,导致MRP0进程马上终止,介质恢复失败,从而主备不同步。
解决方法:
修改RMAN配置参数CONFIGURESNAPSHOT CONTROLFILENAME,将控制文件SNAPSHOT放在共享存储上。目前为了防止其他ADG发生类似的问题,已将所有ADG的控制文件SNAPSHOT存放到共享存储。
RMAN>CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+DATADG1/snapcf_racdb1.f';
疑问:
其他数据库的CONFIGURESNAPSHOT CONTROLFILE NAME也没有配置到共享存储,建表空间测试步骤都是相同的,如果说是因为CONFIGURESNAPSHOT CONTROLFILE NAME没有配置到共享存储引起,为什么只有这套数据库出现不同步的问题,其他数据库都没有出现?目前从日志来看只有这套库当时触发了控制文件自动快照备份,而其他库没有触发,所以正常。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· DeepSeek 开源周回顾「GitHub 热点速览」
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了