Building Data Guard with RMAN Duplicate: ORA-19838 When Starting the Standby
1. Incident Summary
While building a Data Guard standby with RMAN duplicate, the customer hit ORA-19838 and the standby database could not be mounted.
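2. Analysis
(1). We talked to the customer by phone to reconstruct the sequence of events. The customer first took an RMAN backup on the primary, then copied the backup sets to the standby host, and finally ran duplicate target database for standby nofilenamecheck dorecover; to build the standby. The duplicate command reported errors on the standby, so the customer shut the standby down and started it again. During that restart the instance raised ORA-19838, indicating that the current control file may not be used to mount the database.
(2). We reviewed the RMAN backup script and backup log on the primary. The core of the backup script is shown below.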
    run {
      allocate channel c1 device type disk;
      allocate channel c2 device type disk;
      allocate channel c3 device type disk;
      allocate channel c4 device type disk;
      allocate channel c5 device type disk;
      allocate channel c6 device type disk;
      backup as compressed backupset database format '/rman_backup/jy01cdb/JY01CDB_full_%U_%D_%T';
      backup archivelog all format '/rman_backup/jy01cdb/JY01CDB_archive_%U_%D_%T';
      backup current controlfile for standby format '/rman_backup/jy01cdb/jy01cdb_stb.bkp';
      release channel c1;
      release channel c2;
      release channel c3;
      release channel c4;
      release channel c5;
      release channel c6;
    }
The backup script itself is fine, and the backup log shows nothing abnormal.
(3). We reviewed the RMAN duplicate script used on the standby. Its core command is shown below.
    duplicate target database for standby nofilenamecheck dorecover;
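Note that the script uses the dorecover option. This means that once the standby datafiles have been created, RMAN also has to recover the standby.
(4). We reviewed the log produced by the RMAN duplicate command on the standby.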
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of Duplicate Db command at 11/22/2023 04:05:10
    RMAN-05501: aborting duplication of target database
    RMAN-03015: error occurred in stored script Memory Script
    RMAN-06053: unable to perform media recovery because of missing log
    RMAN-06025: no backup of archived log for thread 3 with sequence 9989 and starting SCN of 95353392895 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9988 and starting SCN of 95351547101 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9987 and starting SCN of 95346173344 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9986 and starting SCN of 95334857890 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9985 and starting SCN of 95326166620 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9984 and starting SCN of 95320724365 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9983 and starting SCN of 95315701652 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9982 and starting SCN of 95313626415 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9981 and starting SCN of 95307222432 found to restore
    RMAN-06025: no backup of archived log for thread 3 with sequence 9980 and starting SCN of 95302824195 found to restore
    ......
    RMAN-06025: no backup of archived log for thread 1 with sequence 11649 and starting SCN of 95225202077 found to restore
    RMAN-00567: Recovery Manager could not print some error messages

    RMAN>
The duplicate command ultimately failed because archived logs were missing when it performed media recovery.
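When the RMAN-06025 list is long, it helps to collapse it into per-thread sequence ranges before deciding which archived logs to ship to the standby. The following is a minimal sketch, not part of the original troubleshooting: the function name missing_archivelog_ranges is hypothetical, and the regular expression assumes the message wording shown in the log above. Because RMAN truncated its own output here ("......" and RMAN-00567), the result should be treated as a lower bound on what is missing.

```python
import re
from collections import defaultdict

# Assumed message format, taken from the duplicate log above:
# RMAN-06025: no backup of archived log for thread 3 with sequence 9989
#             and starting SCN of 95353392895 found to restore
MISSING_LOG = re.compile(
    r"RMAN-06025: no backup of archived log for thread (\d+) "
    r"with sequence (\d+) and starting SCN of (\d+)"
)


def missing_archivelog_ranges(log_text):
    """Return {thread: [(first_seq, last_seq), ...]} built from RMAN-06025 lines.

    Hypothetical helper: consecutive sequence numbers of one thread are
    merged into a single inclusive (first, last) range.
    """
    sequences = defaultdict(set)
    for thread, seq, _scn in MISSING_LOG.findall(log_text):
        sequences[int(thread)].add(int(seq))

    ranges = {}
    for thread in sorted(sequences):
        ordered = sorted(sequences[thread])
        runs = []
        start = prev = ordered[0]
        for seq in ordered[1:]:
            if seq != prev + 1:
                # Gap found: close the current run and start a new one.
                runs.append((start, prev))
                start = seq
            prev = seq
        runs.append((start, prev))
        ranges[thread] = runs
    return ranges


if __name__ == "__main__":
    sample = (
        "RMAN-06025: no backup of archived log for thread 3 with sequence 9989 "
        "and starting SCN of 95353392895 found to restore\n"
        "RMAN-06025: no backup of archived log for thread 3 with sequence 9988 "
        "and starting SCN of 95351547101 found to restore\n"
        "RMAN-06025: no backup of archived log for thread 1 with sequence 11649 "
        "and starting SCN of 95225202077 found to restore\n"
    )
    for thread, runs in missing_archivelog_ranges(sample).items():
        print(f"thread {thread}: {runs}")
```

The pattern is not anchored to line starts, so it also works on a log that has been pasted as a single line. The resulting ranges can then be compared against the archived logs actually available on the primary.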
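(5). According to the duplicate log, all datafiles had already been restored on the standby. Only the recover step failed, because archived logs were missing. Once the missing archived logs are supplied, the standby can catch up with the primary. The remaining problem is the standby control file. The fix is straightforward: create a new standby control file backup on the primary and restore it on the standby, after which the standby should, in principle, mount again. However, the datafile paths recorded in that restored control file will not match the files actually present on the standby, so a rename operation is also needed to correct the datafile paths in the control file.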
(6). A search of My Oracle Support (MOS) turned up Mounting Standby Database After RMAN Duplicate Failure Returns Error ORA-19838 (Doc ID 2452298.1), which shows that this failure is actually a bug, present in 11g and later releases. The workaround is the approach described above. For the detailed steps, see Step By Step Guide On How To Recreate Standby Control File When Datafiles Are On ASM And Using Oracle Managed Files (Doc ID 734862.1).
3. Recommendation
When building Data Guard with the duplicate command, it is best not to add the dorecover option.