[Oracle Engineer's Notes] Resolving ORA-19504, ORA-17502, ORA-15001, and ORA-27140 errors during RMAN DUPLICATE
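A customer reported a failure when creating a standby database with the RMAN DUPLICATE command. The standby side runs in a RAC environment.
The errors reported were ORA-19504, ORA-17502, ORA-15001, and ORA-27140.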
The commands executed and their output:
[oracle@racddb001g ~]$ export ORACLE_SID=tmt011
[oracle@racddb001g ~]$
[oracle@racddb001g ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Fri April 4 02:26:18 2021
Copyright (c) 1982, 2016, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> shutdown immediate
ORA-01507: database is not mounted
The ORACLE instance has been shut down.
SQL> startup nomount pfile='/media/dg/stby_inittmt01.ora'
The ORACLE instance has started.

Total System Global Area 1593835520 bytes
Fixed Size                  8421136 bytes
Variable Size             453985776 bytes
Database Buffers          848860800 bytes
Redo Buffers              294367808 bytes
SQL> exit
[oracle@racddb001g ~]$ export NLS_DATE_FORMAT='yyyy/mm/dd hh24:mi:ss'
[oracle@racddb001g ~]$ rman target 'sys@tmt01H' auxiliary /

RMAN> duplicate target database for standby dorecover nofilenamecheck;

Duplicate Db started at 2021/04/04 02:28:29
Channel: ORA_AUX_DISK_1 assigned
Channel ORA_AUX_DISK_1: SID = 30 Instance = tmt011 Device Type = DISK
The current log has been archived.

Memory script content:
{
   set until scn 5645034;
   restore clone standby controlfile;
}
Running a memory script

Execution command: SET until clause

restore is starting at 2021/04/02 02:28:40
Use of channel ORA_AUX_DISK_1

Channel ORA_AUX_DISK_1: Restoring control file
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
ORA-15001: diskgroup "DG001" does not exist or is not mounted
ORA-27140: attach to post/wait facility failed
Failover to previous backup

Channel ORA_AUX_DISK_1: Restoring control file
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
ORA-15001: diskgroup "DG001" does not exist or is not mounted
ORA-27140: attach to post/wait facility failed
Failover to previous backup

Channel ORA_AUX_DISK_1: Starting restore of datafile backup set
Channel ORA_AUX_DISK_1: Restoring control file
Channel ORA_AUX_DISK_1: reading from backup piece /my/oracle/dbhome_1/dbs/c-2060537070-20210326-01
Channel ORA_AUX_DISK_1: ORA-19870: error restoring backup piece /my/oracle/dbhome_1/dbs/c-2060537070-20210326-01
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
Failover to previous backup

Channel ORA_AUX_DISK_1: Starting restore of datafile backup set
Channel ORA_AUX_DISK_1: Restoring control file
Channel ORA_AUX_DISK_1: reading from backup piece /my/oracle/dbhome_1/dbs/c-2060827010-20210401-00
Channel ORA_AUX_DISK_1: ORA-19870: error during restore of backup piece /my/oracle/dbhome_1/dbs/c-2060827010-20210401-00
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
Failover to previous backup

Channel ORA_AUX_DISK_1: Starting restore of datafile backup set
Channel ORA_AUX_DISK_1: Restoring control file
Channel ORA_AUX_DISK_1: reading from backup piece /media/dg/backup_db_07stlrd3_1_1
Channel ORA_AUX_DISK_1: ORA-19870: error restoring backup piece /media/dg/backup_db_07stlrd3_1_1
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
Failover to previous backup

Channel ORA_AUX_DISK_1: Starting restore of datafile backup set
Channel ORA_AUX_DISK_1: Restoring control file
Channel ORA_AUX_DISK_1: reading from backup piece /media/dg/backup_db_07stlrd3_1_1
Channel ORA_AUX_DISK_1: ORA-19870: error restoring backup piece /media/dg/backup_db_07stlrd3_1_1
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
Failover to previous backup

Channel ORA_AUX_DISK_1: Starting restore of datafile backup set
Channel ORA_AUX_DISK_1: Restoring control file
Channel ORA_AUX_DISK_1: reading from backup piece /my/oracle/dbhome_1/dbs/c-2060537070-20210320-00
Channel ORA_AUX_DISK_1: ORA-19870: error during restore of backup piece /my/oracle/dbhome_1/dbs/c-2060537070-20210320-00
ORA-19505: failed to identify file "/my/oracle/dbhome_1/dbs/c-2060537070-20210320-00".
ORA-27037: unable to get file status
Failover to previous backup
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: Duplicate Db command failed at 04/02/2021 02:45:39
RMAN-05501: abort copy of target database
RMAN-03015: error in stored script Memory Script
RMAN-06026: missing target--stop restore
RMAN-06024: cannot find backup or copy to restore control file

RMAN>
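The error messages show that the auxiliary instance created for DUPLICATE failed while restoring the control file into the ASM disk group. ORA-15001 claims that disk group DG001 does not exist or is not mounted, and ORA-27140 reports that attaching to the post/wait facility failed: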
Channel ORA_AUX_DISK_1: Restoring control file
ORA-19504: failed to create file "+DG001/tmt01d/CONTROLFILE/control01.ctl".
ORA-17502: failed to create ksfdcre:3 file +DG001/tmt01d/CONTROLFILE/control01.ctl
ORA-15001: diskgroup "DG001" does not exist or is not mounted
ORA-27140: attach to post/wait facility failed
Failover to previous backup
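RMAN fails over to each older backup in turn, so the same error stack repeats many times in the log. A small helper can collapse a saved log into the distinct ORA-/RMAN- codes in the order they first appear, which makes the four codes in this case easy to spot. This is only a sketch: the function name error_codes is hypothetical, and it assumes the RMAN output has been saved to a file.

```bash
#!/usr/bin/env bash
# error_codes (hypothetical helper): print each distinct ORA-/RMAN- error code
# found in the given files (or stdin), in order of first appearance.
error_codes() {
  # -o: print only the matched code; -h: no file-name prefixes; -E: ERE syntax
  grep -ohE '(ORA|RMAN)-[0-9]{5}' "$@" |
    awk '!seen[$0]++'   # keep only the first occurrence of each code
}
```

For example, `error_codes duplicate_standby.log` (a hypothetical file name) lists ORA-19504, ORA-17502, ORA-15001 and ORA-27140 before the later ORA-19870 and RMAN-nnnnn codes.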
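First, check whether the oracle user has read/write access to the disk group. The disk group itself is mounted, and its only disk is /dev/mapper/ora01: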
SQL> select NAME,STATE from v$asm_diskgroup;

NAME                 STATE
-------------------- ------------------
DG001                MOUNTED

SQL> select name,PATH from v$asm_disk;

NAME                 PATH
-------------------- -------------------------------------
DG001_0000           /dev/mapper/ora01
Looking at the physical disk behind the disk group, /dev/mapper/ora01 is a symlink to the /dev/dm-11 device:
$ ls -l /dev/mapper/ora01
lrwxrwxrwx 1 root root 8 Mar 27 00:52 /dev/mapper/ora01 -> ../dm-11
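That device is owned by user grid and group asmadmin, with mode brw-rw----, so only grid and members of asmadmin can read or write it.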
$ ls -l /dev/dm-*
brw-rw---- 1 grid asmadmin 253, 10 Mar 27 00:58 /dev/dm-10
brw-rw---- 1 grid asmadmin 253, 11 Mar 27 00:58 /dev/dm-11
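The two ls steps above (resolve the /dev/mapper link, then inspect the dm device) can be folded into one check. This is a sketch assuming GNU coreutils (readlink -f and stat -c); the function name dev_perms is hypothetical.

```bash
#!/usr/bin/env bash
# dev_perms (hypothetical helper): follow a symlink such as /dev/mapper/ora01
# to the real device and print "<path> <owner>:<group> <symbolic mode>".
dev_perms() {
  local real
  real=$(readlink -f -- "$1") || return 1   # resolve every symlink level
  stat -c '%n %U:%G %A' -- "$real"          # GNU stat format string
}
```

On the standby, `dev_perms /dev/mapper/ora01` should report /dev/dm-11 with grid:asmadmin and brw-rw----, matching the listing above.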
Check the groups of the grid user:
# su - grid
$ id
uid=10000(grid) gid=11000(oinstall) groups=11000(oinstall),11002(asmadmin),11003(asmdba),11004(asmoper)
Then check the groups of the oracle user:
# su - oracle
$ id
uid=10001(oracle) gid=11000(oinstall) groups=11000(oinstall),11001(dba),11003(asmdba),11005(racdba),11006(backupdba),11007(dgdba),11008(kmdba),11009(oper)
The oracle user is not in the asmadmin group at all. That could explain why the database instance cannot access the ASM disk.
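Comparing two id listings by eye is error-prone. The sketch below, which uses bash process substitution with comm, prints the groups in the first list that are missing from the second. The name missing_groups is hypothetical, and LC_ALL=C is set so that sort and comm agree on ordering.

```bash
#!/usr/bin/env bash
# missing_groups (hypothetical helper): print groups in list A that are
# absent from list B. Both arguments are space-separated, like `id -nG`.
missing_groups() {
  # comm -23 keeps lines unique to the first (sorted) input
  LC_ALL=C comm -23 \
    <(tr ' ' '\n' <<<"$1" | LC_ALL=C sort -u) \
    <(tr ' ' '\n' <<<"$2" | LC_ALL=C sort -u)
}
```

Run as root, `missing_groups "$(id -nG grid)" "$(id -nG oracle)"` would print asmadmin and asmoper for the groups listed above.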
We asked the customer to add the oracle user to asmadmin as well (as root). Rerunning DUPLICATE made no difference:
usermod -a -G asmadmin oracle
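Another possible cause had to be considered: whether the oracle executable in the Grid Infrastructure home has the setuid bit set, so that it runs as the grid user.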
<primary>
# su - oracle
$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408674152 Mar 19 00:22 /my/oracle/dbhome_1/bin/oracle
[root@rachdb001g ~]# su - grid
Last login: 2021/04/01 (Thu) 16:29:18 JST
[grid@rachdb001g ~]$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 373409344 Mar 18 03:05 /opt/oracle/grid/12.2.0/grid/bin/oracle
[grid@rachdb001g ~]$

<standby>
# su - oracle
$ ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 408674152 Mar 24 22:19 /my/oracle/dbhome_1/bin/oracle
# su - grid
$ ls -l $ORACLE_HOME/bin/oracle
-rwxr-x--x 1 grid oinstall 373409344 Mar 20 01:56 /opt/oracle/grid/12.2.0/grid/bin/oracle
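The permissions on the grid user's $ORACLE_HOME/bin/oracle differ between the primary and the standby.
On the primary the mode is -rwsr-s--x. On the standby it is -rwxr-x--x, which means the setuid and setgid bits are missing.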
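In ls -l output, an s in the owner execute position means setuid is set, and an s in the group execute position means setgid is set. A plain x means the bit is absent. The sketch below checks a binary using the bash -u and -g file tests. The name suid_status is hypothetical.

```bash
#!/usr/bin/env bash
# suid_status (hypothetical helper): report whether a file carries both
# the setuid (-u) and setgid (-g) permission bits.
suid_status() {
  local f=$1
  local -a missing=()
  [[ -u $f ]] || missing+=(setuid)
  [[ -g $f ]] || missing+=(setgid)
  if (( ${#missing[@]} == 0 )); then
    echo "OK: $f has setuid and setgid"
  else
    echo "MISSING ${missing[*]}: $f"
  fi
}
```

Running `suid_status "$GI_HOME/bin/oracle"` on each node should print OK on the primary. Based on the listing above, it should report both bits missing on the standby.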
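Restore the permissions on the standby as root. Here $GI_HOME is the Grid Infrastructure home, /opt/oracle/grid/12.2.0/grid in this environment: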
chown grid:oinstall $GI_HOME/bin/oracle
chmod 6751 $GI_HOME/bin/oracle
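The octal mode 6751 corresponds exactly to the primary's -rwsr-s--x. The leading 6 is setuid (4) plus setgid (2), and 751 is rwx, r-x and --x. The sketch below performs this conversion from an ls -l mode string, which can help when copying permissions from a node known to work. The name mode_to_octal is hypothetical.

```bash
#!/usr/bin/env bash
# mode_to_octal (hypothetical helper): convert an `ls -l` mode string such as
# -rwsr-s--x into the 4-digit octal mode chmod accepts (here 6751).
mode_to_octal() {
  local m=$1 special=0 out="" i d x
  local -a sbit=(4 2 1)   # special-bit weight per triplet: setuid, setgid, sticky
  for i in 0 1 2; do      # triplets start at offsets 1, 4, 7 (offset 0 is file type)
    d=0
    [[ ${m:1+3*i:1} == r ]] && (( d += 4 ))
    [[ ${m:2+3*i:1} == w ]] && (( d += 2 ))
    x=${m:3+3*i:1}
    case $x in
      x) (( d += 1 )) ;;
      s|t) (( d += 1, special += sbit[i] )) ;;  # special bit plus execute
      S|T) (( special += sbit[i] )) ;;          # special bit without execute
    esac
    out+=$d
  done
  printf '%d%s\n' "$special" "$out"
}
```

`mode_to_octal -rwsr-s--x` prints 6751, and `mode_to_octal -rwxr-x--x` (the standby's broken mode) prints 0751.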
After the auxiliary instance on the standby was restarted, DUPLICATE was rerun and completed successfully.