[原]说不清楚是Oracle的Bug还是TSM的Bug
说到controlfile的自动备份相信很多人都会第一时间反应出RMAN配置中的这一项:
RMAN> show all; RMAN configuration parameters are: ...... ...... CONFIGURE CONTROLFILE AUTOBACKUP OFF; ...... ......
但是在Oracle 10g环境中,当完成备份system表空间的时候,controlfile也会自动备一份,通常controlfile也不大,多备几次也无所谓。但是就是这个特性导致了TSM TDPO的一个报错。
RMAN的备份脚本如下:
run { configure exclude for tablespace PSTATS; configure exclude for tablespace WGXOA_SYS13; configure exclude for tablespace DZOA_SYS13; show exclude ; allocate channel ch1 device type 'sbt_tape' format '%d_%I_%M%D_%s.dbf' ; backup incremental level 0 database tag 'tsm'; backup current controlfile tag 'tsm'; configure exclude for tablespace PSTATS clear; configure exclude for tablespace WGXOA_SYS13 clear; configure exclude for tablespace DZOA_SYS13 clear; release channel ch1; }
RMAN的configure设置已经禁用了 controlfile 的autobackup。RMAN的备份日志如下:
RMAN> ### backup database everyday 2> set echo on; 3> run { 4> configure exclude for tablespace PSTATS; 5> configure exclude for tablespace WGXOA_SYS13; 6> configure exclude for tablespace DZOA_SYS13; 7> show exclude ; 8> allocate channel ch1 9> device type 'sbt_tape' 10> format '%d_%I_%M%D_%s.dbf' ; 11> backup incremental level 0 database tag 'tsm'; 12> backup current controlfile tag 'tsm'; 13> configure exclude for tablespace PSTATS clear; 14> configure exclude for tablespace WGXOA_SYS13 clear; 15> configure exclude for tablespace DZOA_SYS13 clear; 16> release channel ch1; 17> } 18> exit; echo set on tablespace PSTATS will be excluded from future whole database backups new RMAN configuration parameters are successfully stored starting full resync of recovery catalog full resync complete tablespace WGXOA_SYS13 will be excluded from future whole database backups new RMAN configuration parameters are successfully stored starting full resync of recovery catalog full resync complete tablespace DZOA_SYS13 will be excluded from future whole database backups new RMAN configuration parameters are successfully stored starting full resync of recovery catalog full resync complete RMAN configuration parameters are: CONFIGURE EXCLUDE FOR TABLESPACE 'PSTATS'; CONFIGURE EXCLUDE FOR TABLESPACE 'WGXOA_SYS13'; CONFIGURE EXCLUDE FOR TABLESPACE 'DZOA_SYS13'; allocated channel: ch1 channel ch1: sid=491 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-22 23:30:41 file 7 is excluded from whole database backup file 10 is excluded from whole database backup file 11 is excluded from whole database backup file 12 is excluded from whole database backup file 13 is excluded from whole database backup file 14 is excluded from whole database backup file 15 is excluded from whole database backup file 16 is excluded from whole database backup file 17 is excluded from whole database backup file 18 is excluded from whole database backup file 19 is excluded from whole database backup file 20 is excluded from whole database backup file 21 is excluded from whole database backup file 22 is excluded from whole database backup file 23 is excluded from whole database backup file 24 is excluded from whole database backup file 25 is excluded from whole database backup file 26 is excluded from whole database backup file 27 is excluded from whole database backup file 28 is excluded from whole database backup file 29 is excluded from whole database backup file 30 is excluded from whole database backup file 32 is excluded from whole database backup file 33 is excluded from whole database backup file 34 is excluded from whole database backup file 35 is excluded from whole database backup channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00005 name=/u01/app/oradata/oracle/LSOA_SYS13_01.dbf input datafile fno=00009 name=/u01/app/oradata/oracle/HCBOA_SYS13.dbf input datafile fno=00002 name=/u01/app/oradata/oracle/undotbs101.dbf input datafile fno=00031 name=/u01/app/oradata/oracle/ZC_USERS02.dbf input datafile fno=00006 name=/u01/app/oradata/oracle/ZC_USERS01.dbf input datafile fno=00001 name=/u01/app/oradata/oracle/system01.dbf input datafile fno=00003 name=/u01/app/oradata/oracle/sysaux01.dbf input datafile fno=00008 name=/u01/app/oradata/oracle/ZC_INDX01.dbf input datafile fno=00004 name=/u01/app/oradata/oracle/users01.dbf channel ch1: starting piece 1 at 2011-02-22 23:30:42 channel ch1: finished piece 1 at 2011-02-22 23:39:57 piece handle=ORACLE_1541093227_0222_348.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:09:15 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-22 23:39:59 channel ch1: finished piece 1 at 2011-02-22 23:40:14 piece handle=ORACLE_1541093227_0222_349.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:17 Finished backup at 2011-02-22 23:40:14 Starting backup at 2011-02-22 23:40:15 channel ch1: starting full datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset channel ch1: starting piece 1 at 2011-02-22 23:40:15 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/22/2011 23:40:30 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error Recovery Manager complete.
TSM服务器的日志如下:
02/22/2011 23:40:22 ANR0511I Session 21019 opened output volume E:\TSMPOOL\000034D4.BFS. (SESSION: 21019) 02/22/2011 23:40:24 ANR8340I SERVER volume ZAIBEI3_SERVER1.BFS.298389221 mounted. (SESSION: 21019) 02/22/2011 23:40:24 ANR1340I Scratch volume ZAIBEI3_SERVER1.BFS.298389221 is now defined in storage pool COPY_POOL. (SESSION: 21019) 02/22/2011 23:40:24 ANR0511I Session 21019 opened output volume ZAIBEI3_SERVER1.BFS.298389221. (SESSION: 21019) 02/22/2011 23:40:24 ANR4383E Session failure, target server ZAIBEI3_SERVER1 has aborted current transaction; reason: Exceeded MAXNUMMP on target. (SESSION: 21019) 02/22/2011 23:40:24 ANR1411W Access mode for volume ZAIBEI3_SERVER1.BFS.298389221 now set to "read-only" due to write error. (SESSION: 21019) 02/22/2011 23:40:24 ANR0514I Session 21019 closed volume ZAIBEI3_SERVER1.BFS.298389221. (SESSION: 21019) 02/22/2011 23:40:24 ANR1181E astxn.c(684): Data storage transaction 0:6735375 was aborted. (SESSION: 21019) 02/22/2011 23:40:24 ANR0532W smnode.c(3191): Transaction 0:6735375 was aborted for session 21019 for node ORA_110_7 (TDPO LinuxAMD64). (SESSION: 21019) 02/22/2011 23:40:24 ANE4994S (Session: 21019, Node: ORA_110_7) TDPO LinuxAMD64 ANU0599 TDP for Oracle: (9738): => (ora_110_7) ANS1301E (RC1) Server detected system error (SESSION: 21019) 02/22/2011 23:40:24 ANR0514I Session 21019 closed volume E:\TSMPOOL\000034D4.BFS. (SESSION: 21019) 02/22/2011 23:40:24 ANR0403I Session 21019 ended for node ORA_110_7 (TDPO LinuxAMD64). (SESSION: 21019)
从日志来看是参数MAXNUMMP 配置过小,或者空间满导致读写错误引发RMAN报错,实际上MAXNUMMP 的设置是完全够用,空间也很充足。
再仔细分析,从RMAN脚本中可以看到只有两个backup语句,从channel的配置来看最多就产生两个backup set,仔细查看日志可发现:
channel ch1: starting piece 1 at 2011-02-22 23:30:42 channel ch1: finished piece 1 at 2011-02-22 23:39:57 piece handle=ORACLE_1541093227_0222_348.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: starting piece 1 at 2011-02-22 23:39:59 channel ch1: finished piece 1 at 2011-02-22 23:40:14 piece handle=ORACLE_1541093227_0222_349.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0
这两句是我们期望的,但是偏偏不知道在哪里多了个backup操作:
Starting backup at 2011-02-22 23:40:15 channel ch1: starting full datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset channel ch1: starting piece 1 at 2011-02-22 23:40:15 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/22/2011 23:40:30 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
从日志信息来看,就是备份controlfile和spfile,但是明明已经禁用了controlfile 的autobackup。这是为什么呢?
在《RMAN recipes for Oracle database 11g: a problem-solution approach》中有这么一句:
但是Oracle的官方文档中这样写道:
The RMAN behavior when theBACKUP
command includes datafile1
depends on theCONFIGURE
CONTROLFILE
AUTOBACKUP
setting. If control file autobackups are ON and the backup includes datafile 1, RMAN writes the control file and SPFILE to a separate autobackup backup set. If control file autobackups are OFF and the backup includes datafile 1, then RMAN includes the current control file and SPFILE in the same backup set as the datafiles.
简单来说,就是datafile 1(通常是system表空间)如果被backup,controlfile和spfile都会被备份,controlfile autobackup的设定仅仅是控制备份到哪里。
这就解析了controlfile和spfile为什么会“自动”备份了。
以下是RMAN的基本配置:
RMAN> show all ; RMAN configuration parameters are: CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 10 DAYS; CONFIGURE BACKUP OPTIMIZATION OFF; # default CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default CONFIGURE CONTROLFILE AUTOBACKUP OFF; CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/ocfs_arch_ocr/backup/db_192.168.110.5_%d_%T_%F.ctrl'; CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE 'SBT_TAPE' TO 'db_RAC1_%d_%F.ctrl'; CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default CONFIGURE MAXSETSIZE TO UNLIMITED; # default CONFIGURE ENCRYPTION FOR DATABASE OFF; # default CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/dbs/snapcf_oracle1.f'; # default
仅备份system表空间,触发controlfile autobackup:
run { allocate channel ch1 device type 'sbt_tape' format '%d_%I_%M%D_%s.dbf' ; backup incremental level 0 tablespace system tag 'tsm'; release channel ch1 ; }
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:25:23 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:25:23 channel ch1: finished piece 1 at 2011-02-23 17:25:48 piece handle=ORACLE_1557493236_0223_1269.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:25:50 channel ch1: finished piece 1 at 2011-02-23 17:26:05 piece handle=ORACLE_1557493236_0223_1270.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:17 Finished backup at 2011-02-23 17:26:05 released channel: ch1
分别备份两个表空间,其中一个是system表空间:
run { allocate channel ch1 device type 'sbt_tape' format '%d_%I_%M%D_%s.dbf' ; backup incremental level 0 tablespace system tag 'tsm'; backup incremental level 0 tablespace users tag 'tsm'; release channel ch1 ; }
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:30:09 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:30:10 channel ch1: finished piece 1 at 2011-02-23 17:30:35 piece handle=ORACLE_1557493236_0223_1271.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:30:36 channel ch1: finished piece 1 at 2011-02-23 17:30:51 piece handle=ORACLE_1557493236_0223_1272.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:16 Finished backup at 2011-02-23 17:30:51 Starting backup at 2011-02-23 17:30:52 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00005 name=+DATA/oracle/users01.dbf channel ch1: starting piece 1 at 2011-02-23 17:30:52 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:31:07 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
使用include current controlfile 子句强制备份controlfile:
run { allocate channel ch1 device type 'sbt_tape' format '%d_%I_%M%D_%s.dbf' ; backup incremental level 0 tablespace system include current controlfile tag 'tsm'; release channel ch1 ; }
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:32:57 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:32:57 channel ch1: finished piece 1 at 2011-02-23 17:33:22 piece handle=ORACLE_1557493236_0223_1274.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:33:24 channel ch1: finished piece 1 at 2011-02-23 17:33:39 piece handle=ORACLE_1557493236_0223_1275.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:17 Finished backup at 2011-02-23 17:33:39 released channel: ch1
基本上没有人用的写法:
run { allocate channel ch1 device type 'sbt_tape' format '%d_%I_%M%D_%s.dbf' ; backup incremental level 0 tablespace system include current controlfile tag 'tsm'; backup incremental level 0 tablespace users tag 'tsm'; release channel ch1 ; }
allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:34:09 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:34:10 RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:34:35 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error continuing other job steps, job failed will not be re-run channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:34:36 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:34:51 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
一种比较正常的写法:
run { allocate channel ch1 device type 'sbt_tape' format '%d_%I_%M%D_%s.dbf' ; backup incremental level 0 tablespace users tag 'tsm'; backup incremental level 0 tablespace system include current controlfile tag 'tsm'; release channel ch1 ; }
ANS0102W Unable to open the message repository /opt/tivoli/tsm/client/oracle/bin64/zh_CN/tdpo.cat. The American English repository will be used instead. allocated channel: ch1 channel ch1: sid=447 instance=oracle1 devtype=SBT_TAPE channel ch1: Data Protection for Oracle: version 5.4.1.0 Starting backup at 2011-02-23 17:38:40 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00005 name=+DATA/oracle/users01.dbf channel ch1: starting piece 1 at 2011-02-23 17:38:41 channel ch1: finished piece 1 at 2011-02-23 17:38:56 piece handle=ORACLE_1557493236_0223_1278.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:15 Finished backup at 2011-02-23 17:38:56 Starting backup at 2011-02-23 17:38:57 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ch1: starting piece 1 at 2011-02-23 17:38:57 channel ch1: finished piece 1 at 2011-02-23 17:39:22 piece handle=ORACLE_1557493236_0223_1279.dbf tag=TSM comment=API Version 2.0,MMS Version 5.4.1.0 channel ch1: backup set complete, elapsed time: 00:00:25 channel ch1: starting incremental level 0 datafile backupset channel ch1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ch1: starting piece 1 at 2011-02-23 17:39:24 released channel: ch1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of backup command on ch1 channel at 02/23/2011 17:39:39 ORA-27192: skgfcls: sbtclose2 returned error - failed to close file ORA-19511: Error received from media manager layer, error text: ANS1301E (RC1) Server detected system error
如果不使用TSM进行备份,是否会报错呢?
run { backup incremental level 0 tablespace system include current controlfile format '/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_%d_%T_%U.dbf' tag 'tsm'; backup incremental level 0 tablespace users format '/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_%d_%T_%U.dbf' tag 'tsm'; }
Starting backup at 2011-02-23 17:42:40 using channel ORA_DISK_1 channel ORA_DISK_1: starting incremental level 0 datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset input datafile fno=00001 name=+DATA/oracle/system01.dbf channel ORA_DISK_1: starting piece 1 at 2011-02-23 17:42:40 channel ORA_DISK_1: finished piece 1 at 2011-02-23 17:42:47 piece handle=/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_ORACLE_20110223_82m5dekg_1_1.dbf tag=TSM comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07 channel ORA_DISK_1: starting incremental level 0 datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset including current control file in backupset including current SPFILE in backupset channel ORA_DISK_1: starting piece 1 at 2011-02-23 17:42:49 channel ORA_DISK_1: finished piece 1 at 2011-02-23 17:42:50 piece handle=/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_ORACLE_20110223_83m5dekn_1_1.dbf tag=TSM comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03 Finished backup at 2011-02-23 17:42:50 Starting backup at 2011-02-23 17:42:50 using channel ORA_DISK_1 channel ORA_DISK_1: starting incremental level 0 datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset input datafile fno=00005 name=+DATA/oracle/users01.dbf channel ORA_DISK_1: starting piece 1 at 2011-02-23 17:42:51 channel ORA_DISK_1: finished piece 1 at 2011-02-23 17:42:52 piece handle=/u01/ocfs_arch_ocr/backup/db_full_192.168.110.5_ORACLE_20110223_84m5dekr_1_1.dbf tag=TSM comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 Finished backup at 2011-02-23 17:42:52
从测试来看,run代码块中出现两个backup,TSM就变傻了。