【OGG 故障处理】 丢失归档恢复
2019-05-30 16:50 那个,我 阅读(2097) 评论(0) 编辑 收藏 举报OGG 有两天由于某种原因没有启动,而这段时间的备份文件缺失了一部分归档。恢复过程记录如下:
GGSCI (xxxx) 6> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING EXTRACT RUNNING DP_XXXX3 00:00:00 00:00:08 EXTRACT RUNNING DP_XXXX4 00:00:00 00:00:05 EXTRACT ABENDED EX_XXXX3 00:00:01 65:09:29 EXTRACT ABENDED EX_XXXX4 00:00:01 65:09:35 GGSCI (xxxx) 6> view report EX_XXXX1 2019-05-27 14:58:34 ERROR OGG-00446 Could not find archived log for sequence 76437 thread 1 under default destinations SQL <SELECT name FROM v$archived_log WHERE sequence# = :ora_seq_no AND thread# = :ora_thread AND resetlogs_id = :ora_resetlog_id AND archived = 'YES' AND deleted = 'NO' AND standby_dest = 'NO' order by name DESC>, error retrievin g redo file name for sequence 76437, archived = 1, use_alternate = 0 Not able to establish initial position for sequence 76437, rba 6905872. 2019-05-27 14:58:34 ERROR OGG-01668 PROCESS ABENDING. 检查确认: set line 200 pages 200 col name for a100 SELECT sequence#,name,thread#,resetlogs_id,archived,deleted FROM v$archived_log WHERE sequence# >= 76437; SEQUENCE# NAME THREAD# RESETLOGS_ID ARC DEL ---------- ---------------------------------------------------------------------------------------------------- ---------- ------------ --- --- 76437 1 898097516 YES YES 76438 1 898097516 YES YES 76439 1 898097516 YES YES 76440 1 898097516 YES YES 76441 1 898097516 YES YES 76442 1 898097516 YES YES 76443 1 898097516 YES YES 76444 1 898097516 YES YES 76445 1 898097516 YES YES 76446 1 898097516 YES YES 76447 1 898097516 YES YES 76448 1 898097516 YES YES 76449 1 898097516 YES YES 76450 1 898097516 YES YES 76451 1 898097516 YES YES 76452 1 898097516 YES YES 76453 1 898097516 YES YES 76454 1 898097516 YES YES 76455 1 898097516 YES YES 76456 1 898097516 YES YES 76457 1 898097516 YES YES 76458 1 898097516 YES YES 76459 1 898097516 YES YES 76460 1 898097516 YES YES 76461 1 898097516 YES YES 76462 1 898097516 YES YES 76463 1 898097516 YES YES 76464 1 898097516 YES YES 76465 1 898097516 YES YES 29 rows selected.
看不到归档路径,DELETED='YES' ,可见确实归档丢失。
如果归档没有丢失,将归档恢复到原来位置,重新启动抽取进程,即可完成恢复。大概流程如下:
--1.检查备份归档是否存在 rman target / RMAN > list backup of archivelog sequence between 76437 and 76501; --2.恢复归档 RMAN > restore archivelog from sequence 76437; --3.确认归档恢复到了默认归档位置 --4.重新启动抽取进程 ./ggsci start ex_xxxx3 info all stats ex_xxxx3 stats dp_xxxx3 比对数据,确认恢复
再次确认后,RAC 两个节点的归档都有丢失。
thread2: 52036 到 52041断档
thread1: 76495 到 76501断档
其中
进程3 对应的表只有insert操作,我们采用如下方式处理的:
比对数据将这段时间的数据插入进去(可以采用tns/exp where/plsql where/expdp where等方式),然后跳过这段时间进行同步
进程4 包含了更新插入删除操作,只能通过重新初始化的方式恢复
进程3 补数据恢复
--1.获取断档故障时间范围 SQL> alter session set nls_date_format='YYYYMMDD HH24:MI:ss'; Session altered. SQL> select THREAD#,SEQUENCE#,FIRST_TIME,NEXT_TIME from v$archived_log where SEQUENCE# between 52036 and 52041; SEQUENCE# FIRST_TIME NEXT_TIME ---------- ----------------- ----------------- 52036 20190526 20:05:42 20190526 21:05:41 52037 20190526 21:05:41 20190526 21:34:56 52038 20190526 21:34:56 20190526 22:34:54 52039 20190526 22:34:54 20190526 23:34:54 52040 20190526 23:34:54 20190527 00:34:55 52041 20190527 00:34:55 20190527 01:34:55 6 rows selected. SQL> select THREAD#,SEQUENCE#,FIRST_TIME,NEXT_TIME from v$archived_log where SEQUENCE# between 76495 and 76501; THREAD# SEQUENCE# FIRST_TIME NEXT_TIME ---------- ---------- ----------------- ----------------- 1 76495 20190526 20:05:29 20190526 20:05:42 1 76496 20190526 20:05:42 20190526 20:56:07 1 76497 20190526 20:56:07 20190526 21:34:54 1 76498 20190526 21:34:54 20190526 22:24:17 1 76499 20190526 22:24:17 20190526 23:24:15 1 76500 20190526 23:24:15 20190527 00:24:15 1 76501 20190527 00:24:15 20190527 01:24:17 --2.将这段时间数据插入过去 insert into target_table select * from source_table where createtime>=to_date('20190526 20:05:29','YYYYMMDD HH24:MI:SS') AND createtime<=to_date('20190527 01:34:55','YYYYMMDD HH24:MI:SS'); --3.跳过这段数据同步 GGSCI (xxxx) 2> ALTER EXTRACT EX_XXXX3, THREAD 2, BEGIN 2019-05-27 01:34:56 EXTRACT altered. GGSCI (xxxx) 2> ALTER EXTRACT EX_XXXX3, THREAD 1, BEGIN 2019-05-27 01:24:18 EXTRACT altered. GGSCI (xxxx) 14> ALTER EXTRACT EX_XXXX3, ETROLLOVER 2019-05-28 17:21:15 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically. EXTRACT altered. GGSCI (xxxx) 4> start EX_XXXX3 Sending START request to MANAGER ... EXTRACT EX_XXXX3 starting GGSCI (xxxx) 4> info all --检查状态 GGSCI (xxxx) 4> info exttrail ./dirdat/lc Extract Trail: ./dirdat/lc Extract: EX_XXXX3 Seqno: 2831 --tracefile SEQNO已经跳转到2831 RBA: 78985035 File Size: 100M GGSCI (xxxx) 7> info DP_XXXX3 EXTRACT DP_XXXX3 Last Started 2019-05-27 14:47 Status RUNNING Checkpoint Lag 00:00:00 (updated 00:00:04 ago) Log Read Checkpoint File ./dirdat/lc002830 --DB进程的trace文件不能自动跳转 2019-05-28 17:12:21.854829 RBA 1084
GGSCI (xxxx) 7> ALTER EXTRACT DP_XXXX3, EXTSEQNO 2831, EXTRBA 0
GGSCI (xxxx) 7> START DP_XXXX3
目标端
GGSCI (xxxx) 1> start RP_XXXX3
GGSCI (xxxx) 1> stats RP_XXXX3 --检查是否有数据,完成恢复
进程4 重新初始化恢复
源端:
1.根据SCN 导出数据
sqlplus ggadmin/ggadmin
select to_char(current_scn) from v$database; --15555721660
expdp userid=scott/tiger directory=expdir tables=testtable1 dumpfile=testtable1.dmp logfile=testtable1.log cluster=n CONTENT=data_only FLASHBACK_SCN=15555721660
scp 到目标端
2.目标端导入
impdp target/target directory=expdir remap_schema=scott:target dumpfile=testtable1.dmp table_exists_action=truncate
3. 修复OGG
随便找个不断档的时间点恢复即可
STOP EX_XXXX4 ALTER EXTRACT EX_XXXX4, THREAD 2, BEGIN 2019-05-28 16:00:00 ALTER EXTRACT EX_XXXX4, THREAD 1, BEGIN 2019-05-28 16:00:00 ALTER EXTRACT EX_XXXX4, ETROLLOVER 2019-05-28 17:21:15 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically. EXTRACT altered. start EX_XXXX4 info all GGSCI (xxxx) 4> info exttrail ./dirdat/ld Extract Trail: ./dirdat/lc Extract: EX_XXXX3 Seqno: 2638 RBA: 45765133 File Size: 100M STOP DP_XXXX4 ALTER EXTRACT DP_XXXX4, EXTSEQNO 2638, EXTRBA 0 START DP_XXXX4 目标端: START REPLICAT RP_XXXX4,ATCSN 15555721660
完成恢复。
以上过程是后期梳理,并没有保留详细日志,所以存在一些出入,但大概操作步骤如此。