Oracle 11g Data Guard Error 16143 Heartbeat failed to connect to standby 处理方法


 

 

一.背景说明

 

前段时间一朋友在生产库上误操作,本来他是打算重启一下DG环境,结果在备库命令执行错误。

 

本应该执行

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL; 

 

结果朋友执行成了如下命令:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH; 

 

中断了DG主备库的通信环境,这个finish是用来做Failover时用的。 当时让朋友在主库重新生成了一份standby controlfiles,然后copy到备库,在按正常模式启动就可以了。

   

    因为数据库识别主备库就是通过控制文件来的,所以理论上,只需要重新生成一份standby 控制文件就可以了。  后来朋友测试了一下,正常的拉起来了。

 

今天看到了当时的记录,就顺便模拟一下整个操作,顺便练练手。

 

 

 

二. 演示过程

 

2.1 DG 环境说明

OS: Oracle Linux6.3

DB: 11.2.0.3

 

SQL> select * from v$version;

 

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise EditionRelease 11.2.0.3.0 - 64bit Production

PL/SQL Release 11.2.0.3.0 - Production

CORE   11.2.0.3.0      Production

TNS for Linux: Version 11.2.0.3.0 -Production

NLSRTL Version 11.2.0.3.0 - Production

 

 

主库:

 

SQL> select open_mode from v$database;

 

OPEN_MODE

--------------------

READ WRITE

 

SQL>

SQL> set pagesize 200

SQL> select sequence#,applied fromv$archived_log order by sequence# desc;

 

 SEQUENCE# APPLIED

---------- ---------

       14 YES

       14 NO

       13 YES

       13 NO

       12 NO

       12 YES

       11 YES

       11 NO

       10 NO

       10 YES

        9 YES

        9 NO

        8 NO

        8 YES

        7 YES

        7 NO

        6 YES

        6 NO

        5 NO

        4 NO

 

20 rows selected.

 

 

备库:

SQL> select open_mode from v$database;

 

OPEN_MODE

--------------------

MOUNTED

 

SQL>

 

 

SQL> select sequence#,applied fromv$archived_log order by sequence# desc;

 

 SEQUENCE# APPLIED

---------- ---------

       14 YES

       13 YES

       12 YES

       11 YES

       10 YES

        9 YES

         8 YES

        7 YES

        6 YES

 

9 rows selected.

 

 

2.2 模拟故障

 

在备库执行如下命令:

SQL> ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH;

Database altered.

 

 

2.3 查看主库 alert log

 

[oracle@dg1 trace]$ pwd

/u01/app/oracle/diag/rdbms/dave_pd/dave/trace

 

[oracle@dg1 trace]$ tail -30 alert_dave.log

Thread 1 advanced to log sequence 14 (LGWRswitch)

 Current log# 2 seq# 14 mem# 0: /u01/app/oracle/oradata/dave/redo02.log

Fri Mar 29 03:30:12 2013

Archived Log entry 17 added for thread 1sequence 13 ID 0x3312f7c4 dest 1:

Fri Mar 29 03:30:13 2013

LNS: Standby redo logfile selected forthread 1 sequence 14 for destination LOG_ARCHIVE_DEST_2

Fri Mar 29 03:43:10 2013

Time drift detected. Please check VKTMtrace file for more details.

Fri Mar 29 04:45:31 2013

Time drift detected. Please check VKTMtrace file for more details.

Fri Mar 29 06:28:35 2013

Time drift detected. Please check VKTMtrace file for more details.

Fri Mar 29 07:08:14 2013

Thread 1 advanced to log sequence 15 (LGWRswitch)

 Current log# 3 seq# 15 mem# 0: /u01/app/oracle/oradata/dave/redo03.log

Fri Mar 29 07:08:16 2013

Archived Log entry 20 added for thread 1sequence 14 ID 0x3312f7c4 dest 1:

Fri Mar 29 07:08:17 2013

LNS: Standby redo logfile selected forthread 1 sequence 15 for destination LOG_ARCHIVE_DEST_2

Fri Mar 29 07:34:48 2013

Time drift detected. Please check VKTMtrace file for more details.

Fri Mar 29 07:48:55 2013

LNS: Attempting destinationLOG_ARCHIVE_DEST_2 network reconnect (3135)

LNS: Destination LOG_ARCHIVE_DEST_2 networkreconnect abandoned

Error 3135 for archive log file 3 to'dave_st'

Errors in file/u01/app/oracle/diag/rdbms/dave_pd/dave/trace/dave_nsa2_3181.trc:

ORA-03135: connection lost contact

LNS: Failed to archive log 3 thread 1sequence 15 (3135)

Fri Mar 29 07:51:45 2013

PING[ARC1]: Heartbeatfailed to connect to standby 'dave_st'. Error is 16143.

 

因为我们在备库执行的Finish命令,导致心跳中断了。

 

2.4 查看备库alert log

 

[oracle@dg2 trace]$ pwd

/u01/app/oracle/diag/rdbms/dave_st/dave/trace

 

[oracle@dg2 trace]$  tail -20 alert_dave.log

Terminal Recovery: thread 1 seq# 15 redorequired

Terminal Recovery:

Recovery of Online Redo Log: Thread 1 Group5 Seq 15 Reading mem 0

 Mem# 0: /u01/app/oracle/oradata/dave/stdbyredo02.log

Identified End-Of-Redo (failover) forthread 1 sequence 15 at SCN 0xffff.ffffffff

Incomplete Recovery applied until change1082890 time 03/29/2013 07:48:53

MRP0: Media Recovery Complete (dave)

Terminal Recovery: successful completion

Fri Mar 29 07:48:49 2013

ARCH: Archival stopped, error occurred.Will continue retrying

ORACLE Instance dave - Archival Error

Forcing ARSCN to IRSCN for TR 0:1082890

Attempt to set limbo arscn 0:1082890 irscn0:1082890

Resetting standby activation ID 856881092(0x3312f7c4)

ORA-16014: log 5 sequence# 15 not archived,no available destinations

ORA-00312: online log 5 thread 1:'/u01/app/oracle/oradata/dave/stdbyredo02.log'

MRP0: Background Media Recovery processshutdown (dave)

Fri Mar 29 07:48:50 2013

Terminal Recovery: completion detected(dave)

Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH

[oracle@dg2 trace]$

 

2.5 在主库切换归档

 

SQL> alter system switch logfile;

System altered.

 

SQL> alter system switch logfile;

System altered.

 

SQL> select sequence#,applied fromv$archived_log order by sequence# desc;

 

 SEQUENCE# APPLIED

---------- ---------

       16 NO

       15 NO

       14 NO

       14 YES

       13 YES

       13 NO

       12 YES

       12 NO

       11 NO

       11 YES

       10 NO

       10 YES

        9 YES

        9 NO

        8 YES

        8 NO

         7 YES

        7 NO

        6 NO

        6 YES

        5 NO

        4 NO

 

22 rows selected.

 

SQL>

 

2.6 再次查看主备库日志

 

主库日志:

 

Fri Mar 29 07:52:46 2013

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.

Fri Mar 29 07:53:47 2013

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.

Fri Mar 29 07:53:49 2013

Thread 1 advanced to log sequence 16 (LGWRswitch)

 Current log# 1 seq# 16 mem# 0: /u01/app/oracle/oradata/dave/redo01.log

Fri Mar 29 07:53:49 2013

Archived Log entry 21 added for thread 1sequence 15 ID 0x3312f7c4 dest 1:

Fri Mar 29 07:53:50 2013

FAL[server, ARC2]: Error 16143 creatingremote archivelog file 'dave_st'

FAL[server, ARC2]: FAL archive failed, seetrace file.

ARCH: FAL archive failed. Archivercontinuing

ORACLE Instance dave - Archival Error.Archiver continuing.

Thread 1 advanced to log sequence 17 (LGWRswitch)

 Current log# 2 seq# 17 mem# 0: /u01/app/oracle/oradata/dave/redo02.log

Fri Mar 29 07:53:57 2013

Archived Log entry 22 added for thread 1sequence 16 ID 0x3312f7c4 dest 1:

 

 

备库日志:

 

Fri Mar 29 07:48:50 2013

Terminal Recovery: completion detected(dave)

Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH

Fri Mar 29 07:51:34 2013

RFS[5]: Assigned to RFS process 9336

RFS[5]: No connections allowed during/afterterminal recovery.

Fri Mar 29 07:52:35 2013

RFS[6]: Assigned to RFS process 9340

RFS[6]: No connections allowed during/afterterminal recovery.

Fri Mar 29 07:53:36 2013

RFS[7]: Assigned to RFS process 9343

RFS[7]: No connections allowed during/afterterminal recovery.

Fri Mar 29 07:53:39 2013

RFS[8]: Assigned to RFS process 9345

RFS[8]: No connectionsallowed during/after terminal recovery.

 

 

2.7 在主库重建standby control file

 

先在备库查看一下控制文件名称,等会创建完后直接覆盖过去:

SQL> show parameter control

 

NAME                                 TYPE        VALUE

----------------------------------------------- ------------------------------

control_file_record_keep_time        integer     7

control_files                        string      /u01/app/oracle/oradata/dave/c

                                                ontrol01.ctl, /u01/app/oracle/

                                                fast_recovery_area/dave/contro

                                                 l02.ctl

control_management_pack_access       string      DIAGNOSTIC+TUNING

 

 

主库创建standby controlfile

SQL> alter database create standbycontrolfile as '/u01/control01.ctl';

Database altered.

 

 

copy到备库的目录,在覆盖原来的控制文件:

 

--先关闭备库:

SQL> shutdown immediate

ORA-01109: database not open

 

Database dismounted.

ORACLE instance shut down.

SQL>

 

 

--copy并覆盖:

[oracle@dg2 trace]$ cd/u01/app/oracle/oradata/dave/

[oracle@dg2 dave]$ ls

control01.ctl    stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf

stdbyredo01.log  stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf

[oracle@dg2 dave]$ mv control01.ctlcontrol01.ctl.bak

[oracle@dg2 dave]$ ls

control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf

stdbyredo01.log    stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf

 

[oracle@dg2 dave]$ scp192.168.1.20:/u01/control01.ctl 192.168.1.30:/u01/app/oracle/oradata/dave/

The authenticity of host '192.168.1.20(192.168.1.20)' can't be established.

RSA key fingerprint is0d:6a:5f:78:53:a0:bf:54:a8:e3:7e:67:81:06:8d:75.

Are you sure you want to continueconnecting (yes/no)? yes

Warning: Permanently added '192.168.1.20'(RSA) to the list of known hosts.

oracle@192.168.1.20's password:

oracle@192.168.1.30's password:

control01.ctl                                                                        100% 9520KB 865.5KB/s   00:11   

Connection to 192.168.1.20 closed.

[oracle@dg2 dave]$ ls

control01.ctl      stdbyredo01.log  stdbyredo03.log  sysaux01.dbf temp01.dbf     users01.dbf

control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf

[oracle@dg2 dave]$

 

 

[oracle@dg2 archivelog]$ cd/u01/app/oracle/fast_recovery_area/dave/

[oracle@dg2 dave]$ ls

control02.ctl

[oracle@dg2 dave]$ mv control02.ctlcontrol02.ctl.bak

[oracle@dg2 dave]$ ls

control02.ctl.bak

[oracle@dg2 dave]$

 

[oracle@dg2 dave]$ cp control01.ctl/u01/app/oracle/fast_recovery_area/dave/control02.ctl

 

2.8 在正常拉起备库

 

SQL> startup nomount;

ORACLE instance started.

 

Total System Global Area  814227456 bytes

Fixed Size                  2232760 bytes

Variable Size             478154312 bytes

Database Buffers          331350016 bytes

Redo Buffers                2490368 bytes

SQL> alter database mount standby database;

 

Database altered.

 

SQL> alter database recover managedstandby database disconnect from session;

 

Database altered.

 

SQL>

 

2.9 查看主备库日志

 

主库日志:

 

Fri Mar 29 08:00:51 2013

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.

Fri Mar 29 08:01:52 2013

Error 1034 received logging on to thestandby

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

Fri Mar 29 08:02:56 2013

Error 1034 received logging on to thestandby

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

Fri Mar 29 08:03:57 2013

Error 1034 received logging on to thestandby

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

Fri Mar 29 08:04:59 2013

Error 1034 received logging on to the standby

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

Fri Mar 29 08:06:02 2013

Error 1034 received logging on to thestandby

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

Fri Mar 29 08:07:05 2013

Error 1034 received logging on to thestandby

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.

Fri Mar 29 08:08:08 2013

PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16058.

Fri Mar 29 08:08:34 2013

ALTER SYSTEM SETlog_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';

Fri Mar 29 08:08:35 2013

Thread 1 advanced to log sequence 18 (LGWRswitch)

 Current log# 3 seq# 18 mem# 0: /u01/app/oracle/oradata/dave/redo03.log

Fri Mar 29 08:08:36 2013

******************************************************************

LGWR: Setting 'active'archival for destination LOG_ARCHIVE_DEST_2

******************************************************************

Fri Mar 29 08:08:36 2013

Archived Log entry 23added for thread 1 sequence 17 ID 0x3312f7c4 dest 1:

 

 

备库日志:

 

[oracle@dg2 trace]$ tail -20 alert_dave.log

ORA-27037: unable to obtain file status

Linux-x86_64 Error: 2: No such file ordirectory

Additional information: 3

Clearing online redo logfile 3 complete

Media Recovery Waiting for thread 1sequence 15

Fetching gap sequence in thread 1, gapsequence 15-16

Fri Mar 29 08:08:48 2013

RFS[3]: Assigned to RFS process 9707

RFS[3]: Opened log for thread 1 sequence 16dbid 856896964 branch 794014730

Fri Mar 29 08:08:49 2013

RFS[4]: Assigned to RFS process 9705

RFS[4]: Opened log for thread 1 sequence 15dbid 856896964 branch 794014730

Archived Log entry 2 added for thread 1sequence 16 rlc 794014730 ID 0x3312f7c4 dest 2:

Archived Log entry 3 added for thread 1sequence 15 rlc 794014730 ID 0x3312f7c4 dest 2:

Fri Mar 29 08:08:55 2013

Media Recovery Log/u01/archivelog/1_15_794014730.dbf

Media Recovery Log/u01/archivelog/1_16_794014730.dbf

Media Recovery Log/u01/archivelog/1_17_794014730.dbf

Fri Mar 29 08:09:11 2013

Media Recovery Waitingfor thread 1 sequence 18 (in transit)

 

注意这里:

    我们把备库拉起来之后,就自动开始同步了。

 

 

2.10 切换归档测试

 

主库:

SQL> alter system switch logfile;

 

System altered.

 

SQL> select sequence#,applied from v$archived_log order by sequence# desc;

 

 SEQUENCE# APPLIED

---------- ---------

       18 NO

       18 NO

       17 NO

       17 YES

       16 YES

       16 NO

       15 NO

       15 YES

       14 NO

       14 YES

       13 YES

       13 NO

       12 NO

       12 YES

       11 NO

       11 YES

       10 YES

       10 NO

        9 NO

        9 YES

        8 NO

        8 YES

        7 NO

        7 YES

        6 NO

        6 YES

        5 NO

        4 NO

 

28 rows selected.

 

 

备库:

 

SQL> select sequence#,applied from v$archived_log order by sequence# desc;

 SEQUENCE# APPLIED

---------- ---------

       18 YES

       17 YES

       16 YES

       15 YES

 

注意这里,备库已经完全同步了。 之前在我们主库看,18的日志还没有应用,因为我们刚启动备库,应用需要一定的时间。 并且在我们重新配置之后,这里的数字就从15开始了。 是我们中断DG后的数字。 但我们主库还是从4开始计算的。

   

 

小结:

    对于DG通信异常中断的处理,我们仅仅需要重新创建一份standby 的control file就可以了。

 

 

 

 

 

 

---------------------------------------------------------------------------------------

版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!

Skype:    tianlesoftware

QQ:       tianlesoftware@gmail.com

Email:    tianlesoftware@gmail.com

Blog:     http://blog.csdn.net/tianlesoftware

Weibo:    http://weibo.com/tianlesoftware

Twitter:  http://twitter.com/tianlesoftware

Facebook: http://www.facebook.com/tianlesoftware

Linkedin: http://cn.linkedin.com/in/tianlesoftware

posted @ 2013-03-30 00:52  davedba  阅读(577)  评论(0编辑  收藏  举报