首页  :: 新随笔  :: 管理

DB2高可用之HADR+TSAMP

Posted on 2022-09-14 09:20  高&玉  阅读(516)  评论(0编辑  收藏  举报

前言

HADR介绍

  高可用性灾难恢复 (HADR) 提供针对部分站点故障和整个站点故障的高可用性解决方案。HADR 通过将数据更改从源数据库(称为主数据库)复制到目标数据库(称为备用数据库)来防止数据丢失。HADR 最多支持 3 台远程备用服务器。

  部分站点故障可能是由硬件、网络或软件(DB2® 数据库系统或操作系统)故障引起的。如果没有 HADR,发生部分站点故障时就需要重新启动数据库所在的数据库管理系统(DBMS)服务器。重新启动数据库和数据库所在的服务器所需的时间长度是不可预测的。可能在几分钟时间后,数据库才会恢复为一致状态并可用。使用 HADR 时,备用数据库可在数秒内接管。另外,还可以通过使用客户机自动重新路由功能,或重试应用程序中的逻辑,将使用原始主数据库的客户机重定向至新的主数据库。

  当由于灾难(例如,火灾)而导致整个站点被破坏时,就可能会发生整个站点故障。但是,因为 HADR 使用 TCP/IP 在主数据库和备用数据库之间进行通信,所以数据库可以位于不同位置。例如,主数据库可能位于某个城市的总部,而备用数据库位于另一城市的销售办事处。如果在主要站点发生了灾难,那么可以通过让远程备用数据库接管具有所有 DB2 功能的主数据库来维护数据可用性。执行接管操作之后,可以备份原始主数据库,并将其返回至主数据库状态;这即是所谓的故障回退。如果您可以使旧的主数据库与新的主数据库保持一致,那么可以启动故障回退。旧的主数据库作为备用数据库重新集成到 HADR 设置之后,可以切换数据库角色,以再次将原始主数据库启用为主数据库。

 

SAMP介绍

  IBM® Tivoli® System Automation for Multiplatforms (SA MP) 针对 AIX®、Linux、Solaris SPARC 和 Windows 提供高可用性和灾难恢复功能。

  SA MP 与 AIX、Linux 和 Solaris SPARC 操作系统上的 DB2® Enterprise Server Edition、DB2 Advanced Enterprise Server Edition、DB2 Workgroup Server Edition、DB2 Connect™ Enterprise Edition 和 DB2 Connect Application Server Edition集成在一起。它还与 Express® 版集成到一起以与 DB2 Express Server Edition 定期许可证 (FTL) 和 DB2 High Availability Feature 配合使用。

 

DB2 HADR和TSAMP自动故障切换如何工作?

  当HADR中的主机发生故障,"hadrV10.5_monitor.ksh" 脚本侦测到故障,然后报告给TSAMP。TSAMP在故障主机调用"hadrV10.5_stop.ksh"脚本。当调用结束后,在HADR备机上调用"hadrV10.5_start.ksh"脚本。"hadrV10.5_start.ksh"脚本会调用 db2gcf程序; db2gcf程序会尝试故障切换让备机接管主机工作。接管的具体命令为"TAKEOVER HADR BY FORCE PEER WINDOW ONLY"。当旧的主机从故障中恢复,TSAMP会自动通过脚本"db2V97_start.ksh"启动DB2实例。作为这个操作的一部分,HADR数据库会通过命令"start hadr on db <dbname> as standby"恢复到STANDBY状态。

环境介绍

RedHat Linux 6.5

DB2 V10.5.10

主机名 IP VIP 角色
tsa1 192.168.1.61 192.168.1.66 PRIMARY
tsa2 192.168.1.62 STANDBY

 

架构图:

基础环境配置

1. 配置/etc/hosts

[root]# vi /etc/hosts
192.168.1.61 tsa1
192.168.1.62 tsa2

2. 配置RSH互信

[root]# yum install rsh* -y

[root]# vi /etc/securetty    #如果不配置此项,root用户执行rsh会报错。
#添加:
rsh
rlogin
rexec

[root]# vi /etc/hosts.equiv
+tsa1 db2inst1
+tsa2 db2inst1
+tsa1 root
+tsa2 root

[root]# vi /root/.rhosts
tsa1 root
tsa2 root

[db2inst1]# vi /home/db2inst1/.rhosts
tsa1 db2inst1
tsa2 db2inst1

[root]# chkconfig xinetd on

[root]# service xinetd restart

配置DB2 HADR

安装DB2软件

 1. 安装DB2前环境检查

[root]# yum install -y libaio compat-libstdc++ vacpp.rte libxlc ksh93
[root]# tar -zxvf v10.5fp10_linuxx64_server_t.tar.gz
[root]# cd server_t
[root]# ./db2prereqcheck    #环境检查,缺少什么就安装什么

2. 创建用户和组

[root]# groupadd db2igrp
[root]# useradd -g db2igrp db2inst1
[root]# echo "Your_passwd" | passwd --stdin db2inst1

3. 安装DB2

[root]# ./db2_install

4. 创建实例

[root]# /opt/ibm/db2/V10.5/instance/db2icrt -u db2inst1 db2inst1

5. 创建库+配置归档

[db2inst1]# db2start
[db2inst1]# db2 create database db01
[db2inst1]# db2 connect to db01
[db2inst1]# mkdir /home/db2inst1/arch_log
[db2inst1]# db2 update db cfg for db01 using LOGARCHMETH1 "disk:/home/db2inst1/arch_log"
[db2inst1]# db2 backup database db01

6. 配置SVCENAME

[db2inst1]# db2 update dbm cfg using SVCENAME 50001
[db2inst1]# db2set DB2COMM=tcpip
[db2inst1]# db2stop
[db2inst1]# db2start

配置HADR

1. 配置基础参数

# 备机可读
[db2inst1]# db2set DB2_HADR_ROS=on
[db2inst1]# db2set DB2_STANDBY_ISO=ur
#LOGINDEXBUILD参数指定是否记录索引创建、重新创建或重组操作,以便在DB2®前滚操作或高可用性灾难恢复(HADR)日志重放过程中重构索引。
[db2inst1]# db2 update db cfg for dbclass using LOGINDEXBUILD ON

2. tsa1在线备份的db01并传输到tsa2上

[db2inst1]# db2 backup database db01 online
[db2inst1]# scp DB01.* db2inst1@tsa2:/home/db2inst1/

3. tsa2节点恢复DB,不进行前滚

[db2inst1]# db2 restore database db01 from /home/db2inst1/

4. tsa2节点配置HADR

[db2inst1]# db2 update db cfg for db01 using hadr_local_host 192.168.1.62
[db2inst1]# db2 update db cfg for db01 using hadr_local_svc 50012
[db2inst1]# db2 update db cfg for db01 using hadr_remote_host 192.168.1.61
[db2inst1]# db2 update db cfg for db01 using hadr_remote_svc 50011
[db2inst1]# db2 update db cfg for db01 using hadr_remote_inst db2inst1
[db2inst1]# db2 update db cfg for db01 using hadr_syncmode NEARSYNC
[db2inst1]# db2 update db cfg for db01 using HADR_TIMEOUT 120
[db2inst1]# db2 update db cfg for db01 using HADR_PEER_WINDOW 120

5. tsa1节点配置HADR

[db2inst1]# db2 update db cfg for db01 using hadr_local_host 192.168.1.61
[db2inst1]# db2 update db cfg for db01 using hadr_local_svc 50011
[db2inst1]# db2 update db cfg for db01 using hadr_remote_host 192.168.1.62
[db2inst1]# db2 update db cfg for db01 using hadr_remote_svc 50012
[db2inst1]# db2 update db cfg for db01 using hadr_remote_inst db2inst1
[db2inst1]# db2 update db cfg for db01 using hadr_syncmode NEARSYNC
[db2inst1]# db2 update db cfg for db01 using HADR_TIMEOUT 120
[db2inst1]# db2 update db cfg for db01 using HADR_PEER_WINDOW 120

6. tsa2节点启动HADR

[db2inst1]# db2 deactivate db db01
[db2inst1]# db2 start hadr on db db01 as standby

7. tsa1节点启动HADR

[db2inst1]# db2 deactivate db db01
[db2inst1]# db2 start hadr on db db01 as primary

8. 查看HADR状态

[db2inst1@tsa1]# db2pd -db db01 -hadr
Database Member 0 -- Database DB01 -- Active -- Up 0 days 00:02:20 -- Date 2022-09-13-18.23.33.513608

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER    #HADR状态
                           HADR_FLAGS = STANDBY_LOG_RETRIEVAL
                  PRIMARY_MEMBER_HOST = 192.168.1.61
                     PRIMARY_INSTANCE = db2inst1
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = 192.168.1.62
                     STANDBY_INSTANCE = db2inst1
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED    #HADR连接状态
             HADR_CONNECT_STATUS_TIME = 09/13/2022 18:23:16.251815 (1663107796)
          HEARTBEAT_INTERVAL(seconds) = 30
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 0
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 0
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
                  LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 19800
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
            PRIMARY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976769
            STANDBY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976769
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976769
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
                     STANDBY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
              STANDBY_REPLAY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
         STANDBY_RECV_BUF_SIZE(pages) = 4298
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 25600
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 09/13/2022 18:25:15.000000 (1663107915)
             READS_ON_STANDBY_ENABLED = Y
    STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N

配置TSAMP

1. 配置/etc/modprobe.d/blacklist.conf

[root]# vi /etc/modprobe.d/blacklist.conf
#添加如下两行:
blacklist iTCO_wdt
blacklist iTCO_vendor_support

2. 安装TSAMP依赖包

[root]# yum install -y compat-libstdc++-33 pam pam.i686

3. 修改环境变量

[root]# echo "export CT_MANAGEMENT_SCOPE=2" >> /etc/profile
[root]# source /etc/profile

4. 安装IBM Tivoli® System Automation for Multiplatforms (SA MP)

[root]# cd /root/server_t/db2/linuxamd64/tsamp
[root]# ./installSAM
prereqSAM: All prerequisites for the ITSAMP installation are met on operating system: 
Red Hat Enterprise Linux Server release 6.5 (Santiago)
SAM is currently not installed.
installSAM: The following package is not installed yet and needs to be installed:  ./Linux/i386/sam-4.1.0.3-16104.i386.rpm

installSAM: A general License Agreement and License Information specifically for System Automation will be shown. Scroll down using the Enter key (line by line) or Space bar (page by page). At the end you will be asked to accept the terms to be allowed to install the product. Select Enter to continue.
# 按"Enter"继续
# 省略大部分内容...
installSAM: To accept all terms of the preceding License Agreement and License Information type 'y', anything else to decline.
# 按"y"继续
# 省略大部分内容...
installSAM: The following license is installed: 
Product: IBM Tivoli System Automation for Multiplatforms (Try & Buy License) 4.1.0.0
Evaluation Period: 90 days (90 days left)
Creation date: Fri 16 Aug 2013 12:00:01 AM EDT
Expiration date: Thu 31 Dec 2037 12:00:01 AM EST


Subsystem         Group            PID     Status 
 ctrmc            rsct             27669   active

installSAM: All packages were installed successfully.

5. 以“root”用户执行preprnode预处理相关节点,直接执行db2haicu则会报错

[root]# /usr/sbin/rsct/bin/preprpnode tsa1 tsa2

6. 准备DB2 HA scripts

[root]# /opt/ibm/db2/V10.5/install/tsamp/db2cptsa
DBI1119I  The version of the DB2 High Availability (HA) scripts for the
      IBM Tivoli System Automation for Multiplatforms (SA MP) found in
      /usr/sbin/rsct/sapolicies/db2 is the same version as the version
      of the scripts on the current DB2 install media.

Explanation: 

You need DB2 HA scripts to use SA MP with the DB2 HA feature.

SA MP and the DB2 HA feature being installed from the DB2 install media
require DB2 HA scripts with a version the same as or higher than the
version of the scripts also on the DB2 install media.

The version of the DB2 HA scripts currently installed the same as the
version of the scripts on the DB2 install media.

User response: 

No action is required.


DBI1110I  The DB2 High Availability (HA) scripts for the IBM Tivoli
      System Automation for Multiplatforms (SA MP) were successfully
      updated in /usr/sbin/rsct/sapolicies/db2.

Explanation: 

You need DB2 HA scripts to use SA MP with the DB2 HA feature.

These DB2 HA scripts are located at /usr/sbin/rsct/sapolicies/db2. The
DB2 installer detects whether these DB2 HA scripts need to be installed
or updated.

The DB2 installer successfully updated the DB2 HA scripts.

User response: 

No action is required.

7. HADR备库(tsa2节点)配置db2haicu

[db2inst1]# db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some time as db2haicu will need to activate all databases for the instance to discover all paths ...
When you use db2haicu to configure your clustered environment, you create cluster domains. For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 Information Center. db2haicu is searching the current machine for an existing active cluster domain ...
db2haicu did not find a cluster domain on this machine. db2haicu will now query the system for information about cluster nodes to create a new cluster domain ...

db2haicu did not find a cluster domain on this machine. To continue configuring your clustered environment for high availability, you must create a cluster domain; otherwise, db2haicu will exit.

Create a domain and continue? [1]
1. Yes
2. No
1
Create a unique name for the new domain:
db01_tsa
Nodes must now be added to the new domain.
How many cluster nodes will the domain 'db01_tsa' contain?
2
Enter the host name of a machine to add to the domain:
tsa1
Enter the host name of a machine to add to the domain:
tsa2
db2haicu can now create a new domain containing the 2 machines that you specified. If you choose not to create a domain now, db2haicu will exit.

Create the domain now? [1]
1. Yes
2. No
1
Creating domain 'db01_tsa' in the cluster ...
Creating domain 'db01_tsa' in the cluster was successful.
You can now configure a quorum device for the domain. For more information, see the topic "Quorum devices" in the DB2 Information Center. If you do not configure a quorum device for the domain, then a human operator will have to manually intervene if subsets of machines in the cluster lose connectivity.

Configure a quorum device for the domain called 'db01_tsa'? [1]
1. Yes
2. No
1
The following is a list of supported quorum device types:
  1. Network Quorum
Enter the number corresponding to the quorum device type to be used: [1]
1
Specify the network address of the quorum device:
192.168.1.1
Configuring quorum device for domain 'db01_tsa' ...
Configuring quorum device for domain 'db01_tsa' was successful.
The cluster manager found the following total number of network interface cards on the machines in the cluster domain: '2'.  You can add a network to your cluster domain using the db2haicu utility.

Create networks for these network interface cards? [1]
1. Yes
2. No
1
Enter the name of the network for the network interface card: 'eth0' on cluster node: 'tsa1'
1. Create a new public network for this network interface card.
2. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card 'eth0' on cluster node 'tsa1' to the network 'db2_public_network_0'? [1]
1. Yes
2. No
1
Adding network interface card 'eth0' on cluster node 'tsa1' to the network 'db2_public_network_0' ...
Adding network interface card 'eth0' on cluster node 'tsa1' to the network 'db2_public_network_0' was successful.
Enter the name of the network for the network interface card: 'eth0' on cluster node: 'tsa2'
1. db2_public_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card 'eth0' on cluster node 'tsa2' to the network 'db2_public_network_0'? [1]
1. Yes
2. No
1
Adding network interface card 'eth0' on cluster node 'tsa2' to the network 'db2_public_network_0' ...
Adding network interface card 'eth0' on cluster node 'tsa2' to the network 'db2_public_network_0' was successful.
Retrieving high availability configuration parameter for instance 'db2inst1' ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information, see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high availability configuration parameter?
The following are valid settings for the high availability configuration parameter:
  1.TSA
  2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance 'db2inst1' to 'TSA'.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.
Do you want to validate and automate HADR failover for the HADR database 'DB01'? [1]
1. Yes
2. No
1
Adding HADR database 'DB01' to the domain ...
Cluster node '192.168.1.61' was not found in the domain. Please re-enter the host name.
tsa1
Cluster node '192.168.1.62' was not found in the domain. Please re-enter the host name.
tsa2
Adding HADR database 'DB01' to the domain ...
HADR database 'DB01' has been determined to be valid for high availability. However, the database cannot be added to the cluster from this node because db2haicu detected this node is the standby for HADR database 'DB01'. Run db2haicu on the primary for HADR database 'DB01' to configure the database for automated failover.
All cluster configurations have been completed successfully. db2haicu exiting ...

8. HADR主库(tsa1节点)配置db2haicu

[db2inst1]# db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2inst1'. The cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some time as db2haicu will need to activate all databases for the instance to discover all paths ...
When you use db2haicu to configure your clustered environment, you create cluster domains. For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 Information Center. db2haicu is searching the current machine for an existing active cluster domain ...
db2haicu found a cluster domain called 'db01_tsa' on this machine. The cluster configuration that follows will apply to this domain.

Retrieving high availability configuration parameter for instance 'db2inst1' ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information, see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high availability configuration parameter?
The following are valid settings for the high availability configuration parameter:
  1.TSA
  2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance 'db2inst1' to 'TSA'.
Adding DB2 database partition '0' to the cluster ...
Adding DB2 database partition '0' to the cluster was successful.
Do you want to validate and automate HADR failover for the HADR database 'DB01'? [1]
1. Yes
2. No
1
Adding HADR database 'DB01' to the domain ...
Cluster node '192.168.1.62' was not found in the domain. Please re-enter the host name.
tsa2
Cluster node '192.168.1.61' was not found in the domain. Please re-enter the host name.
tsa1
Adding HADR database 'DB01' to the domain ...
Adding HADR database 'DB01' to the domain was successful.
Do you want to configure a virtual IP address for the HADR database 'DB01'? [1]
1. Yes
2. No
1
Enter the virtual IP address:
192.168.1.66
Enter the subnet mask for the virtual IP address '192.168.1.66': [255.255.255.0]
255.255.255.0
Select the network for the virtual IP '192.168.1.66':
1. db2_public_network_0
Enter selection:
1
Adding virtual IP address '192.168.1.66' to the domain ...
Adding virtual IP address '192.168.1.66' to the domain was successful.
All cluster configurations have been completed successfully. db2haicu exiting ...

9. 查看VIP:192.168.1.66在tsa1节点上

[db2inst1@tsa1]# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:35:42:94 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.66/24 brd 192.168.1.255 scope global secondary eth0:0
    inet6 2408:8207:3c1d:1770:20c:29ff:fe35:4294/64 scope global dynamic 
       valid_lft 173571sec preferred_lft 87171sec
    inet6 fe80::20c:29ff:fe35:4294/64 scope link 
       valid_lft forever preferred_lft forever

监控TSAMP状态

查看有哪些资源组

[root@tsa1]# lsrg
Resource Group names:
db2_db2inst1_db2inst1_DB01-rg  
db2_db2inst1_tsa1_0-rg         
db2_db2inst1_tsa2_0-rg 

 

查看资源组状态

[root@tsa1]# lssam
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_DB01-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_DB01-rs
                |- Offline IBM.Application:db2_db2inst1_db2inst1_DB01-rs:tsa1
                '- Online IBM.Application:db2_db2inst1_db2inst1_DB01-rs:tsa2
        '- Online IBM.ServiceIP:db2ip_192_168_1_66-rs
                |- Offline IBM.ServiceIP:db2ip_192_168_1_66-rs:tsa1
                '- Online IBM.ServiceIP:db2ip_192_168_1_66-rs:tsa2
Online IBM.ResourceGroup:db2_db2inst1_tsa1_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_tsa1_0-rs
                '- Online IBM.Application:db2_db2inst1_tsa1_0-rs:tsa1
Online IBM.ResourceGroup:db2_db2inst1_tsa2_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_tsa2_0-rs
                '- Online IBM.Application:db2_db2inst1_tsa2_0-rs:tsa2
Online IBM.Equivalency:db2_db2inst1_db2inst1_DB01-rg_group-equ
        |- Online IBM.PeerNode:tsa1:tsa1
        '- Online IBM.PeerNode:tsa2:tsa2
Online IBM.Equivalency:db2_db2inst1_tsa1_0-rg_group-equ
        '- Online IBM.PeerNode:tsa1:tsa1
Online IBM.Equivalency:db2_db2inst1_tsa2_0-rg_group-equ
        '- Online IBM.PeerNode:tsa2:tsa2
Online IBM.Equivalency:db2_public_network_0
        |- Online IBM.NetworkInterface:eth0:tsa1
        '- Online IBM.NetworkInterface:eth0:tsa2

 

查看TSAMP集群域

[db2inst1@tsa1]# lsrpdomain
Name     OpState RSCTActiveVersion MixedVersions TSPort GSPort 
db01_tsa Online  3.1.5.12          No            12347  12348  

 

查看TSAMP集群节点

[db2inst1@tsa1]# lsrpnode
Name OpState RSCTVersion 
tsa2 Online  3.1.5.12    
tsa1 Online  3.1.5.12

 

通过db2pd查看所有信息

[db2inst1@tsa1]# db2pd -ha
           DB2 HA Status 
Instance Information:
Instance Name                  = db2inst1          
Number Of Domains              = 1         
Number Of RGs for instance     = 2         

Domain Information:
Domain Name                    = db01_tsa                
Cluster Version                = 3.1.5.12  
Cluster State                  = Online    
Number of nodes                = 2         

Node Information:
Node Name                     State                         
---------------------         -------------------           
tsa2                          Online                        
tsa1                          Online                        

Resource Group Information:
Resource Group Name            = db2_db2inst1_db2inst1_DB01-rg
Resource Group LockState       = Unlocked                
Resource Group OpState         = Online                  
Resource Group Nominal OpState = Online                  
Number of Group Resources      = 2         
Number of Allowed Nodes        = 2         
   Allowed Nodes                 
   -------------                 
   tsa1                          
   tsa2                          
Member Resource Information:
   Resource Name                  = db2_db2inst1_db2inst1_DB01-rs
   Resource State                 = Online    
   Resource Type                  = HADR      
   HADR Primary Instance          = db2inst1                      
   HADR Secondary Instance        = db2inst1                      
   HADR DB Name                   = DB01                          
   HADR Primary Node              = tsa1                          
   HADR Secondary Node            = tsa2                          

   Resource Name                  = db2ip_192_168_1_66-rs
   Resource State                 = Online    
   Resource Type                  = IP        

Resource Group Name            = db2_db2inst1_tsa1_0-rg
Resource Group LockState       = Unlocked                
Resource Group OpState         = Online                  
Resource Group Nominal OpState = Online                  
Number of Group Resources      = 1         
Number of Allowed Nodes        = 1         
   Allowed Nodes                 
   -------------                 
   tsa1                          
Member Resource Information:
   Resource Name                  = db2_db2inst1_tsa1_0-rs
   Resource State                 = Online    
   Resource Type                  = DB2 Member
   DB2 Member Number              = 0         
   Number of Allowed Nodes        = 1         
      Allowed Nodes                 
      -------------                 
      tsa1                          

Network Information:
Network Name                  Number of Adapters       
-----------------------       ------------------       
db2_public_network_0          2                        

   Node Name                     Adapter Name             
   -----------------------       ------------------       
   tsa1                          eth0                     
   tsa2                          eth0                     

Quorum Information:
Quorum Name                                  Quorum State                       
------------------------------------         --------------------               
db2_Quorum_Network_192_168_1_1:18_26_31      Online                             
Success                                      Offline                            
Fail                                         Offline                            
Operator                                     Offline   

 

通过SQL查看节点状态

[db2inst1]# db2 connect to db01
[db2inst1@tsa1]# db2 "select varchar(hostname,10) as HOSTNAME,varchar(INSTANCE_STOPPED,16) as INSTANCE_STOPPED,varchar(ALERT,5) as ALERT,varchar(STATE,8) as STATE from sysibmadm.DB2_CLUSTER_HOST_STATE"

HOSTNAME   INSTANCE_STOPPED ALERT STATE   
---------- ---------------- ----- --------
tsa2       NO               NO    ACTIVE  
tsa1       NO               NO    ACTIVE 

TSAMP切换

TSAMP手动切换

1. VIP(192.168.1.66)在tsa1节点上,tsa1节点为DB2 HADR的PRIMARY

[db2inst1@tsa1]# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:35:42:94 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.66/24 brd 192.168.1.255 scope global secondary eth0:0
    inet6 2408:8207:3c1d:1770:20c:29ff:fe35:4294/64 scope global dynamic 
       valid_lft 214882sec preferred_lft 128482sec
    inet6 fe80::20c:29ff:fe35:4294/64 scope link 
       valid_lft forever preferred_lft forever
[db2inst1@tsa1]# db2pd -db db01 -hadr

Database Member 0 -- Database DB01 -- Active -- Up 0 days 00:02:20 -- Date 2022-09-13-18.23.33.513608

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS = STANDBY_LOG_RETRIEVAL
                  PRIMARY_MEMBER_HOST = 192.168.1.61
                     PRIMARY_INSTANCE = db2inst1
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = 192.168.1.62
                     STANDBY_INSTANCE = db2inst1
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
             HADR_CONNECT_STATUS_TIME = 09/13/2022 18:23:16.251815 (1663107796)
          HEARTBEAT_INTERVAL(seconds) = 30
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 0
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 0
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
                  LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 19800
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
            PRIMARY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976769
            STANDBY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976769
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976769
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
                     STANDBY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
              STANDBY_REPLAY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
         STANDBY_RECV_BUF_SIZE(pages) = 4298
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 25600
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 09/13/2022 18:25:15.000000 (1663107915)
             READS_ON_STANDBY_ENABLED = Y
    STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N

2. tsa1节点上手动迁移资源组到tsa2节点上

[root@tsa1]# lsrg
Resource Group names:
db2_db2inst1_db2inst1_DB01-rg  
db2_db2inst1_tsa1_0-rg         
db2_db2inst1_tsa2_0-rg
[root@tsa1]# rgreq -o move db2_db2inst1_db2inst1_DB01-rg
Completed applying request to resource group "db2_db2inst1_db2inst1_DB01-rg".

3. tsa1节点上看不到VIP(192.168.1.66)

[db2inst1@tsa1]# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:35:42:94 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet6 2408:8207:3c1d:1770:20c:29ff:fe35:4294/64 scope global dynamic 
       valid_lft 214648sec preferred_lft 128248sec
    inet6 fe80::20c:29ff:fe35:4294/64 scope link 
       valid_lft forever preferred_lft forever

4. tsa2节点查看VIP(192.168.1.66)

[db2inst1@tsa2]# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:ad:32:0a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.62/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.66/24 brd 192.168.1.255 scope global secondary eth0:0
    inet6 2408:8207:3c1d:1770:20c:29ff:fead:320a/64 scope global dynamic 
       valid_lft 214691sec preferred_lft 128291sec
    inet6 fe80::20c:29ff:fead:320a/64 scope link 
       valid_lft forever preferred_lft forever

5. 查看HADR状态

[db2inst1@tsa2]# db2pd -db db01 -hadr

Database Member 0 -- Database DB01 -- Active -- Up 0 days 01:22:03 -- Date 2022-09-13-19.45.54.496192

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS = 
                  PRIMARY_MEMBER_HOST = 192.168.1.62
                     PRIMARY_INSTANCE = db2inst1
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = 192.168.1.61
                     STANDBY_INSTANCE = db2inst1
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
             HADR_CONNECT_STATUS_TIME = 09/13/2022 19:41:54.715119 (1663112514)
          HEARTBEAT_INTERVAL(seconds) = 30
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 162
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 29
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
                  LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 19800
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
            PRIMARY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976814
            STANDBY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976814
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000006.LOG, 0, 69976814
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
                     STANDBY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
              STANDBY_REPLAY_LOG_TIME = 09/13/2022 16:58:39.000000 (1663102719)
         STANDBY_RECV_BUF_SIZE(pages) = 4298
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 25600
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 09/13/2022 19:47:26.000000 (1663112846)
             READS_ON_STANDBY_ENABLED = Y
    STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N

此时,tsa2节点为DB2 HADR的PRIMARY,tsa1节点为DB2 HADR的STANDBY。 

TSAMP 自动切换

 1. 模拟故障,tsa1节点主机断电

 

2. tsa2节点查看VIP(192.168.1.66),等待几十秒

[db2inst1@tsa2]# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:ad:32:0a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.62/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.66/24 brd 192.168.1.255 scope global secondary eth0:0
    inet6 2408:8207:3c1d:1770:20c:29ff:fead:320a/64 scope global dynamic 
       valid_lft 211984sec preferred_lft 125584sec
    inet6 fe80::20c:29ff:fead:320a/64 scope link 
       valid_lft forever preferred_lft forever

 

3. tsa1节点通电开机(tsa1节点自动以standby角色加入到HADR集群中)

 

4. tsa2节点查看HADR

[db2inst1@tsa2]# db2pd -db db01 -hadr

Database Member 0 -- Database DB01 -- Active -- Up 0 days 00:07:15 -- Date 2022-09-13-20.30.27.134690

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS = 
                  PRIMARY_MEMBER_HOST = 192.168.1.62
                     PRIMARY_INSTANCE = db2inst1
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = 192.168.1.61
                     STANDBY_INSTANCE = db2inst1
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
             HADR_CONNECT_STATUS_TIME = 09/13/2022 20:29:29.644403 (1663115369)
          HEARTBEAT_INTERVAL(seconds) = 30
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 9
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 28
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
                  LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 19800
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
            PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 0, 82498286
            STANDBY_LOG_FILE,PAGE,POS = S0000009.LOG, 0, 82498286
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000009.LOG, 0, 82498286
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 09/13/2022 20:26:03.000000 (1663115163)
                     STANDBY_LOG_TIME = 09/13/2022 20:26:03.000000 (1663115163)
              STANDBY_REPLAY_LOG_TIME = 09/13/2022 20:26:03.000000 (1663115163)
         STANDBY_RECV_BUF_SIZE(pages) = 4298
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 25600
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 09/13/2022 20:32:00.000000 (1663115520)
             READS_ON_STANDBY_ENABLED = Y
    STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N

 

参考:

https://www.ibm.com/docs/zh/db2/10.5?topic=server-high-availability-disaster-recovery-hadr