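Failure simulation in a Dameng (DM8) primary/standby environment configured for manual switchover
Environment:
OS: CentOS 7
DB: DM8
Primary: 192.168.1.135
Standby: 192.168.1.134
The dmwatcher.ini configuration file on the primary and the standby is as follows: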
[dmdba@host134 slnngk]$ more dmwatcher.ini
[GRP1]
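DW_TYPE = GLOBAL ## global watcher type
DW_MODE = MANUAL ## manual switchover
DW_ERROR_TIME = 10 ## time after which the remote watcher is considered failed
INST_RECOVER_TIME = 60 ## interval at which the primary's watcher starts recovery
INST_ERROR_TIME = 10 ## time after which the local instance is considered failed
INST_OGUID = 453332 ## unique OGUID of the watcher system
INST_INI = /dmdbms/data/slnngk/dm.ini ## path to the dm.ini configuration file
INST_AUTO_RESTART = 1 ## enable automatic restart of the instance
INST_STARTUP_CMD = /dmdbms/product/bin/dmserver ## start from the command line
RLOG_SEND_THRESHOLD = 0 ## time threshold for the primary sending logs to the standby; disabled by default
RLOG_APPLY_THRESHOLD = 0 ## time threshold for the standby replaying logs; disabled by default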
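The two settings that explain most of what happens in the tests below are DW_MODE and INST_AUTO_RESTART. As a rough sketch, they can be read from the file with Python's standard configparser. The function names load_watcher_groups and summarize are made up for this illustration, and treating everything after a whitespace-preceded # as a comment is an assumption that fits this file, not a statement about how DM itself parses it.

```python
import configparser

# A shortened copy of the [GRP1] section shown above.
SAMPLE = """\
[GRP1]
DW_TYPE = GLOBAL ## global watcher type
DW_MODE = MANUAL ## manual switchover
INST_INI = /dmdbms/data/slnngk/dm.ini ## path to the dm.ini configuration file
INST_AUTO_RESTART = 1 ## enable automatic restart of the instance
"""


def load_watcher_groups(text):
    """Return {group name: {key: value}} for every section in the text."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # assumption: '#' after a space starts a comment
        interpolation=None,
    )
    parser.optionxform = str  # keep the upper-case key names as written
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def summarize(group):
    """Condense one group into the two behaviours exercised in this post."""
    return {
        "automatic_switchover": group.get("DW_MODE", "").upper() == "AUTO",
        "automatic_restart": group.get("INST_AUTO_RESTART") == "1",
    }


if __name__ == "__main__":
    for name, group in load_watcher_groups(SAMPLE).items():
        print(name, summarize(group))
```

With the configuration used here, the summary says that the watcher restarts a stopped instance but never switches roles on its own, which is what the following steps observe.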
1. Stop the standby database
[root@host134 ~]# systemctl stop DmServiceslnngk.service
dmwatcher brings the database back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
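The check done by eye on the ps output can be sketched in Python as follows. This is only an illustration: the function name dm_processes is hypothetical, and the code assumes the usual eight-column `ps -ef` layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) seen above.

```python
import os

# The ps -ef output captured above, after the watcher restarted the standby.
PS_OUTPUT = """\
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
"""

DM_BINARIES = {"dmwatcher", "dmserver", "dmmonitor"}


def dm_processes(ps_text):
    """Map each DM binary found in `ps -ef` output to its PID."""
    found = {}
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        binary = os.path.basename(fields[7].split()[0])
        if binary in DM_BINARIES:  # skips the grep line and any header line
            found[binary] = int(fields[1])
    return found


if __name__ == "__main__":
    procs = dm_processes(PS_OUTPUT)
    print("dmserver running:", "dmserver" in procs)
    print("dmwatcher running:", "dmwatcher" in procs)
```

A dmserver PID that differs from the one seen before the stop, next to an unchanged dmwatcher PID, is the sign that the watcher restarted the instance.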
2. Stop the dmwatcher process on the standby
[root@host134 ~]# systemctl stop DmWatcherServiceGRP1
Both the dmwatcher process and the database process on the standby are now stopped:
[root@host134 ~]# ps -ef|grep slnngk
root 25001 32322 0 14:01 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
Start the dmwatcher process:
[root@host134 ~]# systemctl start DmWatcherServiceGRP1
The watcher then brings the standby back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 25477 1 0 14:04 ? 00:00:00 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 25507 1 1 14:04 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 25694 32322 0 14:05 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
3. Stop the primary database
[root@host135 soft]# systemctl stop DmServiceslnngk.service
The watcher brings the primary back up:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 1 14:23 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 981 11261 0 14:24 pts/5 00:00:00 grep --color=auto slnngk
The database is in the open state:
[dmdba@host135 ~]$ disql sysdba/dameng123
Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 2.627(ms)
disql V8
SQL> select status$ from SYS."V$DATABASE";
LINEID STATUS$
---------- -----------
1 4
used time: 3.409(ms). Execute id is 800.
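The role and state reported in the disql login banner can also be picked up by a script. Below is a minimal sketch; the regular expression is written against the banner format shown in this post only, and the function name parse_banner is hypothetical.

```python
import re

# Matches banners such as:
#   Server[LOCALHOST:5236]:mode is primary, state is open
BANNER = re.compile(
    r"Server\[(?P<host>[^:\]]+):(?P<port>\d+)\]:"
    r"mode is (?P<mode>\w+), state is (?P<state>\w+)"
)


def parse_banner(line):
    """Return host, port, mode and state from a disql banner, or None."""
    match = BANNER.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    return info


if __name__ == "__main__":
    banner = "Server[LOCALHOST:5236]:mode is primary, state is open"
    print(parse_banner(banner))
```

In this session the banner reports "state is open" while V$DATABASE returns STATUS$ = 4, so the two agree for this instance.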
Next, try killing the dmserver process:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 0 14:23 ? 00:00:01 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 2319 11261 0 14:31 pts/5 00:00:00 grep --color=auto slnngk
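[root@host135 soft]# kill -9 710
Because I configured this primary/standby pair for manual switchover, no switchover takes place; the watcher automatically brings the primary back up, and it keeps the primary role.
4. Stop the dmwatcher process on the primary
[root@host135 soft]# systemctl stop DmWatcherServiceGRP1
Both the database process and the watcher process are now gone: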
[root@host135 soft]# ps -ef|grep slnngk
root 3872 11261 0 14:37 pts/5 00:00:00 grep --color=auto slnngk
The monitor can no longer see any information about the primary:
show
2022-07-26 14:47:02
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 453332 TRUE MANUAL FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.134 52141 2022-07-26 14:47:01 GLOBAL VALID OPEN SLNNGKBAK OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.134 5236 OK SLNNGKBAK OPEN STANDBY 0 0 REALTIME UNKNOWN 382176 437742 382176 437742 NONE
DATABASE(SLNNGKBAK) APPLY INFO FROM (UNKNOWN), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[383314, 383314, 383314], (RLSN, SLSN, KLSN)[437742, 437742, 437742], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (437742)
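Start the watcher manually:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1
The primary recovers and is still the primary; no switchover occurred, because I configured manual switchover.
5. Manually switch the standby to primary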
choose takeover GRP1
Can choose one of the following instances to do takeover:
1: SLNNGKBAK
takeover GRP1.SLNNGKBAK
Now check the database status:
show
2022-07-26 15:25:12
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 453332 TRUE MANUAL FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.134 52141 2022-07-26 15:25:11 GLOBAL VALID OPEN SLNNGKBAK OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.134 5236 OK SLNNGKBAK OPEN PRIMARY 0 0 REALTIME VALID 383964 440754 383964 440755 NONE
ERROR DATABASE:
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.135 52141 2022-07-26 15:19:11 GLOBAL VALID ERROR SLNNGK OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.135 5236 OK SLNNGK OPEN PRIMARY 0 0 REALTIME VALID 383951 439290 383951 439290 NONE
#================================================================================#
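To pick the failed database out of `show` output with a script instead of by eye, the DATABASE GLOBAL INFO rows can be turned into dictionaries. The sketch below assumes the layout exactly as it appears in this post (a header line starting with DW_IP, followed directly by one data row) and that WTIME is the only value containing a space. The function name parse_global_info is hypothetical.

```python
# The two DATABASE GLOBAL INFO blocks from the show output above.
SHOW_OUTPUT = """\
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.134 52141 2022-07-26 15:25:11 GLOBAL VALID OPEN SLNNGKBAK OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
ERROR DATABASE:
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.135 52141 2022-07-26 15:19:11 GLOBAL VALID ERROR SLNNGK OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
"""


def parse_global_info(show_text):
    """Return one dict per database, keyed by the column names in the header."""
    lines = [line.strip() for line in show_text.splitlines()]
    databases = []
    for header, row in zip(lines, lines[1:]):
        if not header.startswith("DW_IP"):
            continue
        names = header.split()
        values = row.split()
        # WTIME is "date time", so it occupies two whitespace-separated tokens.
        at = names.index("WTIME")
        values[at:at + 2] = [" ".join(values[at:at + 2])]
        if len(values) != len(names):
            raise ValueError("unexpected row layout: " + row)
        databases.append(dict(zip(names, values)))
    return databases


if __name__ == "__main__":
    for db in parse_global_info(SHOW_OUTPUT):
        flag = "" if db["WSTATUS"] == "OPEN" else "  <-- watcher not OPEN"
        print(db["DW_IP"], db["INAME"], db["IMODE"], db["WSTATUS"], db["WTIME"] + flag)
```

For the output above this flags 192.168.1.135 (SLNNGK), whose watcher status is ERROR and whose last report time is older than that of the new primary.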
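The original primary, 192.168.1.135, is now in ERROR state; its last watcher report is timestamped 15:19:11, so it had evidently been taken down again before the takeover.
Next, write some data on the current primary, then start the original primary and check whether the data is synchronized.
On 192.168.1.134: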
su - dmdba
[dmdba@host134 ~]$ disql hxl/dameng123
Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 3.029(ms)
disql V8
SQL> select * from tb_test01;
LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
used time: 4.038(ms). Execute id is 600.
SQL> insert into tb_test01 values(6,'name6');
affect rows 1
used time: 1.427(ms). Execute id is 601.
SQL> insert into tb_test01 values(7,'name7');
affect rows 1
SQL> commit;
executed successfully
used time: 9.266(ms). Execute id is 603.
SQL> select * from tb_test01;
LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7
7 rows got
Now start the watcher on the original primary:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1
show
2022-07-26 15:32:23
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP1 453332 TRUE MANUAL FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.134 52141 2022-07-26 15:32:23 GLOBAL VALID OPEN SLNNGKBAK OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.134 5236 OK SLNNGKBAK OPEN PRIMARY 0 0 REALTIME VALID 384114 440911 384114 440912 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.135 52141 2022-07-26 15:32:23 GLOBAL VALID OPEN SLNNGK OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.135 5236 OK SLNNGK OPEN STANDBY 0 0 REALTIME VALID 383954 440910 383954 440910 NONE
DATABASE(SLNNGK) APPLY INFO FROM (SLNNGKBAK), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[384113, 384113, 384114], (RLSN, SLSN, KLSN)[440910, 440910, 440911], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (440910)
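One way to judge from this output how far the rejoined standby is behind is to compare the FLSN column of the two EP INFO rows. The sketch below does only that arithmetic. Reading FLSN as the LSN already flushed to the log file is my assumption about the column, and the function names parse_ep_row and flsn_gap are hypothetical.

```python
EP_HEADER = (
    "INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE "
    "RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG"
)

# The two EP INFO rows from the show output above.
EP_ROWS = [
    "192.168.1.134 5236 OK SLNNGKBAK OPEN PRIMARY 0 0 REALTIME VALID 384114 440911 384114 440912 NONE",
    "192.168.1.135 5236 OK SLNNGK OPEN STANDBY 0 0 REALTIME VALID 383954 440910 383954 440910 NONE",
]


def parse_ep_row(header, row):
    """Pair the EP INFO column names with the values of one row."""
    names, values = header.split(), row.split()
    if len(names) != len(values):
        raise ValueError("unexpected row layout: " + row)
    return dict(zip(names, values))


def flsn_gap(rows):
    """Return {standby name: primary FLSN minus standby FLSN}."""
    primaries = [r for r in rows if r["IMODE"] == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary, found %d" % len(primaries))
    primary_flsn = int(primaries[0]["FLSN"])
    return {
        r["INAME"]: primary_flsn - int(r["FLSN"])
        for r in rows
        if r["IMODE"] == "STANDBY"
    }


if __name__ == "__main__":
    rows = [parse_ep_row(EP_HEADER, row) for row in EP_ROWS]
    print(flsn_gap(rows))  # {'SLNNGK': 1}
```

A gap of 1 in a snapshot taken right after the rejoin suggests the standby has essentially caught up; the query on the standby below confirms it at the data level.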
The original primary has started and rejoined the cluster, now in the standby role. Check whether the data has been synchronized.
On 192.168.1.135:
su - dmdba
[dmdba@host135 ~]$ disql hxl/dameng123
SQL> select * from tb_test01;
LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7
7 rows got
used time: 6.329(ms). Execute id is 0
The rows inserted on the new primary are present, so the data has been synchronized.