KingbaseES RAC Cluster Case Study: Online Scale-Down
Case description:
This case performs an online scale-down of a KingbaseES RAC cluster. Dynamic scale-down only supports removing nodes from the highest-numbered end: given the node list 1, 2, 3, 4, you can remove node 4, or nodes 3 and 4 together. Before scaling down, make sure the cluster manager service on each node to be removed has already exited.
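Step II.1 below stops these services on the departing node; a quick way to confirm they have fully exited is a process check. A minimal sketch (the process names match the components shown in this case's stop output; adjust to your deployment):
[root@node210 KingbaseHA]# ps -ef | grep -E '[c]orosync|[p]acemaker|[q]disk-fenced|[q]device'
# No output means corosync, pacemaker, qdisk-fenced and corosync-qdevice
# have all exited, so the node is safe to remove.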
Cluster version:
test=# select version();
version
---------------------
KingbaseES V008R006
(1 row)
Cluster architecture: a three-node RAC (node208, node209, node210), as shown in section I below.
Operating system:
[root@node210 KingbaseHA]# cat /etc/os-release
NAME="openEuler"
VERSION="20.03 (LTS-SP4)"
ID="openEuler"
VERSION_ID="20.03"
PRETTY_NAME="openEuler 20.03 (LTS-SP4)"
ANSI_COLOR="0;31"
I. Cluster Status
As shown below, the cluster originally has a three-node architecture:
test=# select sys_rac_nodelist from sys_rac_nodelist;
sys_rac_nodelist
-------------------------------------------
(1,NODESTATE_MEMBER_ACTIVE,192.168.1.208)
(2,NODESTATE_MEMBER_ACTIVE,192.168.1.209)
(3,NODESTATE_MEMBER_ACTIVE,192.168.1.210)
(3 rows)
II. Performing the Scale-Down (Highest-Numbered Node)
1. Stop the cluster service on the node being removed
[root@node210 KingbaseHA]# ./cluster_manager.sh stop
Signaling Pacemaker Cluster Manager to terminate[ OK ]
Waiting for cluster services to unload..........................[ OK ]
Signaling Qdisk Fenced daemon (qdisk-fenced) to terminate: [ OK ]
Waiting for qdisk-fenced services to unload:..[ OK ]
Signaling Corosync Qdevice daemon (corosync-qdevice) to terminate: [ OK ]
Waiting for corosync-qdevice services to unload:..[ OK ]
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:..[ OK ]
2. Check the current cluster status
[root@node209 KingbaseHA]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: node208 Pacemaker (Kingbase) V008R006B1108 (2.0.3.0.0 4b1f869f0f:1268c00dfa83) - partition with quorum
* Last updated: Fri Jan 10 18:00:01 2025
* Last change: Wed Jan 8 17:55:45 2025 by root via cibadmin on node208
* 3 nodes configured
* 12 resource instances configured
Node List:
* Online: [ node208 node209 ]
* OFFLINE: [ node210 ]
Full List of Resources:
* fence_qdisk_0 (stonith:fence_qdisk): Started node209
* fence_qdisk_1 (stonith:fence_qdisk): Started node208
* Clone Set: clone-dlm [dlm]:
* Started: [ node208 node209 ]
* Stopped: [ node210 ]
* Clone Set: clone-gfs2 [gfs2]:
* Started: [ node208 node209 ]
* Stopped: [ node210 ]
* Clone Set: clone-DB [DB]:
* Started: [ node208 node209 ]
* Stopped: [ node210 ]
* fence_qdisk_2 (stonith:fence_qdisk): Started node208
As shown below, the instance state of the node being removed is NODESTATE_DEAD:
test=# select * from sys_rac_nodelist;
id | state | host
----+-------------------------+---------------
1 | NODESTATE_MEMBER_ACTIVE | 192.168.1.208
2 | NODESTATE_MEMBER_ACTIVE | 192.168.1.209
3 | NODESTATE_DEAD | 192.168.1.210
(3 rows)
3. Stop the fence_qdisk resource for the removed node (run on a remaining node)
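If you are unsure which fence_qdisk resource fences the departing node, the location constraints reveal it (a hedged check; the constraint names follow the pattern shown in step 4 below). fence_qdisk_2 is placed only on node208/node209, i.e. it is the device that fences node210:
[root@node209 KingbaseHA]# crm configure show | grep fence_qdisk_2
# Expect the primitive definition plus location constraints pinning the
# resource to node208 and node209 only.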
[root@node209 KingbaseHA]# crm resource stop fence_qdisk_2
As shown below, fence_qdisk_2, which corresponds to the node being removed, is now stopped:
[root@node209 KingbaseHA]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: node208 Pacemaker (Kingbase) V008R006B1108 (2.0.3.0.0 4b1f869f0f:1268c00dfa83) - partition with quorum
* Last updated: Fri Jan 10 18:02:04 2025
* Last change: Fri Jan 10 18:01:40 2025 by root via cibadmin on node209
* 3 nodes configured
* 12 resource instances configured (1 DISABLED)
Node List:
* Online: [ node208 node209 ]
* OFFLINE: [ node210 ]
Full List of Resources:
* fence_qdisk_0 (stonith:fence_qdisk): Started node209
* fence_qdisk_1 (stonith:fence_qdisk): Started node208
* Clone Set: clone-dlm [dlm]:
* Started: [ node208 node209 ]
* Stopped: [ node210 ]
* Clone Set: clone-gfs2 [gfs2]:
* Started: [ node208 node209 ]
* Stopped: [ node210 ]
* Clone Set: clone-DB [DB]:
* Started: [ node208 node209 ]
* Stopped: [ node210 ]
* fence_qdisk_2 (stonith:fence_qdisk): Stopped (disabled)
4. Edit the cluster resource configuration (run on a remaining node)
1) Original cluster resource configuration
[root@node208 ~]# crm configure edit
node 1: node208
node 2: node209
node 3: node210
........
location fence_qdisk_0-on-node209 fence_qdisk_0 1800: node209
location fence_qdisk_0-on-node210 fence_qdisk_0 1800: node210
location fence_qdisk_1-on-node208 fence_qdisk_1 1800: node208
location fence_qdisk_1-on-node210 fence_qdisk_1 1800: node210
location fence_qdisk_2-on-node208 fence_qdisk_2 1800: node208
location fence_qdisk_2-on-node209 fence_qdisk_2 1800: node209
......
Updated to:
node 1: node208
node 2: node209
........
location fence_qdisk_0-on-node209 fence_qdisk_0 1800: node209
location fence_qdisk_1-on-node208 fence_qdisk_1 1800: node208
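Instead of interactive editing, the same change can be applied with individual crmsh commands. A sketch based on the object names above (crmsh syntax varies slightly between versions; verify the IDs with crm configure show before deleting):
[root@node208 ~]# crm configure delete fence_qdisk_0-on-node210
[root@node208 ~]# crm configure delete fence_qdisk_1-on-node210
[root@node208 ~]# crm configure delete fence_qdisk_2-on-node208
[root@node208 ~]# crm configure delete fence_qdisk_2-on-node209
[root@node208 ~]# crm node delete node210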
5. Edit corosync.conf
As shown below, update the expected vote count expected_votes (2 * node count - 1) and the qdevice vote count votes (node count - 1), then remove the departing node's entry from the nodelist. With 2 nodes remaining, expected_votes = 2*2-1 = 3 and votes = 2-1 = 1:
[root@node209 corosync]# cat corosync.conf
totem {
version: 2
cluster_name: krac
token: 12000
token_retransmits_before_loss_const: 12
join: 10000
crypto_hash: none
crypto_cipher: none
interface {
knet_ping_interval: 1500
knet_ping_timeout: 6000
}
}
quorum {
provider: corosync_votequorum
expected_votes: 5
device {
timeout: 60000
sync_timeout: 70000
master_wins: 1
votes: 2
model: disk
disk {
debug: 0
interval: 1000
tko: 30
tko_up: 2
upgrade_wait: 1
master_wait: 3
label: krac
io_timeout: 1
fence_timeout: 50000
enable_qdisk_fence: 1
watchdog_dev: /dev/watchdog
watchdog_timeout: 30
}
heuristics {
mode: off
interval: 1000
timeout: 10000
exec_ping: /bin/ping -q -c 1 192.168.4.1
}
}
}
logging {
debug: off
to_logfile: yes
logfile: /opt/KingbaseHA/corosync/var/log/cluster/corosync.log
logger_subsys {
subsys: QDEVICE
debug: off
}
}
nodelist {
node {
ring0_addr:node208
nodeid:1
}
node {
ring0_addr:node209
nodeid:2
}
node {
ring0_addr:node210
nodeid:3
}
}
# After the update:
[root@node209 corosync]# cat corosync.conf
totem {
version: 2
cluster_name: krac
token: 12000
token_retransmits_before_loss_const: 12
join: 10000
crypto_hash: none
crypto_cipher: none
interface {
knet_ping_interval: 1500
knet_ping_timeout: 6000
}
}
quorum {
provider: corosync_votequorum
expected_votes: 3
device {
timeout: 60000
sync_timeout: 70000
master_wins: 1
votes: 1
model: disk
disk {
debug: 0
interval: 1000
tko: 30
tko_up: 2
upgrade_wait: 1
master_wait: 3
label: krac
io_timeout: 1
fence_timeout: 50000
enable_qdisk_fence: 1
watchdog_dev: /dev/watchdog
watchdog_timeout: 30
}
heuristics {
mode: off
interval: 1000
timeout: 10000
exec_ping: /bin/ping -q -c 1 192.168.4.1
}
}
}
logging {
debug: off
to_logfile: yes
logfile: /opt/KingbaseHA/corosync/var/log/cluster/corosync.log
logger_subsys {
subsys: QDEVICE
debug: off
}
}
nodelist {
node {
ring0_addr:node208
nodeid:1
}
node {
ring0_addr:node209
nodeid:2
}
}
Sync the corosync configuration to the other nodes:
[root@node209 corosync]# scp corosync.conf node208:`pwd`
[root@node209 corosync]# scp corosync.conf node210:`pwd`
# Reload corosync.conf on all cluster nodes:
[root@node209 corosync]# corosync-cfgtool -R
Reloading corosync.conf...
Done
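After the reload, the new quorum settings can be verified with corosync-quorumtool (a hedged check; output fields vary by corosync version):
[root@node209 corosync]# corosync-quorumtool -s
# Expect "Expected votes: 3" and a membership list containing only
# node208 and node209 (plus the qdevice vote).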
Check the cluster status:
[root@node209 corosync]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: node208 Pacemaker (Kingbase) V008R006B1108 (2.0.3.0.0 4b1f869f0f:1268c00dfa83) - partition with quorum
* Last updated: Fri Jan 10 18:13:54 2025
* Last change: Fri Jan 10 18:04:11 2025 by root via cibadmin on node209
* 2 nodes configured
* 9 resource instances configured (1 DISABLED)
Node List:
* Online: [ node208 node209 ]
Full List of Resources:
* fence_qdisk_0 (stonith:fence_qdisk): Started node209
* fence_qdisk_1 (stonith:fence_qdisk): Started node208
* Clone Set: clone-dlm [dlm]:
* Started: [ node208 node209 ]
* Clone Set: clone-gfs2 [gfs2]:
* Started: [ node208 node209 ]
* Clone Set: clone-DB [DB]:
* Started: [ node208 node209 ]
* fence_qdisk_2 (stonith:fence_qdisk): Stopped (disabled)
6. Check the cluster instance status
As shown below, check the cluster instance status. If the removed node's instance entry is still present, restart the cluster service on the remaining nodes and the stale entry will be cleaned up (a restart sketch follows the output below):
test=# select * from sys_rac_nodelist;
id | state | host
----+-------------------------+---------------
1 | NODESTATE_MEMBER_ACTIVE | 192.168.1.208
2 | NODESTATE_MEMBER_ACTIVE | 192.168.1.209
3 | NODESTATE_DEAD | 192.168.1.210
(3 rows)
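A minimal sketch of that restart, assuming cluster_manager.sh also accepts a restart subcommand (only its stop subcommand appears in this case; substitute stop followed by start if your script differs):
[root@node208 KingbaseHA]# ./cluster_manager.sh restart
[root@node209 KingbaseHA]# ./cluster_manager.sh restart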
# After restarting the cluster service on the remaining nodes:
test=# select * from sys_rac_nodelist;
id | state | host
----+-------------------------+---------------
1 | NODESTATE_MEMBER_ACTIVE | 192.168.1.208
2 | NODESTATE_MEMBER_ACTIVE | 192.168.1.209
(2 rows)