KingbaseES V8R6 Cluster Operations Series -- Changing the Cluster's Physical IPs and VIP

Case description:

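In a KingbaseES V8R6 cluster, the IP addresses are configured in repmgr.conf and kingbase.auto.conf, so changing the cluster's physical IPs and VIP means editing both files. The change requires stopping the cluster services; in a production environment, schedule a downtime window in advance so that application access is not disrupted.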
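Before stopping anything, it can help to list every IPv4 address the two files currently contain so that none is missed. This is a minimal sketch using GNU grep; the function name `list_ipv4_refs` is hypothetical, and the paths in the usage comment are the ones used by this cluster, which may differ on other installations. The output also includes `trusted_servers`, which stayed at 192.168.7.1 in this case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line:address for every IPv4-looking string.
list_ipv4_refs() {
  # -H file name, -n line number, -o only the matched address, -E extended regex
  grep -HnoE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$@"
}

# Usage on this cluster's layout (adjust paths for other installations):
# list_ipv4_refs /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf \
#                /home/kingbase/cluster/R6HA/KHA/kingbase/data/kingbase.auto.conf
```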

Test environment:

Operating system:
[kingbase@node1 bin]$ cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
Database:
KingbaseES V8R6

I. Check the status of the cluster's primary and standby

1. Node status

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | standby |   running | node249  | default  | 100      | 4        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |          | default  | 100      | 4        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. View repmgr.conf and kingbase.auto.conf (primary and standby)

[kingbase@node1 etc]$ cat repmgr.conf 
on_bmj=off
node_id=1
node_name='node248'
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
......
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.237/24'
......
[kingbase@node1 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
.....
primary_conninfo = 'user=system connect_timeout=10 host=192.168.7.239 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
......

3. Check the system IP configuration

Primary:

[kingbase@node2 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.7.238   node1
192.168.7.239   node2


2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 08:00:27:48:34:53 brd ff:ff:ff:ff:ff:ff
   inet 192.168.7.239/24 brd 192.168.7.255 scope global enp0s3
      valid_lft forever preferred_lft forever
   inet 192.168.7.237/24 scope global secondary enp0s3:3
      valid_lft forever preferred_lft forever

Standby:

[kingbase@node1 bin]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.7.238   node1
192.168.7.239   node2

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff
   inet 192.168.7.238/24 brd 192.168.7.255 scope global enp0s3
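On the primary, the VIP appears as an extra `secondary` address on enp0s3, while the standby carries only its physical IP. A small awk filter over `ip addr` output makes that distinction explicit. This is a sketch with a hypothetical function name; it reads the output on stdin, so it can be tried on the captured text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip addr` output on stdin and print
# "<address/prefix> primary|secondary" for each IPv4 line.
list_inet() {
  awk '$1 == "inet" {
    kind = "primary"
    for (i = 3; i <= NF; i++) if ($i == "secondary") kind = "secondary"
    print $2, kind
  }'
}

# Usage on a node (the interface name enp0s3 is taken from this case):
# ip -4 addr show dev enp0s3 | list_inet
```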

II. Change the system IPs and apply the new physical IPs
1. Stop the cluster (cluster services and database)

[kingbase@node2 bin]$ ./sys_monitor.sh stop
2021-03-01 12:19:00 Ready to stop all DB ...
Service process "node_export" was killed at process 4629
.......
2021-03-01 12:19:11 Done.

2. Change the system IPs
On both the primary and the standby, update /etc/hosts:

[root@node2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.7.248   node1
192.168.7.249   node2
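The same `/etc/hosts` edit has to be made on both nodes. A scripted version could look like the following sketch; `set_hosts_ip` is a hypothetical helper that keeps a `.bak` copy and rewrites the address on every line that lists the given host name (awk collapses the whitespace on rewritten lines to a single space). Run it as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_hosts_ip FILE HOSTNAME NEW_IP
set_hosts_ip() {
  local file=$1 name=$2 ip=$3
  cp -p -- "$file" "$file.bak" || return   # keep a backup of the original
  awk -v n="$name" -v ip="$ip" '
    /^[ \t]*#/ { print; next }              # leave comment lines untouched
    {
      for (i = 2; i <= NF; i++)
        if ($i == n) { $1 = ip; break }     # replace the address field
      print
    }' "$file.bak" > "$file"
}

# Usage, as root, on both nodes:
# set_hosts_ip /etc/hosts node1 192.168.7.248
# set_hosts_ip /etc/hosts node2 192.168.7.249
```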

Primary:

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:48:34:53 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.249/24 brd 192.168.7.255 scope global enp0s3
       valid_lft forever preferred_lft forever

Standby:

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.248/24 brd 192.168.7.255 scope global enp0s3
       valid_lft forever preferred_lft forever
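The case does not show how the NIC addresses were changed. On CentOS 7 with a static configuration, the address is typically set by `IPADDR=` in `/etc/sysconfig/network-scripts/ifcfg-<interface>`, and `systemctl restart network` applies it. The sketch below assumes that layout; the helper name is hypothetical. An SSH session to the old address is cut when the network restarts, so run it from a console.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_ifcfg_ip IFCFG_FILE NEW_IP
# Assumes a static ifcfg file that already contains an IPADDR= line.
set_ifcfg_ip() {
  local file=$1 ip=$2
  cp -p -- "$file" "$file.bak" || return   # keep a backup
  sed -i -E "s/^IPADDR=.*/IPADDR=$ip/" "$file"
}

# Usage, as root, on the primary (node2) of this case:
# set_ifcfg_ip /etc/sysconfig/network-scripts/ifcfg-enp0s3 192.168.7.249
# systemctl restart network
```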

III. Modify repmgr.conf and kingbase.auto.conf (on every node, primary and standby)

--- Change the node IPs and the VIP in the configuration files

Standby:
[kingbase@node1 etc]$ cat repmgr.conf 
on_bmj=off
node_id=1
node_name='node248'
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
......
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.240/24'
net_device='enp0s3'
......

Primary:
[kingbase@node2 etc]$ cat repmgr.conf
on_bmj=off
node_id=2
node_name='node249'
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
......
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.240/24'
net_device='enp0s3'
......
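Only two keys change in repmgr.conf here: the host in `conninfo` (the node's own new physical IP) and `virtual_ip`; `trusted_servers` stays at 192.168.7.1. This sed sketch assumes that `host=` is the first parameter of `conninfo`, as in the files above; `update_repmgr_conf` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: update_repmgr_conf FILE NODE_IP VIP_CIDR
update_repmgr_conf() {
  local file=$1 node_ip=$2 vip=$3
  cp -p -- "$file" "$file.bak" || return   # keep a backup
  # conninfo: replace the address right after "host=" (assumed to be first)
  # virtual_ip: "|" is the sed delimiter because the value contains "/"
  sed -i -E \
    -e "s/^(conninfo='host=)[0-9.]+/\1$node_ip/" \
    -e "s|^virtual_ip=.*|virtual_ip='$vip'|" \
    "$file"
}

# Usage, run from the bin directory of each node in this case:
# update_repmgr_conf ../etc/repmgr.conf 192.168.7.248 192.168.7.240/24   # node1
# update_repmgr_conf ../etc/repmgr.conf 192.168.7.249 192.168.7.240/24   # node2
```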
Standby:
[kingbase@node1 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
......
primary_conninfo = 'user=system connect_timeout=10 host=192.168.7.249 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
......

Primary:
[kingbase@node2 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
......
primary_conninfo = 'user=system connect_timeout=10 host=192.168.7.249 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
......
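In kingbase.auto.conf only the `host=` inside `primary_conninfo` changes, and on the standby it must point at the primary's new address. Because the file header warns that ALTER SYSTEM overwrites it, edit it only while the database is stopped, as in this procedure. The sketch below uses hypothetical helper names: it rewrites that host, then checks that none of this case's old addresses (192.168.7.237/238/239) remain in the files passed to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_primary_conninfo_host FILE NEW_HOST
set_primary_conninfo_host() {
  local file=$1 host=$2
  cp -p -- "$file" "$file.bak" || return   # keep a backup
  # Only touch the primary_conninfo line; replace the address after host=
  sed -i -E "/^primary_conninfo/ s/host=[0-9.]+/host=$host/" "$file"
}

# Old addresses of this case, split on spaces by the loop below.
OLD_IPS="192.168.7.237 192.168.7.238 192.168.7.239"

# Hypothetical check: fail if any old address is still present in FILE...
check_no_old_ips() {
  local old
  for old in $OLD_IPS; do
    # -w: whole-word match, so e.g. 192.168.7.23 would not match inside 192.168.7.238
    if grep -qwF -- "$old" "$@"; then
      echo "old address $old still present" >&2
      return 1
    fi
  done
}

# Usage on the standby (node1), from the bin directory:
# set_primary_conninfo_host ../data/kingbase.auto.conf 192.168.7.249
# check_no_old_ips ../etc/repmgr.conf ../data/kingbase.auto.conf
```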

IV. Start the database service on the primary and standby

[kingbase@node2 bin]$ ./sys_ctl start -D ../data
.......
server started

V. Register the primary with the cluster

1. Register the primary

[kingbase@node2 bin]$ ./repmgr primary register -F
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.

--- 192.168.7.240 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms


WARNING: ping host"192.168.7.240" failed
DETAIL: average RTT value is not greater than zero
NOTICE: node (ID: 2) acquire the virtual ip 192.168.7.240/24 success
NOTICE: primary node record (ID: 2) updated

2. Check the cluster node status

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+---------------+----------+----------+----------+-------
 1  | node248 | standby | ? unreachable | node249  | default  | 100      | ?        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running     |          | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - unable to connect to node "node248" (ID: 1)
  - node "node248" (ID: 1) is registered as an active standby but is unreachable
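At this point the standby's record still carries the old address 192.168.7.238, so it shows as unreachable. When scripting these checks, the status column can be pulled out of `repmgr cluster show`. The awk sketch below uses a hypothetical function name and assumes the `|`-separated layout shown above, with Name in the second and Status in the fourth column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `repmgr cluster show` output on stdin and print
# "name: status" for every node whose status does not end in "running".
not_running_nodes() {
  awk -F'|' 'NR > 2 && NF >= 4 {
    name = $2; status = $4
    gsub(/^ +| +$/, "", name)      # trim column padding
    gsub(/^ +| +$/, "", status)
    if (status !~ /running$/) print name ": " status
  }'
}

# Usage:
# ./repmgr cluster show | not_running_nodes
```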

VI. Register the standby with the cluster

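1. Registering the standby directly, while its database service is still running, fails with connection errors (the standby's database service must be stopped first). As the DETAIL lines show, repmgr still tries to reach the primary at the old addresses 192.168.7.239 and 192.168.7.238.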

[kingbase@node1 bin]$ ./repmgr standby register -h 192.168.7.249 -d esrep -U esrep -W -F
WARNING: following problems with command line parameters detected:
--no-wait will be ignored when executing STANDBY REGISTER
INFO: connecting to local node "node248" (ID: 1)
WARNING: database connection parameters not required when the standby to be registered is running
DETAIL: repmgr uses the "conninfo" parameter in "repmgr.conf" to connect to the standby
INFO: connecting to primary database
ERROR: connection to database failed
DETAIL: 
could not connect to server: No route to host
      Is the server running on host "192.168.7.239" and accepting
      TCP/IP connections on port 54321?

DETAIL: attempted to connect using:
user=esrep connect_timeout=10 dbname=esrep host=192.168.7.239 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr
ERROR: connection to database failed
DETAIL: 
could not connect to server: No route to host
      Is the server running on host "192.168.7.238" and accepting
      TCP/IP connections on port 54321?

DETAIL: attempted to connect using:
user=esrep connect_timeout=10 dbname=esrep host=192.168.7.238 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr
ERROR: unable to connect to the primary database
HINT: a primary node must be configured before registering a standby node
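Before retrying, a quick check that the primary's new address actually accepts TCP connections on port 54321 separates a network problem from stale repmgr metadata. This is a minimal sketch using bash's `/dev/tcp` redirection and the coreutils `timeout` command; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
port_open() {
  timeout 3 bash -c ": > /dev/tcp/$1/$2" 2>/dev/null
}

# Usage on the standby, against the primary's new address:
# port_open 192.168.7.249 54321 && echo reachable || echo "not reachable"
```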

2. Stop the standby's database service, then rejoin the standby node to the cluster

1) Stop the database service

[kingbase@node1 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped

2) Rejoin the standby node to the cluster

[kingbase@node1 bin]$ ./repmgr node rejoin -h 192.168.7.249 -U esrep -d esrep
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 1/EEFF5AB0, rejoin target lsn is 1/EEFF7E78
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2021-03-01 12:18:21.290629
NOTICE: starting server using "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6HA/KHA/kingbase/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile start"
NOTICE: start server finish at 2021-03-01 12:18:21.906449
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node248 | standby |   running | node249  | default  | 100      | 4        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |          | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3) Register the standby with the cluster

[kingbase@node1 bin]$ ./repmgr standby register -h 192.168.7.249 -U esrep -d esrep -F
INFO: connecting to local node "node248" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node248" (ID: 1) successfully registered

3. On the primary, check the cluster status and the primary/standby streaming replication status

1) Check the cluster node status

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node248 | standby |   running | node249  | default  | 100      | 4        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |          | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2) Check the primary/standby streaming replication status

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
------+----------+---------+------------------+---------------+-----------------+
 4939 |    16384 | esrep   | node248          | 192.168.7.248 |                 |       44492 | 2021-03-01 12:17:41.036448+08 |              | streaming | 1/EEFF7FC0 | 1/EEFF7FC0 | 1/EEFF7FC0 | 1/EEFF7FC0 |           |           |            |             1 | quorum     | 2021-03-01 12:20:49.634460+08
(1 row)
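For a scripted check, a narrower query in unaligned, tuples-only mode is easier to parse than the wide table. The sketch assumes ksql accepts the psql-style `-d`, `-A`, `-t` and `-c` options (treat that as an assumption to verify on your version); `all_streaming` is a hypothetical helper that fails unless at least one row is present and every row's state is `streaming`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "application_name|client_addr|state|sync_state"
# rows on stdin; succeed only if there is at least one row and all are streaming.
all_streaming() {
  awk -F'|' '{ n++; if ($3 != "streaming") bad++ }
             END { exit (n == 0 || bad > 0) }'
}

# Usage on the primary (ksql options assumed to match psql's):
# ./ksql -U system -d test -At -c "select application_name, client_addr, state, sync_state from sys_stat_replication" | all_streaming
```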

4. Start the repmgrd service on the primary and standby

[kingbase@node2 bin]$ ./repmgrd -d
[2021-03-01 12:32:27] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"

VII. Restart the cluster services to verify

1. Restart the cluster with sys_monitor.sh

[kingbase@node2 bin]$ ./sys_monitor.sh restart
2021-03-01 12:20:29 Ready to stop all DB ...
......
[2021-03-01 12:21:02] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"

2021-03-01 12:21:03 repmgrd on "[192.168.7.249]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node248 | standby |   running | node249  | running | 5135 | no      | 1 second(s) ago    
 2  | node249 | primary | * running |          | running | 6723 | no      | n/a                
2021-03-01 12:21:10 Done.

2. Check the cluster node status

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------
 1  | node248 | standby |   running | node249  | default  | 100      | 4        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |          | default  | 100      | 4        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3. Check the primary/standby streaming replication status

[kingbase@node2 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
------+----------+---------+------------------+---------------+-----------------+--------
 6203 |    16384 | esrep   | node248          | 192.168.7.248 |                 |       44499 | 2021-03-01 12:20:49.781155+08 |              | streaming | 1/EEFF9090 | 1/EEFF9090 | 1/EEFF9090 | 1/EEFF9090 |           |           |            |             1 | quorum     | 2021-03-01 12:22:02.817815+08
(1 row)

VIII. Summary

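Changing the cluster's physical IPs and VIP requires stopping the cluster services (cluster and database), which interrupts normal business operations. Plan the IP addresses before deploying the cluster so that they do not have to be changed later.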

Summary of steps:

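1. After identifying the primary and standby, stop the cluster services (cluster and database).
2. Change the system IPs and the IPs in /etc/hosts.
3. Update the physical IPs and VIP in repmgr.conf and kingbase.auto.conf on both the primary and the standby.
4. Restart the system network service to apply the new physical IPs.
5. Start the database service on the primary and standby.
6. Register the primary with the cluster.
7. Stop the standby's database service, rejoin the standby node to the cluster, and register the standby.
8. Check the cluster status (cluster and database) and start the repmgrd service on the primary and standby.
9. Restart the cluster services (sys_monitor.sh) to verify.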
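The steps above can be strung together as a dry-run checklist. The sketch below is hypothetical: it assumes passwordless SSH as the kingbase user between the nodes, the bin directory used in this case, and that steps 2 to 4 (OS IPs, /etc/hosts, configuration files) are done by hand at the marked pause. With DRY_RUN=1 it only prints the commands in order.

```shell
#!/usr/bin/env bash
# Hypothetical runbook for this case; node names, paths and IPs are taken from it.
BIN=/home/kingbase/cluster/R6HA/KHA/kingbase/bin
PRIMARY_IP=192.168.7.249

# step HOST CMD...: print the command when DRY_RUN is set, else run it over SSH.
step() {
  local host=$1; shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$host: $*"
  else
    ssh "$host" "cd $BIN && $*"
  fi
}

# Steps 2-4 are manual; wait for the operator unless this is a dry run.
pause_for_manual_steps() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "manual: steps 2-4 on both nodes"
  else
    read -r -p "Finish steps 2-4 on both nodes, then press Enter " _
  fi
}

change_ip_runbook() {
  step node2 ./sys_monitor.sh stop || return                  # step 1
  pause_for_manual_steps || return                            # steps 2-4
  step node2 ./sys_ctl start -D ../data || return             # step 5
  step node1 ./sys_ctl start -D ../data || return
  step node2 ./repmgr primary register -F || return           # step 6
  step node1 ./sys_ctl stop -D ../data || return              # step 7
  step node1 ./repmgr node rejoin -h "$PRIMARY_IP" -U esrep -d esrep || return
  step node1 ./repmgr standby register -h "$PRIMARY_IP" -U esrep -d esrep -F || return
  step node2 ./repmgr cluster show || return                  # step 8
  step node2 ./repmgrd -d || return
  step node1 ./repmgrd -d || return
  step node2 ./sys_monitor.sh restart                         # step 9
}

# Usage: DRY_RUN=1 change_ip_runbook
```

Reviewing the dry-run output first makes it easy to confirm which node each command targets before anything is executed.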
posted @ 2021-07-06 18:46  KINGBASE研究院