postgresql 高可用 etcd + patroni 之三 switchover
os: centos 7.4
postgresql: 9.6.9
etcd: 3.2.18
patroni: 1.4.4
patroni + etcd 是在一个postgrsql 开源大会上 亚信的一个哥们讲解的高可用方案。
依然是基于 postgreql stream replication。
ip规划
192.168.56.101 node1 master
192.168.56.102 node2 slave
192.168.56.103 node3 slave
手动 switchover
切换前的状态
$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_101 | 192.168.56.101 | | running | 0.0 |
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+
执行手动切换
$ patronictl -c /usr/patroni/conf/patroni_postgresql.yml switchover
Master [pg96_103]: pg96_103
Candidate ['pg96_101', 'pg96_102'] []: pg96_101
When should the switchover take place (e.g. 2015-10-01T14:30) [now]: now
Current cluster topology
+---------+----------+----------------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
| pg96 | pg96_101 | 192.168.56.101 | | running | 0.0 |
| pg96 | pg96_102 | 192.168.56.102 | | running | 0.0 |
| pg96 | pg96_103 | 192.168.56.103 | Leader | running | 0.0 |
+---------+----------+----------------+--------+---------+-----------+
Are you sure you want to switchover cluster pg96, demoting current master pg96_103? [y/N]: y
Switchover failed, details: 503, Switchover failed
node1 的日志如下
2018-07-11 23:26:34,635 INFO: received switchover request with leader=pg96_103 candidate=pg96_101 scheduled_at=None
2018-07-11 23:26:34,645 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 22:03:55.130 CST", "timeline": 3, "xlog": {"received_location": 50385856, "replayed_timestamp": "2018-07-11 22:34:29.725 CST", "paused": false, "replayed_location": 50385856}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 23:26:39,126 INFO: Lock owner: pg96_103; I am pg96_101
2018-07-11 23:26:39,126 INFO: does not have lock
2018-07-11 23:26:39,142 INFO: no action. i am a secondary and i am following a leader
node3 的日志如下
2018-07-11 23:27:06,254 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 23:27:06,274 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 3, "xlog": {"location": 50385856}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "replication": [{"sync_state": "potential", "sync_priority": 2, "client_addr": "192.168.56.102", "state": "streaming", "application_name": "pg96_102", "usename": "replicator"}, {"sync_state": "sync", "sync_priority": 1, "client_addr": "192.168.56.101", "state": "streaming", "application_name": "pg96_101", "usename": "replicator"}], "state": "running", "role": "master", "server_version": 90609}
2018-07-11 23:27:06,364 INFO: Member pg96_101 exceeds maximum replication lag
2018-07-11 23:27:06,365 WARNING: manual failover: no healthy members found, failover is not possible
2018-07-11 23:27:06,365 INFO: Cleaning up failover key
2018-07-11 23:27:06,389 INFO: no action. i am the leader with the lock
在node3的日志输出发现 WARNING: manual failover: no healthy members found, failover is not possible
先记录下,研究明白后再补充。