Replication Between PostgreSQL Versions Using Logical Replication
转自:https://www.percona.com/blog/2019/04/04/replication-between-postgresql-versions-using-logical-replication/
When you need to upgrade PostgreSQL, there are a variety of approaches that you can use. To avoid application downtime, then not all of the options to upgrade postgres are suitable. When avoiding downtime is essential, then you might consider using replication as a means of upgrade, and depending on your scenario, you can choose to approach this task using either logical replication or physical (streaming) replication. Here, we take a look at the difference between logical and physical replication in PostgreSQL. Then we explore how to accomplish an upgrade using logical replication in more detail, and by doing so, avoid application downtime. In a subsequent article, we’ll investigate physical replication.
We have already discussed about a few methods available to perform PostgreSQL upgrades in our previous blog posts – PostgreSQL Upgrade Using pg_dumpall and PostgreSQL Upgrade Using pg_dump/pg_restore – as part of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series. However, both of these methods involve downtime to application.
Types of logical replication
Here we’ll cover two types of replication you could implement:
- Replication between PostgreSQL 10 and 11 versions using built-in logical replication.
- Replication between PostgreSQL 9.4 or (< PG 11) to PostgreSQL 11 using an extension named pglogical .
We might opt to implement replication as a method of upgrade to minimize downtime for the application. Once all the data to date has been replicated to another PostgreSQL server on the latest version, you can simply switch your application to the new server with a minimal downtime… though of course this does depends on the complexity of your application stack.
Logical replication in PostgreSQL allows users to perform a selective replication of a few tables and open a standby for writes. Whereas physical replication in PostgreSQL is a block level replication. In this case, each database in the master is replicated to a standby, and the standby is not open for writes. Going forward, we’ll refer to physical replication as streaming replication.
With logical replication, a standby can have replication enabled from multiple masters. This could be helpful in situations where you need to replicate data from several PostgreSQL databases (OLTP) to a single PostgreSQL server for reporting and data warehousing.
One of the biggest advantages of logical over streaming replication is that logical replication allows us to replicate changes from an older version PostgreSQL to a later version. Streaming replication works only when both the master and standby are of the same major version. We’d recommend they have same minor version too for best practice.
Replication between PostgreSQL 10 and 11 versions
Starting from PostgreSQL 10, logical replication is available with the PostgreSQL source by default. So, you could easily replicate a PostgreSQL 10 database to PostgreSQL 11. Logical Replication uses a publish and subscribe model. The node that sends the changes becomes a publisher. And the node that subscribes to those changes becomes a subscriber. You may have one or more subscriptions to a publication.
Publication
Publication is a set of changes generated from a group of tables. It is referred to as a change set or replication set. Publications can only contains tables and cannot contain any other objects. DMLs on these tables can be replicated but not DDLs.
In a publication, you can choose what type of DML to replicate: INSERT or DELETE or UPDATE or ALL. By default, it is ALL. You must have a replica identity set on the table being published to replicate UPDATES and DELETES to a subscriber. A replica identity set helps in identifying the rows to be updated or deleted.
The primary key of a table is its default replica identity. You can also make a unique index with NO NULL values as a replica identity. If there is no primary key or a unique index with NO NULLs, then you can set the replica_identity to FULL. When a replica identity is set to FULL, postgres uses the entire row as a key. Of course, this may be inefficient.
You might see ERRORS if a table with no primary key and a non-default replica identity has been added to a publication after an UPDATE or a DELETE operation.
Subscription
A subscriber can subscribe to one or more publications. Before adding the subscription, you must ensure that the tables being replicated have been created in the subscriber node. In order to achieve that, you can perform a schema-only dump from publisher to subscriber node.
An example of logical replication
The following example steps work for logical replication between PostgreSQL 10 and 11 versions only.
On the publishing node, create a publication. You can either add all tables or choose to add selected tables to the publication.
1
2
3
4
5
|
-- For adding ALL Tables in Database
CREATE PUBLICATION percpub FOR ALL TABLES;
-- For adding Selected Tables in Database
CREATE PUBLICATION percpub FOR TABLE scott.employee scott.departments;
|
On the subscriber node, create a subscription that refers to the publication on the publisher node. Perform a DDL dump of the tables to the subscriber node before creating the subscription, as mentioned above,
1
2
|
$ pg_dump -h publisher_server_ip -p 5432 -d percona -Fc -s -U postgres | pg_restore -d percona -h subscriber_node_ip -p 5432 -U postgres
CREATE SUBSCRIPTION percsub CONNECTION 'host=publisher_server_ip dbname=percona user=postgres password=secret port=5432' PUBLICATION percpub;
|
The above command also copies the pre-existing data from the tables. If you want to disable the copy of the pre-existing data, you can use the following syntax. It will then only start copying the changes to the publisher after you run this command.
1
|
CREATE SUBSCRIPTION percsub CONNECTION 'host=publisher_server_ip dbname=percona user=postgres password=oracle port=5432' PUBLICATION percpub WITH (copy_data = false);
|
Monitor the replication using the following command on the publishing node.
1
2
3
|
$ psql
\x
select * from pg_stat_replication;
|
Replication between PostgreSQL 9.4 and PostgreSQL 11
Now, what about the versions that are older than PostgreSQL 10? For this purpose, there is an extension named pglogical that works for versions from 9.4 until 11. Using pglogical, you can easily replicate PostgreSQL 9.4 to PostgreSQL 11.
The following sequence of steps demonstrates a high-level procedure to setup replication between PG 9.4 and PG 11 using pglogical extension.
Step 1 : Consider pgserver_94 to be the source server with a database : percona_94 running on PostgreSQL 9.4. Create the following extensions.
1
2
3
4
|
[pgserver_94:] $psql -d percona_94 -c "CREATE EXTENSION pglogical_origin"
CREATE EXTENSION
[pgserver_94:] $psql -d percona_94 -c "CREATE EXTENSION pglogical"
CREATE EXTENSION
|
Step 2 : Now, you can go ahead and add either selected tables or all the tables in a schema or multiple schemas for replication. In the following example, you can see an error when there is no primary key on one of the tables.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
[pgserver_94:] $psql -d percona_94
psql (9.4.21)
Type "help" for help.
percona_94=# SELECT pglogical.create_node(node_name := 'provider1',dsn := 'host=192.168.0.24 port=5432 dbname=percona_94');
create_node
-------------
2976894835
(1 row)
percona_94=# SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
ERROR: table pgbench_history cannot be added to replication set default
DETAIL: table does not have PRIMARY KEY and given replication set is configured to replicate UPDATEs and/or DELETEs
HINT: Add a PRIMARY KEY to the table
percona_94=# ALTER TABLE pgbench_history ADD PRIMARY KEY (tid,aid,delta);
ALTER TABLE
percona_94=# SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
replication_set_add_all_tables
--------------------------------
t
(1 row)
|
Step 3
On the subscriber node, which is our PostgreSQL 11 database, you can run similar commands as follows.
1
2
3
4
5
6
7
8
9
10
11
12
13
|
[pgserver_11:] $psql -d percona_11
psql (11.2)
Type "help" for help.
percona_11=# SELECT pglogical.create_node(node_name := 'subscriber1',dsn := 'host=127.0.0.1 port=5432 dbname=percona_11 password=secret');
create_node
-------------
330520249
(1 row)
percona_11=# SELECT pglogical.create_subscription(subscription_name := 'subscription1',provider_dsn := 'host=192.168.0.24 port=5432 dbname=percona_94 password=secret');
create_subscription
---------------------
1763399739
(1 row)
|
Step 4 You can then validate the replication status by querying a few tables pglogical always updates:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
percona_11=# select * from pglogical.local_sync_status;
sync_kind | sync_subid | sync_nspname | sync_relname | sync_status | sync_statuslsn
-----------+------------+--------------+------------------+-------------+----------------
f | 1763399739 | public | pgbench_accounts | r | 0/2EB7D48
f | 1763399739 | public | pgbench_history | r | 0/2EB7D48
f | 1763399739 | public | pgbench_tellers | r | 0/2EB7D48
f | 1763399739 | public | pgbench_branches | r | 0/2EB7D48
d | 1763399739 | | | r | 0/0
(5 rows)
percona_11=# select * from pglogical.subscription;
sub_id | sub_name | sub_origin | sub_target | sub_origin_if | sub_target_if | sub_enabled | sub_slot_name | sub_rep
lication_sets | sub_forward_origins | sub_apply_delay
------------+---------------+------------+------------+---------------+---------------+-------------+----------------------------------------+----------------
-----------------------+---------------------+-----------------
1763399739 | subscription1 | 2976894835 | 330520249 | 2402836775 | 2049915666 | t | pgl_percona_11_provider1_subscription1 |{default,defaul
t_insert_only,ddl_sql} | {all} | 00:00:00
(1 row)
|
Primary key selection
At step 2, you saw how all the tables of schema : public got added to a replication set by creating a primary key on the table that doesn’t currently have one. The primary key I chose may not be the right one for that table as it is just for demonstration. However, when you choose a primary key, make sure that you are selecting the right one. It needs to be always unique and use column(s) that don’t normally contain NULLs. If you don’t research primary key selection thoroughly, you could cause downtime to your application. Here’s an example error that you could encounter:
1
2
3
|
[pgserver_94:] $pgbench -c 10 -T 300 -n percona_94
Client 7 aborted in state 12: ERROR: duplicate key value violates unique constraint "pgbench_history_pkey"
DETAIL: Key (tid, aid, delta)=(7, 63268, 2491) already exists.
|
So far we have seen how you can use pglogical to create replication between an older version to a newer version PostgreSQL. After you have set up replication, you can easily switch your applications to the latest version with a lower downtime.