PostgreSQL 17: Convert Streaming Replica to Logical Replication with pg

PostgreSQL 17: Convert Streaming Replica to Logical Replication with pg_createsubscriber

Nov 13, 2024

Mins to Read

All

‍PostgreSQL 17 was released just a month ago, bringing many new features and improvements. We've already highlighted these enhancements in our Mydbops Monthly Webinar, the Open Source India (OSI) 2024 Event, and through various blog posts. During the OSI 2024, we had the opportunity to engage with numerous tech enthusiasts and students, many of whom asked questions about logical replication—particularly the exciting enhancements introduced in PostgreSQL 17.

In response to this interest, I've decided to write a detailed blog post focusing on each new logical replication feature. In this blog post, we will dive into pg_createsubscriber, a command-line utility that simplifies the setup of your logical replication server, making it ready in just seconds.

Current Problem with Logical Replication Setup

Until Postgres 16, to build a logical replication, we needed to configure a publication on the source and a subscription on the destination.

Once configured, the initial data is copied from the source to the destination.

In the entire logical replication setup, this initial data sync is a time-consuming process for large databases.

Additionally, it requires retaining WAL files needed to perform Continuous Data Changes (CDC).

As the initial data sync process takes time, more WAL files accumulate and need to be retained.

This can lead to reaching the storage limit on the source server and requires more WAL files to be applied for CDC.

Therefore, it is crucial to keep the initial sync time as short as possible.

How pg_createsubscriber Addresses These Issues

Our problem statement is clear: we need to reduce the initial sync time while building logical replication. pg_createsubscriber helps by quickly converting an existing streaming replica into a logical replication. This allows us to skip the initial data synchronization process and save time.

This raises a couple of new questions:

What if I do not have any streaming replicas available?
Is building a streaming replica faster than logical replication?

To answer these, we will test the initial data transfer speed between streaming replication (pg_basebackup) and logical replication. For this demonstration, I’ve set up two servers (primary and replica) in our lab environment, with a data size of 30 GB. Now, we’ll stream this data from the primary to the replica using both methods:

Streaming Replication:

pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: E/BA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created replication slot "slot1"
32340756/32340756 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: E/BA000158
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

It took 5 minutes to complete the initial data transfer.

Logical Replication:

2024-11-01 10:10:17.524 UTC [19076] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers_1" has started
2024-11-01 10:10:17.535 UTC [19077] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees" has started
2024-11-01 10:10:17.551 UTC [19078] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers" has started
2024-11-01 10:10:17.568 UTC [19079] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees_1" has started


2024-11-01 10:34:45.967 UTC [19079] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees_1" has finished
2024-11-01 10:34:51.169 UTC [19076] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers_1" has finished
2024-11-01 10:34:53.307 UTC [19078] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers" has finished
2024-11-01 10:34:54.024 UTC [19077] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees" has finished

It took 24 minutes to complete data synchronization in logical replication.

This demo clearly demonstrates that initial data synchronization can be time-consuming in logical replication. Thus, the answer to our earlier question is straightforward: it's often more efficient to use streaming replication for the initial synchronization and then transition to logical replication as needed. This is where pg_createsubscriber comes into play, facilitating the conversion from streaming replication to logical replication with ease. However, this approach may not be suitable if you plan to replicate only a few tables, as streaming replicas contain the entire dataset from the primary.

pg_createsubscriber

Let’s explore how to convert an existing streaming replica into a logical replica. For this demonstration, we'll assume that streaming replication is already configured and running; we won’t cover the initial setup steps here.

Server Details:

Server Role	Server IP
Primary	172.31.0.200
Replica	172.31.8.22

‍

First, let's verify that the replica is in recovery mode:

replica=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

replica=#

Both primary and replica servers are in sync, so we’re ready to start the conversion process. The pg_createsubscriber utility should be used only on the replica server. Use the --dry-run option to check feasibility before proceeding.

postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_createsubscriber --database=test --pgdata=/var/lib/postgresql/17/main --subscriber-port=5432 --publisher-server='user=replicator password=replicator host=172.31.0.200'  --subscriber-username=postgres --publication=demo_pub1 --subscription=demo_sub1 --dry-run
pg_createsubscriber: error: standby server is running
pg_createsubscriber: hint: Stop the standby server and try again.
postgres@ip-172-31-8-22:~$

We received an error indicating that the standby server must be stopped first. Let's stop the server and try again:

postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/17/main status
pg_ctl: server is running (PID: 19432)
/usr/lib/postgresql/17/bin/postgres "-D" "/var/lib/postgresql/17/main" "-c" "config_file=/etc/postgresql/17/main/postgresql.conf"
postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/17/main stop
waiting for server to shut down.... done
server stopped
postgres@ip-172-31-8-22:~$

Once stopped, the dry run was successfully executed as shown below.

postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_createsubscriber --database=test --pgdata=/var/lib/postgresql/17/main --subscriber-port=5432 --publisher-server='user=replicator password=replicator host=172.31.0.200'  --subscriber-username=postgres --publication=demo_pub1 --subscription=demo_sub1 --dry-run
2024-11-01 11:43:14.484 UTC [19864] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-11-01 11:43:14.488 UTC [19864] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
2024-11-01 11:43:14.504 UTC [19867] LOG:  database system was shut down in recovery at 2024-11-01 11:41:43 UTC
2024-11-01 11:43:14.504 UTC [19867] LOG:  entering standby mode
2024-11-01 11:43:14.516 UTC [19867] LOG:  redo starts at E/C34234C0
2024-11-01 11:43:14.516 UTC [19867] LOG:  consistent recovery state reached at E/C3423600
2024-11-01 11:43:14.516 UTC [19864] LOG:  database system is ready to accept read-only connections
2024-11-01 11:43:14.516 UTC [19867] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
2024-11-01 11:43:14.567 UTC [19868] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
2024-11-01 11:43:14.781 UTC [19864] LOG:  received fast shutdown request
2024-11-01 11:43:14.784 UTC [19864] LOG:  aborting any active transactions
2024-11-01 11:43:14.784 UTC [19868] FATAL:  terminating walreceiver process due to administrator command
2024-11-01 11:43:14.800 UTC [19865] LOG:  shutting down
2024-11-01 11:43:14.808 UTC [19864] LOG:  database system is shut down
2024-11-01 11:43:15.043 UTC [19876] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-11-01 11:43:15.050 UTC [19876] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
2024-11-01 11:43:15.078 UTC [19879] LOG:  database system was shut down in recovery at 2024-11-01 11:43:14 UTC
2024-11-01 11:43:15.079 UTC [19879] LOG:  entering standby mode
2024-11-01 11:43:15.086 UTC [19879] LOG:  redo starts at E/C34234C0
2024-11-01 11:43:15.086 UTC [19879] LOG:  consistent recovery state reached at E/C3423600
2024-11-01 11:43:15.086 UTC [19879] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
2024-11-01 11:43:15.086 UTC [19876] LOG:  database system is ready to accept read-only connections
2024-11-01 11:43:15.126 UTC [19880] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
2024-11-01 11:43:15.282 UTC [19876] LOG:  received fast shutdown request
2024-11-01 11:43:15.289 UTC [19876] LOG:  aborting any active transactions
2024-11-01 11:43:15.289 UTC [19880] FATAL:  terminating walreceiver process due to administrator command
2024-11-01 11:43:15.302 UTC [19877] LOG:  shutting down
2024-11-01 11:43:15.309 UTC [19876] LOG:  database system is shut down
postgres@ip-172-31-8-22:~$ echo $?
0
postgres@ip-172-31-8-22:~$

Now, let's do the actual run by removing the dry-run option.

postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_createsubscriber --database=test --pgdata=/var/lib/postgresql/17/main --subscriber-port=5432 --publisher-server='user=replicator password=replicator host=172.31.0.200'  --subscriber-username=postgres --publication=demo_pub1 --subscription=demo_sub1
2024-11-01 11:45:31.957 UTC [19903] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-11-01 11:45:31.966 UTC [19903] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
2024-11-01 11:45:31.976 UTC [19906] LOG:  database system was shut down in recovery at 2024-11-01 11:43:15 UTC
2024-11-01 11:45:31.976 UTC [19906] LOG:  entering standby mode
2024-11-01 11:45:31.984 UTC [19906] LOG:  redo starts at E/C34234C0
2024-11-01 11:45:31.984 UTC [19906] LOG:  consistent recovery state reached at E/C3423600
2024-11-01 11:45:31.984 UTC [19903] LOG:  database system is ready to accept read-only connections
2024-11-01 11:45:31.984 UTC [19906] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
2024-11-01 11:45:32.012 UTC [19907] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
2024-11-01 11:45:32.092 UTC [19903] LOG:  received fast shutdown request
2024-11-01 11:45:32.095 UTC [19903] LOG:  aborting any active transactions
2024-11-01 11:45:32.095 UTC [19907] FATAL:  terminating walreceiver process due to administrator command
2024-11-01 11:45:32.103 UTC [19904] LOG:  shutting down
2024-11-01 11:45:32.111 UTC [19903] LOG:  database system is shut down
2024-11-01 11:45:32.367 UTC [19915] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-11-01 11:45:32.371 UTC [19915] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
2024-11-01 11:45:32.381 UTC [19918] LOG:  database system was shut down in recovery at 2024-11-01 11:45:32 UTC
2024-11-01 11:45:32.382 UTC [19918] LOG:  entering standby mode
2024-11-01 11:45:32.387 UTC [19918] LOG:  redo starts at E/C34234C0
2024-11-01 11:45:32.388 UTC [19918] LOG:  consistent recovery state reached at E/C3423600
2024-11-01 11:45:32.388 UTC [19915] LOG:  database system is ready to accept read-only connections
2024-11-01 11:45:32.388 UTC [19918] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
2024-11-01 11:45:32.414 UTC [19919] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
2024-11-01 11:45:32.468 UTC [19918] LOG:  recovery stopping after WAL location (LSN) "E/C3423C28"
2024-11-01 11:45:32.468 UTC [19918] LOG:  redo done at E/C3423C28 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.08 s
2024-11-01 11:45:32.468 UTC [19918] LOG:  last completed transaction was at log time 2024-11-01 11:45:32.225856+00
2024-11-01 11:45:32.468 UTC [19919] FATAL:  terminating walreceiver process due to administrator command
2024-11-01 11:45:32.473 UTC [19918] LOG:  selected new timeline ID: 2
2024-11-01 11:45:32.561 UTC [19918] LOG:  archive recovery complete
2024-11-01 11:45:32.564 UTC [19916] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-11-01 11:45:32.579 UTC [19916] LOG:  checkpoint complete: wrote 11 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.004 s, total=0.018 s; sync files=8, longest=0.002 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=E/C3423C60, redo lsn=E/C3423C60
2024-11-01 11:45:32.589 UTC [19915] LOG:  database system is ready to accept connections
2024-11-01 11:45:32.593 UTC [19923] WARNING:  "archive_mode" enabled, yet archiving is not configured
2024-11-01 11:45:32.593 UTC [19923] DETAIL:  archive_command is not set.
2024-11-01 11:45:33.554 UTC [19915] LOG:  received fast shutdown request
2024-11-01 11:45:33.559 UTC [19915] LOG:  aborting any active transactions
2024-11-01 11:45:33.569 UTC [19916] LOG:  shutting down
2024-11-01 11:45:33.615 UTC [19923] WARNING:  "archive_mode" enabled, yet archiving is not configured
2024-11-01 11:45:33.615 UTC [19923] DETAIL:  archive_command is not set.
2024-11-01 11:45:33.618 UTC [19916] LOG:  checkpoint starting: shutdown immediate
2024-11-01 11:45:33.635 UTC [19916] LOG:  checkpoint complete: wrote 16 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.004 s, sync=0.006 s, total=0.021 s; sync files=11, longest=0.003 s, average=0.001 s; distance=12144 kB, estimate=12144 kB; lsn=E/C4000028, redo lsn=E/C4000028

Once it has been successfully executed, you can start the replica server. This will print the details of the subscription that was created.

postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/17/main start

waiting for server to start....2024-11-01 11:47:14.252 UTC [19941] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-11-01 11:47:14.253 UTC [19941] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-11-01 11:47:14.253 UTC [19941] LOG:  listening on IPv6 address "::", port 5432
2024-11-01 11:47:14.261 UTC [19941] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-01 11:47:14.269 UTC [19944] LOG:  database system was shut down at 2024-11-01 11:45:33 UTC
2024-11-01 11:47:14.276 UTC [19944] LOG:  recovered replication state of node 1 to E/C3423C28
2024-11-01 11:47:14.285 UTC [19941] LOG:  database system is ready to accept connections
2024-11-01 11:47:14.287 UTC [19947] WARNING:  "archive_mode" enabled, yet archiving is not configured
2024-11-01 11:47:14.287 UTC [19947] DETAIL:  archive_command is not set.
 done
server started
postgres@ip-172-31-8-22:~$ 2024-11-01 11:47:14.309 UTC [19949] LOG:  logical replication apply worker for subscription "demo_sub1" has started

Verifying the publication and subscription details in the source and destination, respectively:

primary=# select * from pg_publication;

-[ RECORD 1 ]+----------
oid          | 16432
pubname      | demo_pub1
pubowner     | 16428
puballtables | t
pubinsert    | t
pubupdate    | t
pubdelete    | t
pubtruncate  | t
pubviaroot   | f

replica=# select * from pg_subscription;
-[ RECORD 1 ]-------+------------------------------------------------------------------
oid                 | 24623
subdbid             | 16386
subskiplsn          | 0/0
subname             | demo_sub1
subowner            | 10
subenabled          | t
subbinary           | f
substream           | f
subtwophasestate    | d
subdisableonerr     | f
subpasswordrequired | t
subrunasowner       | f
subfailover         | f
subconninfo         | user=replicator password=replicator host=172.31.0.200 dbname=test
subslotname         | demo_sub1
subsynccommit       | off
subpublications     | {demo_pub1}
suborigin           | any

replica=# select * from employers_1 where id=456961 ;
   id   |      details
--------+--------------------
 456961 | 0.9207200634008488
(1 row)

primary=# delete from employers_1 where id=456961 ;
DELETE 1

test=# select * from employers_1 where id=456961 ;
 id | details
----+---------
(0 rows)

We now have our logical replica ready, significantly reducing the initial synchronization time.

Important Considerations

There are a few important considerations to keep in mind before adopting this approach:

Version Compatibility: Both the source and destination PostgreSQL versions must be the same.
Error Handling: If the process fails, you may need to rebuild the streaming replica from scratch. Therefore, plan carefully and make use of the dry-run option.

Overall, this new approach simplifies the replication process and introduces an exciting feature in PostgreSQL 17. By leveraging this method, you can achieve faster setup times and more reliable replication.

Prefer listening? Check out our podcast episode on PostgreSQL 17's logical replication enhancements!

If you need expert assistance with PostgreSQL replication, optimization, or troubleshooting, Mydbops offers PostgreSQL consulting and managed services to ensure seamless performance and scalability for your database systems.

No items found.