PostgreSQL 17: Convert Streaming Replica to Logical Replication with pg_createsubscriber

Aakash M
Nov 13, 2024
7
Mins to Read
All

PostgreSQL 17 was released just a month ago, bringing many new features and improvements. We've already highlighted these enhancements in our Mydbops Monthly Webinar, the Open Source India (OSI) 2024 Event, and through various blog posts. During the OSI 2024, we had the opportunity to engage with numerous tech enthusiasts and students, many of whom asked questions about logical replication—particularly the exciting enhancements introduced in PostgreSQL 17.

In response to this interest, I've decided to write a detailed blog post focusing on each new logical replication feature. In this blog post, we will dive into pg_createsubscriber, a command-line utility that simplifies the setup of your logical replication server, making it ready in just seconds.

Current Problem with Logical Replication Setup

  • Until Postgres 16, to build a logical replication, we needed to configure a publication on the source and a subscription on the destination.
  • Once configured, the initial data is copied from the source to the destination.
  • In the entire logical replication setup, this initial data sync is a time-consuming process for large databases.
  • Additionally, it requires retaining WAL files needed to perform Continuous Data Changes (CDC).
  • As the initial data sync process takes time, more WAL files accumulate and need to be retained.
  • This can lead to reaching the storage limit on the source server and requires more WAL files to be applied for CDC.
  • Therefore, it is crucial to keep the initial sync time as short as possible.
  • How pg_createsubscriber Addresses These Issues

    Our problem statement is clear: we need to reduce the initial sync time while building logical replication. pg_createsubscriber helps by quickly converting an existing streaming replica into a logical replication. This allows us to skip the initial data synchronization process and save time.

    This raises a couple of new questions:

    • What if I do not have any streaming replicas available?
    • Is building a streaming replica faster than logical replication?

    To answer these, we will test the initial data transfer speed between streaming replication (pg_basebackup) and logical replication. For this demonstration, I’ve set up two servers (primary and replica) in our lab environment, with a data size of 30 GB. Now, we’ll stream this data from the primary to the replica using both methods:

    Streaming Replication:

    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    pg_basebackup: write-ahead log start point: E/BA000028 on timeline 1
    pg_basebackup: starting background WAL receiver
    pg_basebackup: created replication slot "slot1"
    32340756/32340756 kB (100%), 1/1 tablespace
    pg_basebackup: write-ahead log end point: E/BA000158
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: syncing data to disk ...
    pg_basebackup: renaming backup_manifest.tmp to backup_manifest
    pg_basebackup: base backup completed

    It took 5 minutes to complete the initial data transfer.

    Logical Replication:

    2024-11-01 10:10:17.524 UTC [19076] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers_1" has started
    2024-11-01 10:10:17.535 UTC [19077] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees" has started
    2024-11-01 10:10:17.551 UTC [19078] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers" has started
    2024-11-01 10:10:17.568 UTC [19079] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees_1" has started
    
    
    2024-11-01 10:34:45.967 UTC [19079] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees_1" has finished
    2024-11-01 10:34:51.169 UTC [19076] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers_1" has finished
    2024-11-01 10:34:53.307 UTC [19078] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employers" has finished
    2024-11-01 10:34:54.024 UTC [19077] LOG:  logical replication table synchronization worker for subscription "my_sub", table "employees" has finished

    It took 24 minutes to complete data synchronization in logical replication.

    This demo clearly demonstrates that initial data synchronization can be time-consuming in logical replication. Thus, the answer to our earlier question is straightforward: it's often more efficient to use streaming replication for the initial synchronization and then transition to logical replication as needed. This is where pg_createsubscriber comes into play, facilitating the conversion from streaming replication to logical replication with ease. However, this approach may not be suitable if you plan to replicate only a few tables, as streaming replicas contain the entire dataset from the primary.

    pg_createsubscriber

    Let’s explore how to convert an existing streaming replica into a logical replica. For this demonstration, we'll assume that streaming replication is already configured and running; we won’t cover the initial setup steps here.

    Server Details:

    Server Role Server IP
    Primary 172.31.0.200
    Replica 172.31.8.22

    First, let's verify that the replica is in recovery mode:

    replica=# select pg_is_in_recovery();
     pg_is_in_recovery
    -------------------
     t
    (1 row)
    
    replica=#

    Both primary and replica servers are in sync, so we’re ready to start the conversion process. The pg_createsubscriber utility should be used only on the replica server. Use the --dry-run option to check feasibility before proceeding.

    postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_createsubscriber --database=test --pgdata=/var/lib/postgresql/17/main --subscriber-port=5432 --publisher-server='user=replicator password=replicator host=172.31.0.200'  --subscriber-username=postgres --publication=demo_pub1 --subscription=demo_sub1 --dry-run
    pg_createsubscriber: error: standby server is running
    pg_createsubscriber: hint: Stop the standby server and try again.
    postgres@ip-172-31-8-22:~$

    We received an error indicating that the standby server must be stopped first. Let's stop the server and try again:

    postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/17/main status
    pg_ctl: server is running (PID: 19432)
    /usr/lib/postgresql/17/bin/postgres "-D" "/var/lib/postgresql/17/main" "-c" "config_file=/etc/postgresql/17/main/postgresql.conf"
    postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/17/main stop
    waiting for server to shut down.... done
    server stopped
    postgres@ip-172-31-8-22:~$

    Once stopped, the dry run was successfully executed as shown below.

    postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_createsubscriber --database=test --pgdata=/var/lib/postgresql/17/main --subscriber-port=5432 --publisher-server='user=replicator password=replicator host=172.31.0.200'  --subscriber-username=postgres --publication=demo_pub1 --subscription=demo_sub1 --dry-run
    2024-11-01 11:43:14.484 UTC [19864] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
    2024-11-01 11:43:14.488 UTC [19864] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
    2024-11-01 11:43:14.504 UTC [19867] LOG:  database system was shut down in recovery at 2024-11-01 11:41:43 UTC
    2024-11-01 11:43:14.504 UTC [19867] LOG:  entering standby mode
    2024-11-01 11:43:14.516 UTC [19867] LOG:  redo starts at E/C34234C0
    2024-11-01 11:43:14.516 UTC [19867] LOG:  consistent recovery state reached at E/C3423600
    2024-11-01 11:43:14.516 UTC [19864] LOG:  database system is ready to accept read-only connections
    2024-11-01 11:43:14.516 UTC [19867] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
    2024-11-01 11:43:14.567 UTC [19868] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
    2024-11-01 11:43:14.781 UTC [19864] LOG:  received fast shutdown request
    2024-11-01 11:43:14.784 UTC [19864] LOG:  aborting any active transactions
    2024-11-01 11:43:14.784 UTC [19868] FATAL:  terminating walreceiver process due to administrator command
    2024-11-01 11:43:14.800 UTC [19865] LOG:  shutting down
    2024-11-01 11:43:14.808 UTC [19864] LOG:  database system is shut down
    2024-11-01 11:43:15.043 UTC [19876] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
    2024-11-01 11:43:15.050 UTC [19876] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
    2024-11-01 11:43:15.078 UTC [19879] LOG:  database system was shut down in recovery at 2024-11-01 11:43:14 UTC
    2024-11-01 11:43:15.079 UTC [19879] LOG:  entering standby mode
    2024-11-01 11:43:15.086 UTC [19879] LOG:  redo starts at E/C34234C0
    2024-11-01 11:43:15.086 UTC [19879] LOG:  consistent recovery state reached at E/C3423600
    2024-11-01 11:43:15.086 UTC [19879] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
    2024-11-01 11:43:15.086 UTC [19876] LOG:  database system is ready to accept read-only connections
    2024-11-01 11:43:15.126 UTC [19880] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
    2024-11-01 11:43:15.282 UTC [19876] LOG:  received fast shutdown request
    2024-11-01 11:43:15.289 UTC [19876] LOG:  aborting any active transactions
    2024-11-01 11:43:15.289 UTC [19880] FATAL:  terminating walreceiver process due to administrator command
    2024-11-01 11:43:15.302 UTC [19877] LOG:  shutting down
    2024-11-01 11:43:15.309 UTC [19876] LOG:  database system is shut down
    postgres@ip-172-31-8-22:~$ echo $?
    0
    postgres@ip-172-31-8-22:~$

    Now, let's do the actual run by removing the dry-run option.

    postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_createsubscriber --database=test --pgdata=/var/lib/postgresql/17/main --subscriber-port=5432 --publisher-server='user=replicator password=replicator host=172.31.0.200'  --subscriber-username=postgres --publication=demo_pub1 --subscription=demo_sub1
    2024-11-01 11:45:31.957 UTC [19903] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
    2024-11-01 11:45:31.966 UTC [19903] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
    2024-11-01 11:45:31.976 UTC [19906] LOG:  database system was shut down in recovery at 2024-11-01 11:43:15 UTC
    2024-11-01 11:45:31.976 UTC [19906] LOG:  entering standby mode
    2024-11-01 11:45:31.984 UTC [19906] LOG:  redo starts at E/C34234C0
    2024-11-01 11:45:31.984 UTC [19906] LOG:  consistent recovery state reached at E/C3423600
    2024-11-01 11:45:31.984 UTC [19903] LOG:  database system is ready to accept read-only connections
    2024-11-01 11:45:31.984 UTC [19906] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
    2024-11-01 11:45:32.012 UTC [19907] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
    2024-11-01 11:45:32.092 UTC [19903] LOG:  received fast shutdown request
    2024-11-01 11:45:32.095 UTC [19903] LOG:  aborting any active transactions
    2024-11-01 11:45:32.095 UTC [19907] FATAL:  terminating walreceiver process due to administrator command
    2024-11-01 11:45:32.103 UTC [19904] LOG:  shutting down
    2024-11-01 11:45:32.111 UTC [19903] LOG:  database system is shut down
    2024-11-01 11:45:32.367 UTC [19915] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
    2024-11-01 11:45:32.371 UTC [19915] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.5432"
    2024-11-01 11:45:32.381 UTC [19918] LOG:  database system was shut down in recovery at 2024-11-01 11:45:32 UTC
    2024-11-01 11:45:32.382 UTC [19918] LOG:  entering standby mode
    2024-11-01 11:45:32.387 UTC [19918] LOG:  redo starts at E/C34234C0
    2024-11-01 11:45:32.388 UTC [19918] LOG:  consistent recovery state reached at E/C3423600
    2024-11-01 11:45:32.388 UTC [19915] LOG:  database system is ready to accept read-only connections
    2024-11-01 11:45:32.388 UTC [19918] LOG:  invalid record length at E/C3423600: expected at least 24, got 0
    2024-11-01 11:45:32.414 UTC [19919] LOG:  started streaming WAL from primary at E/C3000000 on timeline 1
    2024-11-01 11:45:32.468 UTC [19918] LOG:  recovery stopping after WAL location (LSN) "E/C3423C28"
    2024-11-01 11:45:32.468 UTC [19918] LOG:  redo done at E/C3423C28 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.08 s
    2024-11-01 11:45:32.468 UTC [19918] LOG:  last completed transaction was at log time 2024-11-01 11:45:32.225856+00
    2024-11-01 11:45:32.468 UTC [19919] FATAL:  terminating walreceiver process due to administrator command
    2024-11-01 11:45:32.473 UTC [19918] LOG:  selected new timeline ID: 2
    2024-11-01 11:45:32.561 UTC [19918] LOG:  archive recovery complete
    2024-11-01 11:45:32.564 UTC [19916] LOG:  checkpoint starting: end-of-recovery immediate wait
    2024-11-01 11:45:32.579 UTC [19916] LOG:  checkpoint complete: wrote 11 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.004 s, total=0.018 s; sync files=8, longest=0.002 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=E/C3423C60, redo lsn=E/C3423C60
    2024-11-01 11:45:32.589 UTC [19915] LOG:  database system is ready to accept connections
    2024-11-01 11:45:32.593 UTC [19923] WARNING:  "archive_mode" enabled, yet archiving is not configured
    2024-11-01 11:45:32.593 UTC [19923] DETAIL:  archive_command is not set.
    2024-11-01 11:45:33.554 UTC [19915] LOG:  received fast shutdown request
    2024-11-01 11:45:33.559 UTC [19915] LOG:  aborting any active transactions
    2024-11-01 11:45:33.569 UTC [19916] LOG:  shutting down
    2024-11-01 11:45:33.615 UTC [19923] WARNING:  "archive_mode" enabled, yet archiving is not configured
    2024-11-01 11:45:33.615 UTC [19923] DETAIL:  archive_command is not set.
    2024-11-01 11:45:33.618 UTC [19916] LOG:  checkpoint starting: shutdown immediate
    2024-11-01 11:45:33.635 UTC [19916] LOG:  checkpoint complete: wrote 16 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.004 s, sync=0.006 s, total=0.021 s; sync files=11, longest=0.003 s, average=0.001 s; distance=12144 kB, estimate=12144 kB; lsn=E/C4000028, redo lsn=E/C4000028

    Once it has been successfully executed, you can start the replica server. This will print the details of the subscription that was created.

    postgres@ip-172-31-8-22:~$ /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/17/main start
    waiting for server to start....2024-11-01 11:47:14.252 UTC [19941] LOG:  starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
    2024-11-01 11:47:14.253 UTC [19941] LOG:  listening on IPv4 address "0.0.0.0", port 5432
    2024-11-01 11:47:14.253 UTC [19941] LOG:  listening on IPv6 address "::", port 5432
    2024-11-01 11:47:14.261 UTC [19941] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    2024-11-01 11:47:14.269 UTC [19944] LOG:  database system was shut down at 2024-11-01 11:45:33 UTC
    2024-11-01 11:47:14.276 UTC [19944] LOG:  recovered replication state of node 1 to E/C3423C28
    2024-11-01 11:47:14.285 UTC [19941] LOG:  database system is ready to accept connections
    2024-11-01 11:47:14.287 UTC [19947] WARNING:  "archive_mode" enabled, yet archiving is not configured
    2024-11-01 11:47:14.287 UTC [19947] DETAIL:  archive_command is not set.
     done
    server started
    postgres@ip-172-31-8-22:~$ 2024-11-01 11:47:14.309 UTC [19949] LOG:  logical replication apply worker for subscription "demo_sub1" has started

    Verifying the publication and subscription details in the source and destination, respectively:

    primary=# select * from pg_publication;
    
    -[ RECORD 1 ]+----------
    oid          | 16432
    pubname      | demo_pub1
    pubowner     | 16428
    puballtables | t
    pubinsert    | t
    pubupdate    | t
    pubdelete    | t
    pubtruncate  | t
    pubviaroot   | f
    replica=# select * from pg_subscription;
    -[ RECORD 1 ]-------+------------------------------------------------------------------
    oid                 | 24623
    subdbid             | 16386
    subskiplsn          | 0/0
    subname             | demo_sub1
    subowner            | 10
    subenabled          | t
    subbinary           | f
    substream           | f
    subtwophasestate    | d
    subdisableonerr     | f
    subpasswordrequired | t
    subrunasowner       | f
    subfailover         | f
    subconninfo         | user=replicator password=replicator host=172.31.0.200 dbname=test
    subslotname         | demo_sub1
    subsynccommit       | off
    subpublications     | {demo_pub1}
    suborigin           | any
    replica=# select * from employers_1 where id=456961 ;
       id   |      details
    --------+--------------------
     456961 | 0.9207200634008488
    (1 row)
    primary=# delete from employers_1 where id=456961 ;
    DELETE 1
    test=# select * from employers_1 where id=456961 ;
     id | details
    ----+---------
    (0 rows)

    We now have our logical replica ready, significantly reducing the initial synchronization time.

    Important Considerations

    There are a few important considerations to keep in mind before adopting this approach:

    • Version Compatibility: Both the source and destination PostgreSQL versions must be the same.
    • Error Handling: If the process fails, you may need to rebuild the streaming replica from scratch. Therefore, plan carefully and make use of the dry-run option.

    Overall, this new approach simplifies the replication process and introduces an exciting feature in PostgreSQL 17. By leveraging this method, you can achieve faster setup times and more reliable replication.

    Prefer listening? Check out our podcast episode on PostgreSQL 17's logical replication enhancements!

    If you need expert assistance with PostgreSQL replication, optimization, or troubleshooting, Mydbops offers PostgreSQL consulting and managed services to ensure seamless performance and scalability for your database systems.

    No items found.

    About the Author

    Aakash M

    PostgreSQL Expert, Mydbops

    Subscribe Now!

    Subscribe here to get exclusive updates on upcoming webinars, meetups, and to receive instant updates on new database technologies.

    Thank you! Your submission has been received!
    Oops! Something went wrong while submitting the form.