Thanks to Ryan for hosting PGSQL Phriday #012 who has asked to write a blog about exciting features in PostgreSQL 16.
In this blog post, we will discuss the new feature introduced in PostgreSQL 16: Bidirectional Logical Replication.
Bidirectional replication
Bidirectional replication is a two-way data exchange, where Node 1 not only receives data from Node 2 but also seamlessly sends its own data back to Node 2. This powerful feature opens up a world of possibilities for database synchronization and collaboration.
1. Consequently, any modifications performed on either of the nodes will be mirrored and implemented on both.
Before PostgreSQL 16
Prior to version 15, PostgreSQL did not offer support for these replication concepts. Attempting to implement them would lead to an endless loop, as illustrated below:
- Node 1 executes SQL and replicates it to Node 2.
- Node 2 receives the SQL and also executes it.
- The same SQL is then sent back to Node 1, resulting in an ongoing cycle.
Let's try to set it up in PostgreSQL 15.
On node1:
On node2:
On node1:
Logical replication setup is completed on both the nodes for table test1. Let's insert a record on Node 1.
Consequently, within a few seconds, the number of records began to exceed 100, and this count continued to grow indefinitely, leading to an infinite loop. In cases where the table has a primary key constraint, it would eventually trigger a unique key violation error.
Understanding the Infinite Loop in Bidirectional Logical Replication Prior to PostgreSQL 15
What happens?
- The Publisher (pub1) of Node 1 sends records to the Subscription (sub1) of Node 2.
- The Apply Worker of sub1 applies the data on Node 2.
Simultaneously,
- The Publisher (pub2) of Node 2 sends the same data to the Subscription (sub2) of Node 1.
- The Apply Worker of sub2 applies the data on Node 1.
This circular process continues indefinitely, resulting in an infinite loop.
Why it happens?
The Apply Worker lacks awareness of whether the data originates locally or comes from replication.
How is this issue resolved in PostgreSQL 16?
Locally generated data(Direct SQL Execution) will not have a Replication Origin and the data replicated from another source will have a Replication Origin. By using this information, it found a way to replicate only changes made by SQL commands and not those from replication.
From PostgreSQL 16 onwards a new parameter called origin is introduced as a subscription parameter which tells the publisher to send only changes that do not have replication origin(Own Writes Only) or send all changes(Own writes + Replicated Writes - As of PostgreSQL 15).
The origin argument in the subscription command will accept two values
- None - The subscription will request the publisher to only send changes that have no origin associated(Only the writes made on the server). This helps to avoid an infinite loop in the bidirectional replication setup.
- Any - The publisher sends changes regardless of their origin(Writes made on the server + Replicated data from other servers). The default is any and produces the behavior of an infinite loop if bidirectional replication is configured.
Let's try to set up bidirectional replication for a table in PostgreSQL 16 with the origin set to None.
On Node 1:
On Node 2:
On Node 1:
Bidirectional replication is configured. Let's write on both the nodes
On Node 1:
On Node 2:
On Node 1:
If you wish to maintain the previous behavior as of PostgreSQL 15, you can either set the 'origin' value to 'any' or leave it at its default setting when configuring the subscription.
The method described above outlines how to manage the incremental synchronization of a table in Logical Replication.
For the initial data load(Table Synchronisation) phase, we set copy_data=true and origin=NONE. By this, the initial data sync data will be copied from the publisher but it is not possible to differentiate between the local writes and replicated data.
While setting this up, we may get a warning as below as there are chances that the publisher has subscriptions and it may have data from multiple sources.
WARNING: Subscription sub1 requested copy_data with origin = NONE but might copy data that had a different origin
DETAIL: Subscribed publication pub1 is subscribing to other publications.
HINT: Verify that initial data copied from the publisher tables did not come from other origins.
It is the user's responsibility to configure with care in such cases.
In conclusion, PostgreSQL 16's bidirectional logical replication revolutionizes data synchronization. By introducing the origin parameter, it elegantly resolves the infinite loop issue, ensuring data consistency between nodes. This feature is a game-changer for high availability, Write Scalability, disaster recovery, and collaboration scenarios. While its implementation requires careful consideration, bidirectional replication opens exciting possibilities for seamless data exchange.
Check out our recent blogs here