MySQL 8.4 introduces an exciting feature: Tagged GTID (Global Transaction Identifier), which allows users to assign a tag (up to 33 characters) to GTIDs, providing a powerful way to organize and manage transactions. This capability is especially beneficial for complex replication setups such as InnoDB Clusters or Group Replication, where managing data flow and maintaining consistency are crucial.
This blog will cover the benefits of Tagged GTID, practical use cases, and a step-by-step guide for implementation, including examples for both native replication and group replication.
Benefits of Using Tagged GTID
GTID replication ensures each transaction is globally unique across servers. Tagged GTIDs enhance this by offering granular control over transaction replication, providing specific benefits:
- Selective Transaction Replication: Tagged GTIDs let you skip specific transactions while continuing the replication process. This is especially useful when certain operations (e.g., DELETEs or specific UPDATEs) don’t need to be replicated on all servers.
- Granular Control: They provide precise control over which transactions are applied on replicas, eliminating the need for complex replication filters and enabling cleaner, more structured replication management.
- Consistency Across Clusters: In setups like InnoDB Clusters or Group Replication, maintaining uniformity is crucial. Tagged GTIDs enable selective skipping of operations without compromising data consistency.
- Simplified Management: Tagged GTIDs simplify data replication management, avoiding the need for disabling binary logs or using intricate filters.
- Improved Data Organization: By assigning tags to different transaction types, you can trace, categorize, and audit operations more efficiently, ensuring replication aligns with specific environmental needs without disrupting the flow.
Use Cases for Tagged GTID
Selective Table Synchronization
You can tag certain operations, like large INSERTs, and choose whether to apply them to some replicas, which helps manage data load.
Conflict Resolution in Multi-Source Replication
When multiple sources feed data into a replica, Tagged GTID helps prioritize which updates to keep while ignoring others, solving conflicts easily.
Real-World Example: Skipping Delete Operations
Imagine you have a setup where you need to retain a month’s worth of data but skip any delete operations on the replica. You can tag these DELETE transactions, and the replica will ignore them while maintaining all other operations. Here’s how Tagged GTID can help:
- Tagging Deletes: You tag the DELETE operations as delete_logs. Once the replica receives the tag, it will recognize that those deletes should be skipped.
- Idempotency: Since GTIDs are idempotent, once the replica marks those operations as applied, it won’t process them again. This ensures that any deletes under this tag won’t affect the replica, keeping its data intact.
By faking that the replica has already applied these operations, you can control how the data is replicated, ensuring selective synchronization without losing any critical information.
Step-by-Step Implementation Guide
Here’s a step-by-step procedure to implement Tagged GTID for skipping delete operations on a replica:
Step 1: Validate Records on the Source Server
Before conducting any operations, it's essential to verify the record count on the primary server. For instance, if your table currently has 12,074 records and you intend to delete 5,000 of them—specifically those with IDs less than 15,000—be sure to take note of these figures. This step ensures you have a clear understanding of the data you are working with before making any changes.
Count of records in source server:
mysql> select count(id) from sbtest1 where id <15000;
+-----------+
| count(id) |
+-----------+
| 5000 |
+-----------+
1 row in set (0.03 sec)
Step 2: Configure the Replica Server
On the replica, set the GTID purged to avoid applying specific transactions. The command SET GLOBAL gtid_purged requires three key parameters:
- group_name: For Group Replication, this is the group UUID. For native replication, use the source server’s UUID.
- tag_name: The custom label for identifying the type of operation, such as 'delete_logs'.
- max_value: The maximum value can be anything and in below example, Ive given 1-99999999.
Where to run this command?
This command is run on the replica where you want to skip the tagged transactions.
SET GLOBAL gtid_purged='<group_name>:<tag_name>:<max_value>';
Example (Native Replication)
If your tag is delete_sbtest
and the group name is 65227e07-8b06-11ef-b39e-0ab01eb55715
, the command would look like:
SET GLOBAL gtid_purged='65227e07-8b06-11ef-b39e-0ab01eb55715
:delete_sbtest1:<max_value>';
mysql> set global gtid_purged='65227e07-8b06-11ef-b39e-0ab01eb55715:delete_sbtest1:1-999999999';
Query OK, 0 rows affected (0.00 sec)
mysql> show binary log status\G
*************************** 1. row ***************************
File: binlog.000005
Position: 2894856
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 65227e07-8b06-11ef-b39e-0ab01eb55715:1-12089:delete_sbtest1:1-999999999
1 row in set (0.00 sec)
Step 3: Perform the DELETE Operation on the Source Server
Now, switch to the primary database (source) and perform the DELETE operation, ensuring that the transaction is tagged:
SET gtid_next='automatic:<tag_name>';
-- Execute the delete operation
DELETE FROM <table_name> WHERE <condition>;
mysql> set gtid_next='automatic:delete_sbtest1';
Query OK, 0 rows affected (0.00 sec)
mysql> DELETE FROM sbtest1 WHERE id < 15000;
Query OK, 5000 rows affected (0.08 sec)
This will delete 5,000 records from the primary server.
Step 4: Validate Data on Both Servers
After executing the DELETE command, verify the record counts on both the primary server and the replica:
- Primary Server: The record count should reflect the deletion (i.e., 7,074 records remaining).
mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 7074 |
+----------+
1 row in set (0.00 sec)
- Replica: The record count should remain unchanged at 12,074, as the tagged DELETE operations were skipped.
mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 12074 |
+----------+
1 row in set (0.02 sec)
In the Case of Group Replication
In a group replication setup, you can prevent certain deletions from affecting all nodes. For example, you may want to delete old records from one node but keep them on the others.
On the Primary Node
SET gtid_next='automatic:archive_tag';
DELETE FROM <archive_table> WHERE created_at < NOW() - INTERVAL 1 YEAR;
On the Secondary Node
SET GLOBAL gtid_purged='<group_name>:archive_tag:<max_value>';
Advantages of Tagged GTID
- Consistency Without Complex Filters: Tagged GTIDs provide a clean and efficient alternative to using replication filters. This makes managing data replication in complex clustered environments much simpler without the need for intricate configurations.
- Flexibility Across Different Replication Types: Whether you are using native replication or group replication, tagged GTIDs allow for flexible control over how transactions are handled across different servers. This ensures that data flow can be fine-tuned based on the specific needs of your infrastructure.
- Enhanced Troubleshooting and Auditing: By tagging specific transactions, you can easily trace and track changes across different servers. This makes it easier to audit and troubleshoot specific transactions, offering improved visibility into the replication process.
- Data Integrity in Complex Setups: With the ability to skip certain operations, Tagged GTID helps ensure that critical data isn’t lost or replicated unintentionally. This becomes particularly important when dealing with complex setups involving multiple servers and databases.
- Versatile Application: Whether it's INSERTs, UPDATEs, DELETEs, or any custom transaction, Tagged GTID gives you more control over your replication strategy.
Tagged GTID in MySQL 8.4 offers a structured and flexible approach to transaction replication, particularly useful in Group Replication and InnoDB Clusters. Whether you need to selectively skip operations or improve the organization of transactions, Tagged GTIDs provide a clean, efficient way to control data flow across replicas.
By leveraging Tagged GTID, you gain more control, better consistency, and simplified management in your replication strategies, making it easier to maintain data integrity across complex MySQL environments.
Dive deeper into the technical aspects and real-world applications of Tagged GTID in MySQL 8.4 by listening to our expert-led podcast.
Implement Tagged GTID in MySQL with the help of our MySQL Managed Services, Consulting, and Remote DBA expertise. We specialize in complex replication setups, including InnoDB Cluster consulting and support. Let us help you streamline your replication process and ensure data consistency across all nodes.
Get in touch today to explore how we can optimize your MySQL environment with Tagged GTID and other advanced features.