In today's data-driven world, ensuring data consistency across different databases is paramount. It plays a tedious role in data migration. The TiDB Sync-Diff Inspector tool provides an efficient way to compare and synchronize data between TiDB clusters and other databases like MySQL. This blog post will explore the features, benefits, sample commands, test cases, limitations, and best practices for using the Sync-Diff Inspector tool.
What is the Sync-Diff Inspector?
The Sync-Diff Inspector is a powerful tool developed for TiDB that helps users verify data consistency between any two databases. For example, it can compare the data in MySQL with that in TiDB, the data in MySQL with that in MySQL, or the data in TiDB with that in TiDB. Whether you are migrating data or synchronizing data between environments, this tool can help you identify discrepancies, ensuring that your data is accurate and reliable.
Key Features of Sync-Diff Inspector
- Comprehensive Comparison: The tool compares data at both the schema and data levels, helping you identify not just structural differences but also variations in the data itself.
- Flexible Target Databases: Sync-Diff Inspector can compare TiDB with other databases, including MySQL, PostgreSQL, and more, making it versatile for various environments.
- Detailed Reporting: After performing a comparison, the tool generates detailed reports highlighting differences, making it easier to address issues promptly.
Getting Started with Sync-Diff Inspector
Here are some sample commands to help you get started with the Sync-Diff Inspector:You can install the Sync-Diff Inspector tool from TiDB toolkits.
I have 2 TiDB playground servers , The data is inconsistent in `sample`.`sample1` table.Let’s test it with Sync-Diff Inspector tool
TiDB server 1
select count(*) from sample.sample1;
+----------+
| count(*) |
+----------+
| 8 |
+----------+
TiDB server 2
select count(*) from sample.sample1;
+----------+
| count(*) |
+----------+
| 2 |
+----------+
Configuration Essentials
Below is the sample config.toml file
# Diff Configuration.
######################### Global config #########################
# The number of goroutines created to check data. The number of connections between sync-diff-inspector and upstream/downstream databases is slightly greater than this value.
check-thread-count = 4
# If enabled, SQL statements are exported to fix inconsistent tables.
export-fix-sql = true
# Only compares the table structure instead of the data.
check-struct-only = false
# If enabled, sync-diff-inspector skips checking tables that do not exist in the upstream or downstream.
skip-non-existing-table = false
######################### Databases config #########################
[data-sources]
[data-sources.TiDB1]
host = "127.0.0.1"
port = 4000
user = "test"
password = "****" # The password for connecting to the upstream database. It can be plain text or Base64-encoded.
[data-sources.TiDB2]
host = "10.23.34.56"
port = 4000
user = "test"
password = "****" # The password for connecting to the downstream database. It can be plain text or Base64-encoded.
######################### task config #########################
# Configures the tables of the target database that need to be compared.
[task]
output-dir = "/home/sync_diff_inspector/"
# The upstream database. The value is the unique ID declared by data-sources.
source-instances = ["TiDB1"]
# The downstream database. The value is the unique ID declared by data-sources.
target-instance = "TiDB2"
# The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.'
target-check-tables = ["sample.sample1"]
Configuration file description
The configuration of sync-diff-inspector consists of the following parts:
- Global config: General configurations, such as number of threads to check, whether to export SQL statements to fix inconsistent tables, whether to compare the data, and whether to skip checking tables that do not exist in the upstream or downstream.
- Databases config: Configures the instances of the upstream and downstream databases.
- Routes: Rules for upstream multiple schema names to match downstream single schema names (optional).
- Task config: Configures the tables for checking. If some tables have a certain mapping relationship between the upstream and downstream databases or have some special requirements, you must configure these tables.
- Table config: Special configurations for specific tables, such as specified ranges and columns to be ignored (optional).
Let’s run the sync_diff_inspector tool to identify the data inconsistency in sample.sample1 table.
./sync_diff_inspector --config=config.toml
A total of 1 tables need to be compared
Comparing the table structure of ``sample`.`sample1`` ... equivalent
Comparing the table data of ``sample`.`sample1`` ... failure
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
The data of `sample`.`sample1` is not equal
The rest of the tables are all equal.
A total of 1 table has been compared, 0 tables finished, 1 tables failed, 0 tables skipped.
The patch file has been generated in
'/home/sync_diff_inspector/fix-on-TiDB2/'
You can view the comparision details through '/home/sync_diff_inspector/sync_diff.log'
As we have enabled export-fix-sql = true, SQL statements are exported to fix inconsistent tables under /home/sync_diff_inspector/fix-on-TiDB2/ directory
test@playground:/home/sync_diff_inspector# cat fix-on-TiDB2/sample\:sample1\:0\:0-0\:0.sql
-- table: sample.sample
-- range in sequence: Full
REPLACE INTO `sample`.`sample1`(`id`) VALUES (3);
REPLACE INTO `sample`.`sample1`(`id`) VALUES (4);
REPLACE INTO `sample`.`sample1`(`id`) VALUES (5);
REPLACE INTO `sample`.`sample1`(`id`) VALUES (6);
REPLACE INTO `sample`.`sample1`(`id`) VALUES (7);
REPLACE INTO `sample`.`sample1`(`id`) VALUES (8);
summary.txt file under output directory will give the summary report of the sync_diff_inspector run
test@playground:/home/sync_diff_inspector# cat summary.txt
Summary
Source Database
host = "127.0.0.1"
port = 4000
user = "test"
Target Databases
host = "10.23.34.56"
port = 4000
user = "test"
Comparison Result
The table structure and data in following tables are equivalent
The following tables contains inconsistent data
+-------------------+---------+--------------------+----------------+---------+-----------+
| TABLE | RESULT | STRUCTURE EQUALITY | DATA DIFF ROWS | UPCOUNT | DOWNCOUNT |
+-------------------+---------+--------------------+----------------+---------+-----------+
| `sample`.`sample1` | succeed | true | +6/-0 | 8 | 2 |
+-------------------+---------+--------------------+----------------+---------+-----------+
Time Cost: 6.034878ms
Average Speed: 2.589116MB/s
Benefits of Using Sync-Diff Inspector
- Data Integrity: Regular use of this tool helps ensure that data integrity is maintained, especially during migrations or synchronizations.
- Time-Saving: Automating the comparison process saves significant time compared to manual checks, allowing teams to focus on more critical tasks.
- Error Reduction: By identifying discrepancies early, the tool helps reduce the risk of errors that could lead to data corruption or loss.
Limitations to Consider
While the Sync-Diff Inspector is a powerful tool, it has some limitations and restrictions that users should be aware of:
- Performance: For very large datasets, the comparison process can be time-consuming and resource-intensive. Users should consider performance implications, especially in production environments.
- Data Types Support: Not all data types may be fully supported for comparison. It's essential to test the tool with the specific data types you are using in your databases.
- Complex Queries: The tool may not handle complex queries or views effectively. It's designed primarily for direct table comparisons.
- Network Requirements: A stable and fast network connection between the source and target databases is crucial for efficient operation. Network latency can affect performance.
- Schema Changes: Significant schema changes during the comparison process might lead to incomplete or incorrect results. It’s advisable to stabilize the schema before running comparisons.
Best Practices for Optimal Use
To maximize the effectiveness of the Sync-Diff Inspector, consider the following best practices:
- Regular Audits: Schedule regular data comparisons to catch discrepancies early, especially in environments with frequent data changes.
- Test in Staging: Always test the Sync-Diff Inspector in a staging environment before using it in production. This helps identify potential issues without affecting live data.
- Review Reports Carefully: After running comparisons, carefully review the generated reports. Look for patterns in discrepancies to understand underlying issues better.
- Use Proper Indexing: Ensure that both source and target databases have appropriate indexes in place to improve comparison performance.
- Monitor Performance: Keep an eye on system performance during comparisons. If resource usage spikes, consider running comparisons during off-peak hours.
- Backup Data: Always backup your data before performing major migrations or comparisons to prevent data loss in case of unforeseen issues.
The TiDB Sync-Diff Inspector tool is an essential asset for anyone working with TiDB and other databases. Its robust features, combined with ease of use, make it a valuable tool for ensuring data consistency and integrity. Whether you're migrating data, conducting audits, or managing multiple database environments, the Sync-Diff Inspector can help streamline your processes and mitigate risks.
Wondering how to implement TiDB Sync-Diff Inspector effectively for your environment? At Mydbops, we specialize in helping businesses overcome data inconsistencies with tools like Sync-Diff Inspector and more. Whether you need assistance in setting up, configuring, or interpreting results, our TiDB Consulting Services can help you achieve data consistency and system reliability.