TiDB Lightning's High-Speed Data Import

TiDB Lightning for High-Speed Data Import

Nov 30, 2024

Mins to Read

All

TiDB Lightning is a high-speed data import tool designed specifically for TiDB. It enables users to efficiently load large volumes of data into a TiDB cluster. By leveraging TiDB Lightning, users can seamlessly import data from SQL dumps generated by Dumpling, ensuring a smooth transition of structured data into the TiDB environment. This powerful tool allows organizations to migrate their data effortlessly, preparing their databases to handle workloads effectively. In this blog, we’ll explore TiDB Lightning's key features, backend options, and a step-by-step guide to using it effectively. By the end, you'll know how to optimize data imports and enhance database management.

Key Features and Components

High-Speed Data Import

‍TiDB Lightning excels in importing data at remarkable speeds. By leveraging multiple threads for data loading and optimizing the process with bulk writes, it significantly reduces the time required to import large datasets compared to traditional methods.

Compatibility with MySQL

‍TiDB Lightning seamlessly supports data imports from MySQL, making it an ideal solution for organizations migrating or integrating existing MySQL databases into TiDB.

Flexible Data Formats

‍Beyond MySQL, TiDB Lightning also supports CSV files, offering flexibility to import data from various sources. This capability simplifies centralizing data management within the TiDB ecosystem.

Explore TiDB Lightning's architecture, backend options, and practical use cases in our latest podcast.

Architecture of TiDB Lightning

TiDB Lightning is a powerful utility divided into two key components:

tidb-lightning (commonly referred to as “Lightning”)
tikv-importer (referred to as “Importer”)

The Lightning component reads data from the specified source and focuses on transforming SQL statements into key-value pairs. This transformation is crucial for integrating structured data into the TiDB ecosystem. Once the data is prepared, TiDB Lightning hands it off to the tikv-importer component. The Importer then organizes these KV pairs and schedules them for ingestion into the TiKV servers.

TiDB Lightning Backend Options‍

TiDB Lightning offers three backend configurations, each with a unique operational approach:

Importer Backend‍

In this configuration, TiDB Lightning encodes all SQL or CSV data into key-value (KV) pairs. The tikv-importer component then sorts these KV pairs and uploads them to TiKV nodes as SST files. Finally, TiKV ingests these SST files into the cluster.

Local Backend‍

Similar to the Importer Backend, TiDB Lightning encodes data into key-value pairs, but in this setup, the pairs are first sorted and stored in a local temporary directory. These pairs are then uploaded to TiKV nodes as SST files. Use the local-backend when dumping data into empty tables.

TiDB Backend‍

In this setup, TiDB Lightning encodes the data into SQL INSERT statements and directly executes these statements on the TiDB node. This method is suitable when dumping data into tables that already contain some data.

Comparison of Backend Options

Backend	Importer-backend	Local-backend	TiDB-backend
Speed	Fast (~300 GB/hr)	Fast (~500 GB/hr)	Slow (~50 GB/hr)
Resource usage	High	High (but less than importer-backend)	Low
Network bandwidth usage	Medium	High	Low
ACID compliance while importing	No	No	Yes
Target tables	Must be empty	Must be empty	Can be populated
Additional component required	tikv-importer	No	No
TiDB versions supported	All	>= v4.0.0	All

Step-by-Step Guide to Using TiDB Lightning

Create a user to import the data using lightning tool. The user should have the following privileges.

CREATE USER 'mydbops_lightning_user'@'10.%' IDENTIFIED BY '******';
GRANT CREATE, SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER on *.* to 'mydbops_lightning_user'@'10.%';

Create a tidb configuration file. Below is the sample config file. We have already taken the backup of the table ‘test.orders’ using dumpling and it is maintained under /var/tmp/mysql/dumpling directory.

[lightning]
# logging
level = "info"
file = "tidb-lightning.log"
 
[tikv-importer]
# Uses the Local-backend
backend = "tiDB"
# Sets the directory for temporarily storing the sorted key-value pairs.
# The target directory must be empty.
sorted-kv-dir = "/tmp/sorted-kv-dir/"
 
[mydumper]
# Local source data directory
data-source-dir = "/var/tmp/mysql/dumpling"
 
# Configures the wildcard rule. By default, all tables in the mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA, and INSPECTION_SCHEMA system databases are filtered.
# If this item is not configured, the "cannot find schema" error occurs when system tables are imported.
filter = [ '*.*','!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*','!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']
no-schema = true
 
 
[tidb]
# Information of the target cluster
host = "10.24.32.125"
port = 4000
user = "mydbops_lightning_user"
password = "****"
# Table schema information is fetched from TiDB via this status-port.
status-port = 10080
# The PD address of the cluster
pd-addr = "10.24.32.125:2385"
log-level = "info"
 
[[routes]]
 # Table routes
schema-pattern = "test"
table-pattern = "orders"
target-schema = "test"
target-table = "orders"

Run the tidb-lightning command

./tidb-lightning -config tidb_orders_config.toml

In today’s data-driven world, having the right tools to manage and migrate data is more important than ever. TiDB Lightning excels as a fast, flexible, and reliable solution for importing large datasets, whether you're migrating from MySQL, working with CSV files, or simply enhancing your database management processes. By leveraging TiDB Lightning, you can streamline your data migration, boost efficiency, and ensure that your TiDB environment is ready to handle the most demanding workloads.

If you’re looking to optimize your TiDB environment, streamline your data migration, or ensure the best performance for your TiDB cluster, Mydbops offers expert TiDB Consulting and Remote DBA Services. Whether you're implementing TiDB Lightning or managing your TiDB databases, our team of experienced professionals is here to help.

Contact Mydbops today to learn how our services can help you unlock the full potential of TiDB and manage your data more efficiently!

No items found.