TiDB's Native Sharding for Seamless Database Scalability

Mydbops
Oct 6, 2023

TiDB is an open-source ACID-compliant NewSQL database that natively supports automatic sharding. Yes, you heard it right – data stored in TiDB is sharded by default. This feature simplifies database scaling and ensures efficient data distribution. Moreover, TiDB offers horizontal scalability, making it ideal for handling large and growing datasets.

Understanding Sharding

Before we dive into the benefits of TiDB's native sharding, let's understand what sharding is all about.

Sharding is a data management technique in which a large dataset is broken down into smaller, more manageable pieces called shards. Each shard is stored independently on its own server or node.
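To make the idea concrete, here is a minimal sketch of hash-based sharding in Python. It illustrates the general technique only, not how TiDB or any other database implements it. The names `NUM_SHARDS`, `shard_for`, and `split_into_shards` are hypothetical.

```python
import zlib

NUM_SHARDS = 3  # hypothetical number of servers/nodes


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash."""
    # zlib.crc32 is deterministic across runs, unlike the built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def split_into_shards(rows, key_field, num_shards=NUM_SHARDS):
    """Break a list of row dicts into num_shards smaller lists."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


if __name__ == "__main__":
    rows = [{"id": i, "name": f"user-{i}"} for i in range(1, 11)]
    for index, shard in enumerate(split_into_shards(rows, "id")):
        print(index, [row["id"] for row in shard])
```

Every row lands on exactly one shard, and the same key always maps to the same shard, so a lookup by key only needs to touch one node.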

[Figure: Sharding Architecture]

Benefits of Sharding

Sharding offers a multitude of advantages that can transform the way you manage your data:

  1. Scalability: Sharding allows your database to scale horizontally by adding more servers or nodes. This means you can handle more data and queries as your application grows.
  2. Performance: With sharding, you can significantly improve database read and write performance. Queries can be directed to specific shards, reducing the amount of data that needs to be processed on a single node.
  3. Availability: Sharding enhances database availability by reducing the impact of server failures. If one shard goes down, the others can continue to operate, ensuring your application remains online.

Complexities of Sharding

While the benefits of sharding are enticing, it's essential to acknowledge the complexities it introduces:

  • Choosing the Shard Key: Selecting the right shard key is critical for efficient sharding. It determines how data is distributed across shards.
  • Data Distribution: Ensuring an even distribution of data across shards can be challenging, especially as your dataset grows.
  • Query Routing: Properly routing queries to the appropriate shard requires careful planning and implementation.
  • Handling Cross-Shard Transactions: Coordinating transactions that involve data on multiple shards adds complexity to your application logic.
  • Data Migration: Moving data between shards can be tricky, and it's something you'll need to consider as your data needs evolve.
  • Cluster Scaling: Sharded clusters should be able to scale both vertically (adding more resources to existing servers) and horizontally (adding more shards). Managing the scaling process while maintaining performance is challenging.
  • Consistent Backup: Ensuring consistent backups across shards can be tricky. Coordinating backup schedules and methodologies to capture the entire dataset while minimizing impact on production systems is important.
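The first two points are easy to underestimate. As a rough sketch with made-up data, the Python snippet below counts how many rows land on each shard for two candidate shard keys in a manually sharded setup. The dataset, `COUNTRY_CODE`, and `rows_per_shard` are all hypothetical and exist only for illustration.

```python
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

# Made-up sample data: 100 orders, 90 of them from a single country.
ORDERS = [
    {"order_id": i, "country": "US" if i % 10 == 0 else "IN"}
    for i in range(1, 101)
]
COUNTRY_CODE = {"IN": 0, "US": 1}  # hypothetical numeric encoding


def rows_per_shard(rows, shard_key):
    """Count how many rows land on each shard for a shard-key function."""
    counts = Counter(shard_key(row) % NUM_SHARDS for row in rows)
    return [counts[s] for s in range(NUM_SHARDS)]


# Low-cardinality key: almost everything piles up on one shard.
by_country = rows_per_shard(ORDERS, lambda r: COUNTRY_CODE[r["country"]])
# High-cardinality key: rows spread evenly.
by_order_id = rows_per_shard(ORDERS, lambda r: r["order_id"])

print(by_country)   # [90, 10, 0, 0]
print(by_order_id)  # [25, 25, 25, 25]
```

With the country key, one shard holds 90% of the data and two shards sit empty. That is the kind of hotspot a poorly chosen shard key produces.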

That's a lot to consider. Don't worry, we've got you covered: TiDB's native sharding is built from the ground up with all of these concerns in mind. With scalability at its core, it can handle petabytes of data on commodity hardware.

TiDB's Native Sharding

With TiDB's Native Sharding, handling vast datasets becomes effortless. Traditional databases often struggle to scale as data grows, but TiDB's innovative approach allows you to seamlessly expand your database's capacity by adding more servers or nodes.

How Sharding Works in TiDB

Here's a quick overview of how sharding is done with TiDB:

  • Choosing the Shard Key: The first step is selecting the shard key, which determines how data is partitioned into smaller segments and distributed. TiDB generates the shard key automatically. Also known as the Global-key, it combines the TableID with the RowID.

GlobalKey = { TableID + RowID }
	
  • With the Global-key (TableID + RowID), each row is uniquely identified within the entire TiDB cluster.
  • TableID: TiDB generates an integer TableID for each table, which uniquely identifies that table within the cluster.
  • RowID: The RowID uniquely identifies a row within its table (think of it as the primary key). It is chosen as follows:
    • If the table has an integer primary key, that key is used as the RowID.
    • If the table has a non-integer primary key (such as varchar) and that key is not a clustered index, TiDB generates an integer RowID itself.
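The key construction described above can be sketched in a few lines of Python. This is a simplified model that follows the article's `T07_1` notation. The real on-disk encoding in TiDB is a binary format, and `GlobalKey`, `RowIDAllocator`, and `make_global_key` are hypothetical names rather than TiDB APIs.

```python
import itertools
from typing import NamedTuple, Union


class GlobalKey(NamedTuple):
    """TableID + RowID; tuple ordering sorts by table first, then by row."""
    table_id: int
    row_id: int

    def __str__(self) -> str:
        # Article-style notation, e.g. T07_1 (not TiDB's real byte encoding).
        return f"T{self.table_id:02d}_{self.row_id}"


class RowIDAllocator:
    """Hypothetical stand-in for TiDB's internal RowID generation.

    In this sketch you would create one allocator per table.
    """

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def row_id_for(self, primary_key: Union[int, str, None]) -> int:
        if isinstance(primary_key, int):
            return primary_key        # integer primary key is reused as the RowID
        return next(self._counter)    # otherwise generate an integer RowID


def make_global_key(table_id, primary_key, allocator) -> GlobalKey:
    return GlobalKey(table_id, allocator.row_id_for(primary_key))


if __name__ == "__main__":
    allocator = RowIDAllocator()
    print(make_global_key(7, 1, allocator))                    # T07_1
    print(make_global_key(8, "alice@example.com", allocator))  # T08_1
```

Because the key is an ordered pair, all rows of a table sit next to each other in key order, which is what makes splitting the data into contiguous key ranges possible.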

The Global-key is generated at the SQL layer, that is, within the TiDB nodes. Let's walk through the process in detail using relational data. Our sample data has four columns (ID, Name, Mobile, and DOB) and contains six records.

[Figure: Sample relational data]

In our sample data, the ID column is the primary key. This is the relational format in which the data arrives from the client application. Processing begins at the SQL layer (TiDB), which generates the TableID, T07 in this example, as shown below.

[Figure: TableID generated at the SQL layer]

Combining the TableID with the RowID (the primary key) forms the global key, highlighted in yellow below: T07 + 1 = T07_1.

[Figure: Global Key]

In the storage layer, data is stored as key-value pairs. The highlighted part becomes the key (the global key), while the remaining columns become the value, as shown below.
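As a small sketch of this mapping, the Python function below turns one relational row into a key-value pair using the article's notation. The sample row values are made up, `row_to_kv` is a hypothetical helper, and TiDB's actual value encoding is a compact binary format rather than a dictionary.

```python
from typing import Any, Dict, Tuple

TABLE_ID = 7  # the article's example TableID (T07)


def row_to_kv(
    table_id: int, row: Dict[str, Any], pk_column: str = "ID"
) -> Tuple[str, Dict[str, Any]]:
    """Turn one relational row into a (global key, value) pair."""
    key = f"T{table_id:02d}_{row[pk_column]}"
    # Everything except the primary key column becomes the value.
    value = {col: val for col, val in row.items() if col != pk_column}
    return key, value


# Made-up row matching the article's columns: ID, Name, Mobile, DOB.
sample_row = {"ID": 1, "Name": "Asha", "Mobile": "555-0100", "DOB": "1990-01-01"}
key, value = row_to_kv(TABLE_ID, sample_row)

print(key)    # T07_1
print(value)  # {'Name': 'Asha', 'Mobile': '555-0100', 'DOB': '1990-01-01'}
```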

[Figure: Key-value pairs in the storage layer]

In the next step, the data is split by global key into multiple shards of a bounded size (96 MB by default at the time of writing) called Regions, which are distributed across the storage nodes (TiKV).
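The splitting step can be approximated with the Python sketch below. It groups key-ordered pairs into size-bounded Regions, spreads them over nodes, and finds the Region that owns a given key. This is a simplification under stated assumptions: the size limit is tiny so the example stays readable, only value bytes are counted, and round-robin placement is an assumption (in a real cluster, placement is decided by TiDB's Placement Driver). All function names are hypothetical.

```python
from bisect import bisect_right


def split_into_regions(sorted_kvs, max_region_bytes):
    """Group consecutive (key, value) pairs into size-bounded regions."""
    regions, current, current_size = [], [], 0
    for key, value in sorted_kvs:
        size = len(value)  # simplification: only value bytes are counted
        if current and current_size + size > max_region_bytes:
            regions.append(current)
            current, current_size = [], 0
        current.append((key, value))
        current_size += size
    if current:
        regions.append(current)
    return regions


def assign_round_robin(regions, nodes):
    """Assumed placement policy: region i goes to node i mod len(nodes)."""
    return [nodes[i % len(nodes)] for i in range(len(regions))]


def region_for_key(regions, key):
    """Find the index of the region whose key range contains key."""
    start_keys = [region[0][0] for region in regions]
    return max(bisect_right(start_keys, key) - 1, 0)


if __name__ == "__main__":
    # Six rows of table 7, each with a 10-byte value, keyed by (TableID, RowID).
    kvs = [((7, row_id), b"x" * 10) for row_id in range(1, 7)]
    regions = split_into_regions(kvs, max_region_bytes=20)
    nodes = assign_round_robin(regions, ["tikv-1", "tikv-2", "tikv-3"])
    index = region_for_key(regions, (7, 4))
    print(len(regions), nodes[index])
```

Because each Region covers a contiguous key range, finding the right Region for a key is a binary search over the Region start keys. This is why a point lookup does not need to scan every node.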

[Figure: Regions distributed across TiKV nodes]

Sharding Automation in TiDB

The beauty of TiDB's native sharding is its automation. TiDB handles the entire sharding process: data is split by Global-key into Regions of a bounded size (96 MB by default at the time of writing), and these Regions are then distributed across the storage nodes (TiKV) without manual intervention.

TiDB's automatic sharding sets it apart among open-source, ACID-compliant NewSQL databases. Sharding brings compelling benefits, including scalability, better performance through efficient query routing, and improved availability. It also introduces complexities such as shard key selection and data distribution. TiDB's native sharding handles these for you, which makes it a strong choice for managing large datasets, even on commodity hardware.

So, if you're looking to unlock scalability and performance for your data-intensive applications, TiDB with its native sharding is the way to go. Your data will thank you for it!


Unlock the full potential of your TiDB deployment with our expert TiDB services. Contact Mydbops today for performance tuning and optimization. We offer TiDB consulting services tailored to your specific needs.
