TiDB's Raft-based Replication

Mydbops
May 20, 2024
8 Mins to Read

In today's data-driven world, managing large-scale databases requires robust solutions. TiDB, a popular open-source distributed SQL database, stands out for its horizontal scalability, consistency, and reliability. At the heart of TiDB's distributed architecture lies its Raft-based replication mechanism. This blog post delves into the fundamentals of the Raft consensus algorithm and explores how TiDB utilizes it for data replication, ensuring data consistency and fault tolerance for your critical applications.

Understanding the Raft consensus algorithm and its benefits

Raft is a consensus algorithm designed as an alternative to the more complex Paxos. It provides a more understandable approach for managing replicated logs across distributed systems. Its primary goal is to ensure that the replicated log's entries are consistent across all nodes, thereby maintaining the integrity and availability of data.

Raft in Action with TiKV

TiKV uses Raft to synchronize data across multiple replicas (typically three) for fault tolerance. Each piece of data in TiKV is part of a Region, and each Region has its own Raft group.

Here’s how Raft operates within TiKV:

  • Leader Election: Each Raft group elects a leader among its peers. The leader handles all write requests for that group, ensuring that data remains consistent by replicating entries to its followers.
  • Log Replication: Once the leader is elected, it begins processing client requests. Each data modification is recorded as a new entry in the Raft log. This log entry must be replicated to a majority of followers before it is applied to the state machine (i.e., it becomes an actual data modification in the database).
  • Safety and Consistency: Raft ensures safety through its consensus mechanism. If a leader fails, a new one is elected. Raft guarantees that the replicated log will be identical on all nodes, preventing data inconsistencies even in the event of network failures or node crashes.
  • Committing Entries: An entry is considered committed when a majority of the Raft group has written it to their logs. It can then be applied to the actual database state in TiKV (a minimal sketch of this majority check follows below).
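
To make the commit rule concrete, here is a minimal Go sketch of checking whether an entry has reached a majority. The committed function and its inputs are illustrative assumptions for this post, not TiKV's actual code:

```go
package main

import "fmt"

// committed reports whether the entry at index idx is safe to apply,
// given how far each replica's log has advanced (leader included).
func committed(idx uint64, matchIndex []uint64) bool {
	acks := 0
	for _, m := range matchIndex {
		if m >= idx {
			acks++
		}
	}
	return acks > len(matchIndex)/2 // strict majority of the group
}

func main() {
	// Three replicas: the leader and one follower have entry 5.
	match := []uint64{5, 5, 4}
	fmt.Println(committed(5, match)) // true: 2 of 3 replicas have it
	fmt.Println(committed(6, match)) // false: no replica has entry 6 yet
}
```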

Benefits of Raft-based Replication

Using Raft, TiKV ensures that:

  • Data is Highly Available: Even if a minority of nodes fail, the system continues to operate without data loss (the quorum arithmetic is sketched after this list).
  • Data is Consistent: Every committed entry is replicated to a majority of nodes, so all replicas in a Raft group converge on the same data.
  • Failover is Seamless: Raft’s leader election and log replication ensure that a new leader can take over without data loss.
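
The quorum arithmetic behind these guarantees is simple enough to sketch in a few lines of plain Go (no TiDB APIs involved):

```go
package main

import "fmt"

func main() {
	// A Raft group of n voters needs a majority (n/2 + 1) to commit,
	// so it stays available while at most (n-1)/2 voters are down.
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("replicas=%d quorum=%d tolerated failures=%d\n",
			n, n/2+1, (n-1)/2)
	}
}
```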

Multi-Raft with TiDB

  • TiDB's data is divided by key range into many Regions (96 MB each by default). The Placement Driver (PD) component spreads these Regions as evenly as possible across all TiKV nodes in the cluster, so storage capacity scales horizontally.
  • Each Region operates as an independent Raft group, and many such Raft groups co-exist on each node, forming the Multi-Raft design. Region placement, balancing, and metadata are maintained by PD (see the routing sketch below the figure).
Multi-Raft design with TiDB
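
To illustrate how a key maps to a Region, here is a toy Go sketch. The region struct and locate function are assumptions made for this example; the real routing metadata lives in PD and TiDB's client layer:

```go
package main

import (
	"fmt"
	"sort"
)

// region owns the half-open key range [StartKey, EndKey).
type region struct {
	ID       uint64
	StartKey string
	EndKey   string // "" means the range extends to +infinity
}

// locate finds the Region owning key, assuming regions are sorted by
// StartKey and together cover the whole keyspace.
func locate(regions []region, key string) region {
	i := sort.Search(len(regions), func(i int) bool {
		return regions[i].EndKey == "" || regions[i].EndKey > key
	})
	return regions[i]
}

func main() {
	regions := []region{
		{ID: 1, StartKey: "", EndKey: "g"},
		{ID: 2, StartKey: "g", EndKey: "p"},
		{ID: 3, StartKey: "p", EndKey: ""},
	}
	fmt.Println(locate(regions, "mydbops").ID) // Region 2
}
```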

The Raft log replication process step by step

1. Propose: When a client initiates a write request, it sends the request to the Raft leader. This step is termed Propose in Raft. The leader receives the write request and prepares to replicate it to the followers.

The Raft log replication process: Propose
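
A minimal Go sketch of this step, using a toy peer type that is not TiKV's API: only the leader accepts the proposal, and other peers redirect the client.

```go
package main

import "fmt"

// peer is a toy model of a Raft group member.
type peer struct {
	id        uint64
	isLeader  bool
	leaderID  uint64
	proposals []string // writes queued for replication
}

// propose accepts a write only on the leader; a follower redirects the
// client to the leader it currently knows of.
func (p *peer) propose(data string) error {
	if !p.isLeader {
		return fmt.Errorf("not leader; retry against peer %d", p.leaderID)
	}
	p.proposals = append(p.proposals, data)
	return nil
}

func main() {
	follower := &peer{id: 2, leaderID: 1}
	leader := &peer{id: 1, isLeader: true, leaderID: 1}

	fmt.Println(follower.propose("PUT k v")) // not leader; retry against peer 1
	fmt.Println(leader.propose("PUT k v"))   // <nil>: accepted
}
```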

2. Append: The leader encodes the write operation into a log entry and appends it to its own local log. This action is termed Append. The log entry typically contains information about the operation to be performed, such as key-value pairs for a database write.

The Raft log replication process: Append
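
A sketch of the leader-side append, with illustrative logEntry and raftLog types standing in for the real structures:

```go
package main

import "fmt"

// logEntry wraps a write with the leader's current term and the next
// log index; the field names here are illustrative, not TiKV's.
type logEntry struct {
	Term  uint64
	Index uint64
	Data  string // e.g. an encoded key-value mutation
}

type raftLog struct {
	entries []logEntry
}

// append stamps the write and adds it to the leader's local log.
func (l *raftLog) append(term uint64, data string) logEntry {
	e := logEntry{Term: term, Index: uint64(len(l.entries)) + 1, Data: data}
	l.entries = append(l.entries, e)
	return e
}

func main() {
	log := &raftLog{}
	e := log.append(3, "PUT user:42 -> alice")
	fmt.Printf("appended entry index=%d term=%d\n", e.Index, e.Term)
}
```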

3. Replicate: After appending the log entry to its own log, the leader replicates the log entry to the followers. This replication ensures that all followers receive the same sequence of log entries as the leader. Each follower then appends the received log entry to its own log.

The Raft log replication process: Replicate
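
A toy version of the fan-out, with an illustrative replicate function. In a real implementation this is an AppendEntries RPC that also carries the previous entry's index and term for the consistency check shown in step 4:

```go
package main

import "fmt"

type entry struct {
	Term, Index uint64
	Data        string
}

type follower struct {
	id  uint64
	log []entry
}

// replicate fans the new entry out to every follower and counts the
// acknowledgements; here we assume the logs already match (step 4
// shows the check a real follower performs first).
func replicate(e entry, followers []*follower) int {
	acks := 0
	for _, f := range followers {
		f.log = append(f.log, e)
		acks++
	}
	return acks
}

func main() {
	fs := []*follower{{id: 2}, {id: 3}}
	n := replicate(entry{Term: 3, Index: 7, Data: "PUT k v"}, fs)
	fmt.Printf("%d follower(s) acknowledged\n", n)
}
```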

4. Append (follower): Upon receiving the replicated log entry from the leader, each follower appends the entry to its own log. This ensures consistency across all nodes in the Raft cluster.
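
The follower-side check can be sketched like this (names are illustrative; when the check fails, a real Raft leader backs up and retries with earlier entries):

```go
package main

import "fmt"

type entry struct {
	Term, Index uint64
	Data        string
}

type followerLog struct {
	entries []entry
}

// appendFrom accepts an entry only if the follower's log matches the
// leader's at the preceding position (Raft's log-matching property).
func (f *followerLog) appendFrom(prevIndex, prevTerm uint64, e entry) bool {
	if prevIndex > 0 {
		if uint64(len(f.entries)) < prevIndex ||
			f.entries[prevIndex-1].Term != prevTerm {
			return false // logs diverge; reject and let the leader retry
		}
	}
	// Truncate any conflicting suffix, then append the new entry.
	f.entries = append(f.entries[:prevIndex], e)
	return true
}

func main() {
	f := &followerLog{entries: []entry{{Term: 1, Index: 1, Data: "PUT a 1"}}}
	ok := f.appendFrom(1, 1, entry{Term: 2, Index: 2, Data: "PUT k v"})
	fmt.Println("accepted:", ok) // true
}
```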

5. Commit: Once the leader determines that a majority of the Raft group (itself included) has written the log entry to their logs, it considers the entry committed. This means that the write operation represented by the log entry is guaranteed to be durable and will not be lost even in the event of a failure.

The Raft log replication process: Commit
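
One common way for a leader to advance its commit index is to sort each replica's replication progress and take the median, which is by construction replicated on a majority. A sketch (real Raft additionally requires the committed entry to belong to the leader's current term):

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index that a majority of replicas
// (leader included) have written, given each replica's matchIndex.
func commitIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// The element at position (n-1)/2 is reached by a strict majority.
	return sorted[(len(sorted)-1)/2]
}

func main() {
	// Leader at 7, followers at 7 and 5: entry 7 is on 2 of 3 replicas.
	fmt.Println(commitIndex([]uint64{7, 7, 5})) // 7
	// Leader at 7, followers at 5 and 4: only entry 5 has a majority.
	fmt.Println(commitIndex([]uint64{7, 5, 4})) // 5
}
```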

6. Apply: After a log entry is committed, the leader applies the corresponding operation to its state machine, and followers do the same once they learn the entry is committed. In the context of a distributed database like TiKV, this typically involves executing the write operation on the local data store. Once applied, the changes become visible and durable.
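
Finally, a sketch of the apply step against a toy in-memory key-value state machine (TiKV's real state machine is its RocksDB-backed store; everything below is an illustrative stand-in):

```go
package main

import "fmt"

type entry struct {
	Index      uint64
	Key, Value string
}

// stateMachine applies committed entries, in log order, to a key-value map.
type stateMachine struct {
	kv      map[string]string
	applied uint64 // index of the last applied entry
}

func (s *stateMachine) applyCommitted(log []entry, commitIdx uint64) {
	for s.applied < commitIdx {
		e := log[s.applied] // entry indexes are 1-based, the slice is 0-based
		s.kv[e.Key] = e.Value
		s.applied = e.Index
	}
}

func main() {
	sm := &stateMachine{kv: map[string]string{}}
	log := []entry{
		{Index: 1, Key: "user:42", Value: "alice"},
		{Index: 2, Key: "user:43", Value: "bob"},
	}
	sm.applyCommitted(log, 2)
	fmt.Println(sm.kv["user:43"], "applied up to", sm.applied) // bob applied up to 2
}
```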

TiDB's Raft-based replication mechanism ensures data consistency, fault tolerance, and high availability in distributed environments. By leveraging the principles of the Raft consensus algorithm, TiDB provides a robust foundation for scalable and reliable SQL databases.

Ready to experience the power and stability of TiDB with confidence?

We offer a variety of TiDB consulting and Remote DBA services to fit your specific requirements. Contact Mydbops today for a free consultation! Let's discuss your TiDB needs and how we can help you leverage Raft replication for a truly scalable and reliable database solution.
