TiDB Distributed eXecution Framework DXF

Mydbops
Apr 1, 2025
5 Mins to Read
TiDB Distributed eXecution Framework Illustration

TiDB Distributed eXecution Framework (DXF): Revolutionizing Database Performance

We've all experienced this. It’s late at night, and you find yourself waiting for a database operation that seems like it’s taking forever – maybe you’re trying to create an index and it just never seems to finish, or perhaps you’re trying to import data and it’s going on and on and on. Minutes become hours, and you start wondering whether you are ever going to get to log off.

Now picture this scenario: instead of one machine doing all the work, your database operations are distributed across multiple nodes. Data is processed more quickly, and you don’t have to sit around waiting for operations to finish. This is the beauty of TiDB’s Distributed eXecution Framework (DXF). In this blog, we will cover how DXF makes database operations easier and more seamless. Let's dive in!

Comparison of food delivery app and TiDB DXF assigning tasks to nearby delivery partners or best-suited nodes.

What is DXF?

TiDB follows a computing-storage separation model, allowing for better scalability and flexibility. With the release of version 7.1.0, TiDB introduced the Distributed eXecution Framework (DXF)—a major step toward improving distributed database management. DXF brings unified task scheduling, distributed execution, and centralized resource management, ensuring efficient system performance while maximizing resource utilization.

Flowchart of TiDB DXF operational sequence: traditional database, task distribution, parallel processing, and task completion.

DXF is TiDB's answer to distributed task management. Rather than overwhelming one node with intensive tasks, DXF distributes operations across multiple nodes in the cluster. Imagine assigning a project to a group of experienced workers who tackle it as a team: the work is quicker and smoother, and no single resource is exhausted. With DXF, operations such as ADD INDEX and IMPORT INTO become much more streamlined, and the framework is designed to cover tasks like ANALYZE and TTL management as well.

Diagram showing TiDB DXF with centralized resource management, unified task scheduling, and distributed execution.

Use Cases of DXF

DXF is designed for executing large-scale database management tasks beyond core Transactional Processing (TP) and Analytical Processing (AP), such as:

  • DDL operations (e.g., ADD INDEX)
  • IMPORT INTO (importing CSV, SQL, Parquet files)
  • TTL (Time-to-Live) management
  • ANALYZE for statistics collection
  • Backup and Restore

Key Benefits of DXF

  • What stands out most about DXF is its scalability, which does not come at the cost of availability or performance. It spreads work intelligently across your TiDB cluster, using the available computing resources when and where they are needed.
  • DXF also keeps resource usage balanced, so the system as a whole stays healthy while individual tasks still run to completion.

TiDB DXF efficiency diagram showing five key benefits

Limitations of DXF

There is a limitation to keep in mind, however: DXF can schedule at most 16 tasks at the same time, counting both ADD INDEX and IMPORT INTO tasks.

Enabling and Configuring DXF

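Before using DXF for ADD INDEX, Fast Online DDL mode must be enabled. This mode is controlled by the tidb_ddl_enable_fast_reorg system variable, which has been on by default since TiDB v6.5.0.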

System Variables:
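As a minimal sketch, assuming a SQL session with privileges to change global variables, Fast Online DDL mode can be checked and switched on like this:

```sql
-- Check whether Fast Online DDL mode is currently enabled.
SHOW VARIABLES LIKE 'tidb_ddl_enable_fast_reorg';

-- Enable it cluster-wide if the previous statement reports OFF.
SET GLOBAL tidb_ddl_enable_fast_reorg = ON;
```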

How to Enable DXF

DXF is enabled by default from TiDB v8.1.0. For earlier versions, it can be enabled manually using the following command:

SET GLOBAL tidb_enable_dist_task = ON;

When enabled, supported statements (e.g., ADD INDEX, IMPORT INTO) execute in a distributed manner across all TiDB nodes.
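No DXF-specific syntax is involved once the switch is on. The sketch below first confirms the setting and then runs an ordinary ADD INDEX statement; the table and column names are borrowed from the benchmark later in this post, and the index name is a hypothetical placeholder.

```sql
-- Confirm that DXF is switched on for the cluster (1 means ON).
SELECT @@global.tidb_enable_dist_task;

-- A regular ADD INDEX statement; with DXF enabled, TiDB decides how to
-- split the work into subtasks and spread them across TiDB nodes.
ALTER TABLE item ADD INDEX idx_pid (product_id);
```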

Recommended System Variables for DXF
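The TiDB documentation associates a handful of DDL-related system variables with DXF-based ADD INDEX, such as tidb_ddl_reorg_worker_cnt, tidb_ddl_reorg_batch_size, tidb_ddl_reorg_priority, and tidb_ddl_error_count_limit. Which of them matter for a given workload is an assumption to verify against the documentation for your version; the sketch below only inspects their current values, and the value used in the final statement is purely illustrative.

```sql
-- Inspect the reorg-related DDL variables (worker count, batch size, priority).
SHOW VARIABLES LIKE 'tidb_ddl_reorg%';

-- Inspect the retry limit for failed DDL operations.
SHOW VARIABLES LIKE 'tidb_ddl_error_count_limit';

-- Illustrative only: raise the number of reorg workers from its default.
-- Pick a value that matches the CPU capacity of your TiDB nodes.
SET GLOBAL tidb_ddl_reorg_worker_cnt = 8;
```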

Task Scheduling and Management

Tasks are distributed across all TiDB nodes by default. Starting from v7.4.0, the tidb_service_scope system variable can be configured to control which nodes execute tasks. Its behavior differs between v7.4.0 to v8.0.0 and v8.1.0 onward:

Controlling Task Execution (v7.4.0+)

  • v7.4.0 - v8.0.0:
    • tidb_service_scope can be set to '' (default) or background.
    • If any nodes have tidb_service_scope = 'background', DXF prioritizes scheduling tasks on them.
    • If no such nodes exist, tasks run on nodes with the default service scope.
  • v8.1.0+:
    • tidb_service_scope can be assigned any valid value.
    • Tasks are scheduled only on nodes with the same tidb_service_scope as the submitting node.
    • Newly added nodes automatically follow these rules.
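The rules above can be sketched as follows. Although the statement uses SET GLOBAL, the TiDB documentation describes tidb_service_scope as applying only to the TiDB instance the session is connected to, so the statement is assumed to be run once on each node that should take DXF work.

```sql
-- Run while connected directly to the TiDB node that should execute DXF tasks.
SET GLOBAL tidb_service_scope = 'background';

-- Verify the scope of the node this session is connected to.
SHOW VARIABLES LIKE 'tidb_service_scope';
```

On v8.1.0 and later, a custom label can be used in place of 'background', as long as the node that submits the task carries the same value.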

Best Practices

  • For clusters running v7.4.0 to v8.0.0, set tidb_service_scope = 'background' on at least two TiDB nodes.
  • Ensure sufficient resources are available by monitoring the system’s resource utilization.
  • Review the number of concurrent tasks and adjust it for your workload, keeping the 16-task scheduling limit in mind.
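To see how many DXF-eligible tasks are queued or running at a given moment, the standard job-listing statements are a reasonable starting point. This is a sketch for manual inspection, not a complete monitoring setup.

```sql
-- DDL jobs (including ADD INDEX) that are queued or running, plus recent history.
ADMIN SHOW DDL JOBS;

-- IMPORT INTO jobs visible to the current user.
SHOW IMPORT JOBS;
```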

DXF Architecture

DXF Workflow Diagram

TiDB DXF Architecture

As shown in the preceding diagram, the execution of tasks in the DXF is mainly handled by the following modules:

  • Dispatcher: generates the distributed execution plan for each task, manages the execution process, converts the task status, and collects and feeds back the runtime task information.
  • Scheduler: replicates the execution of distributed tasks among TiDB nodes to improve the efficiency of task execution.
  • Subtask Executor: the actual executor of distributed subtasks. In addition, the Subtask Executor returns the execution status of subtasks to the Scheduler, and the Scheduler updates the execution status of subtasks in a unified manner.
  • Resource pool: provides the basis for quantifying resource usage and management by pooling computing resources of the above modules.
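The split between tasks and subtasks described above is reflected in TiDB's internal bookkeeping tables. The queries below assume the internal tables mysql.tidb_global_task and mysql.tidb_background_subtask; they are implementation details rather than a documented interface, so their names and columns may differ between versions and they should only be read, never modified.

```sql
-- One row per distributed task tracked by the framework.
SELECT * FROM mysql.tidb_global_task LIMIT 10;

-- One row per subtask handed to a TiDB node for execution.
SELECT * FROM mysql.tidb_background_subtask LIMIT 10;
```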

Benchmark Report

TiDB DXF significantly enhances resource utilization, task scheduling efficiency, and execution performance for distributed workloads. It is a robust framework tailored for large-scale data operations in a TiDB cluster, ensuring optimal performance and scalability.

interface showing a bar chart of index creation time for a 40TiB table, comparing different DXF operations

| Environment | Query | Time (Hours) | Time (Minutes) | Time (Seconds) |
|---|---|---|---|---|
| DXF | ALTER TABLE item ADD INDEX idx(product_id); | 0 | 59 | 46 |
| DXF | ALTER TABLE item ADD INDEX idx_pid(product_id), ADD INDEX idx_mid(merchant_id), ADD INDEX idx_ct(created_time), ADD INDEX idx_miid_misid(merchant_item_id, merchant_item_set_id); | 1 | 27 | 09 |
| DXF | ALTER TABLE item ADD UNIQUE INDEX idx(merchant_id, item_primary_key); | 1 | 6 | 9.11 |
| Local | ALTER TABLE item ADD INDEX idxf(product_id); | 14 | 38 | 09 |
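To put the single-index rows in perspective, the two timings can be converted to seconds and divided. The figures are taken directly from the table above (14:38:09 for the local run, 0:59:46 for the DXF run on the same column); the query itself is just arithmetic and needs no tables.

```sql
SELECT
  TIME_TO_SEC('14:38:09') AS local_seconds,  -- 52689
  TIME_TO_SEC('00:59:46') AS dxf_seconds,    -- 3586
  ROUND(TIME_TO_SEC('14:38:09') / TIME_TO_SEC('00:59:46'), 1) AS speedup;  -- 14.7
```

In this benchmark, the DXF run finished the same index build roughly 14.7 times faster than the local run.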

Summary

Managing large-scale databases doesn’t have to be a constant headache. With TiDB’s DXF, you get the power of distributed task management, optimized resource utilization, and faster operations without sacrificing sleep. Whether it’s handling massive data imports, creating indexes, or running analytics, DXF ensures everything runs smoothly.

So, would you rather keep wrestling with the familiar challenges of traditional database management, or move to a future where DXF and distributed databases free up your time and resources once and for all?

Looking to scale your databases without the headaches? Mydbops offers specialized TiDB Consulting Services to help you implement DXF for peak performance. Our team will guide you from setup to fine-tuning, ensuring your database works smarter, not harder.

Let’s make your database challenges a thing of the past. Reach out to Mydbops today and discover how DXF can bring unparalleled scalability to your business.
