Modern applications generate a tidal wave of data. But what happens to all that information as it accumulates? Just like a cluttered attic, a database overflowing with old data can lead to sluggish performance and wasted storage costs.
As data grows, it becomes increasingly important to manage the lifetime of data to maintain system performance and reduce costs. TiDB, a distributed SQL database, offers a powerful feature called TTL (Time to Live) to automate the data lifetime at row level. In this blog post, we'll explore how you can leverage TiDB's TTL feature to efficiently manage your data lifecycle.
Understanding TTL in TiDB
Imagine your primary database as a bustling library. New information arrives daily, adding to the ever-growing collection. But what happens to outdated books or irrelevant data? Manually sorting and discarding them can be a time-consuming nightmare. This is where TiDB's Time-to-Live (TTL) feature comes in as your digital librarian.
The TTL (Time To Live ) feature in TiDB allows you to specify a retention period for your data. Just like setting an expiration date on a library book, TTL automatically identifies and deletes rows that exceed their designated timeframe. This helps in keeping your primary database lean and performant.
For an even deeper dive into optimizing TiDB performance, explore our blog post on TiDB's Smart Engine Selection!
Setting Up TTL
TTL works per table level. You can define the TTL attribute while creating the table or you can also use ALTER TABLE statement to modify the tables.
Syntax
Create Table:
Alter Table:
With Alter, you can modify or enable the TTL attribute at table level as below.
To understand TiDB's write path in more detail, check out our blog post on Demystifying TiDB's Write Path!
TTL and DataTypes
TTL works with the TIMESTAMP data type along with the Default value of the data type such as DEFAULT CURRENT_TIMESTAMP and DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
TTL and Job Management
TiDB schedules a background job for each TTL attribute defined at the table level. By default, this job runs every hour to clean up expired data. The job is split into 64 subtasks, with each subtask operating on a region as the smallest unit, ensuring distributed execution. You can manage the execution of these jobs at both the table and global levels.
Enable and Disable
Table level
Global Level
You can disable/enable all the TTL jobs at cluster level as below.
TTL schedule
By default, the TTL job for tables runs every one hour. You can change this behavior per table as follows:
In certain scenarios, you may require these TTL jobs to be executed within a specific window of the day to reduce resource consumption during peak traffic times. You can achieve this by setting these values at the cluster level.
With the above configuration, you allow the TTL jobs to be executed only between 1:00 to 5:00 UTC.
Learn more about TiDB's powerful scheduling capabilities in our blog post on TiDB Scheduling!
Observability with TiDB TTL
TiDB collects runtime and historical information about the TTL jobs periodically with 3 tables under MySQL schema.
- tidb_ttl_table_status : This table contains information about the previously executed TTL job and ongoing TTL jobs for the table.
- tidb_ttl_task : This table contains info about the ongoing TTL subtask execution.
- tidb_ttl_job_history : Contains info about the TTL jobs that has been executed, this table keeps the last 90 days status on jobs.
TiDB also provides many TTL graphs under the Grafan TiDB dashboard:
Here is a snippet of a graph that shows the historical trend of insert rates per day and deletes in TTL tables. If the rate of inserts increases by multifold, you might need to adjust the TTL variables to increase the rate of deletes.
Use Cases for TTL in TiDB
- Session Data Management: Using TTL for applications that store session data in the database can automatically clean up old sessions, reducing the storage footprint and improving query performance.
- Event Logging: In systems that generate a large number of event logs, TTL can help in automatically purge old logs, ensuring that the database only contains relevant, recent data.
- Time-Series Data: For time-series databases that store metrics or IoT data, TTL can be used to retain only the most recent data, which is often the most relevant for analysis.
Benefits of Using TTL in TiDB
- Reduced Storage Costs: By automatically archiving or deleting old data, you can significantly reduce storage costs, especially in cloud environments.
- Improved Performance: Keeping your database size manageable by removing outdated data can lead to better performance, especially for read-heavy workloads.
- Simplified Data Management: TTL automates the data archival process, reducing the need for manual data cleanup and maintenance tasks.
TiDB's TTL automates data lifecycle management, optimizing storage and performance as your data grows. Ready to leverage TTL effectively? Explore Mydbops' TiDB Consulting and Remote DBA services! Our experts can help you configure TTL and optimize your TiDB environment. Contact Mydbops today to learn more!
{{cta}}