Evolution of Histograms in MySQL 8.4: From Manual to Automated Performance Optimization
Data is powerful but complex. To unlock insights, we need tools that optimize how data is interpreted and queried, especially in databases. MySQL histograms are an essential feature, helping the optimizer understand the distribution of values in a column, thus making query plans more efficient.
In this blog, we’ll look at how histograms have evolved in MySQL, particularly highlighting the advancements in MySQL 8.4, which greatly reduce manual work and enhance performance optimization.
Histograms in MySQL
Histograms in MySQL provide statistical representations of data distribution in a specific column, allowing the optimizer to make informed decisions about which indexes to use. This is especially useful for estimating the cost of queries and optimizing large datasets.
In earlier MySQL versions, creating and updating histograms was a manual process. Database administrators (DBAs) had to reanalyze tables periodically to ensure that histograms accurately reflected data distribution. Read our blog on Histogram in MySQL 8.0. MySQL 8.4, however, has transformed histogram management, making it more automated and adaptive.
Evolution of Histograms in MySQL
When discussing the evolution of histograms in MySQL, it's essential to look at the changes before and after MySQL 8.4. The new features introduced in MySQL 8.4 significantly simplified histogram management, reducing manual effort for users.
To understand how histograms behaved prior to MySQL 8.4, let's go through an example.
Step 1: Create the Table and Insert Initial Data
CREATE TABLE sales (
id INT PRIMARY KEY AUTO_INCREMENT,
product_id INT,
sale_date DATE,
quantity INT
);
Query OK, 0 rows affected (0.04 sec)
INSERT INTO sales (product_id, sale_date, quantity)
VALUES (101, '2024-10-01', 20), (101, '2024-10-02', 15),
(5000, '2024-10-01', 5), (5000, '2024-10-02', 7);
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
Step 2: Create a Histogram on product_id
This command creates a histogram on the product_id
column, dividing it into 100 buckets. This histogram enables MySQL to estimate the distribution of product_id
values across the table.
Note: When you set WITH 100 BUCKETS
, MySQL attempts to capture the data distribution in 100 segments. Each bucket represents a range of product_id
values, with the bucket width adjusted so each contains a similar number of entries. A histogram with more buckets generally provides a more accurate data distribution, though it requires additional memory and processing power.
mysql> ANALYZE TABLE sales UPDATE HISTOGRAM ON product_id WITH 100 BUCKETS;
+--------------+-----------+----------+-------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+--------------+-----------+----------+-------------------------------------------------------+
| sbtest.sales | histogram | status | Histogram statistics created for column 'product_id'. |
+--------------+-----------+----------+-------------------------------------------------------+
1 row in set (0.01 sec)
Step 3: Query with Histogram-Based Estimation
With the histogram in place, let’s see how MySQL estimates the rows for a query targeting product_id = 101
.
mysql> EXPLAIN SELECT * FROM sales WHERE product_id = 101\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sales
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4 -- Accurate estimate based on histogram
filtered: 50.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
Thanks to the histogram data, the optimizer now provides a more accurate estimate, showing 4 rows for product_id = 101
.
Step 4: Insert Additional Rows to Simulate a Change in Distribution
Next, we insert additional rows for product_id = 101
to simulate a change in the data distribution.
mysql> INSERT INTO sales (product_id, sale_date, quantity)
-> VALUES (101, '2024-10-03', 30), (101, '2024-10-04', 25);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
Step 5: Check Row Estimates Without Manually Updating the Histogram
Now, let's check how MySQL estimates rows for product_id = 101
after adding the new data, without manually updating the histogram.
mysql> EXPLAIN SELECT * FROM sales WHERE product_id = 101\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sales
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6 -- Estimate updated based on increased row count
filtered: 50.00
Extra: Using where
1 row in set, 1 warning (0.01 sec)
After adding the new rows, MySQL now estimates 6 rows for product_id = 101
, reflecting the impact of the data changes on the row estimates.
Step 6: Manually Update the Histogram in MySQL 8.0
In MySQL 8.0, you need to manually update the histogram to ensure the optimizer accurately recognizes the new data distribution.
mysql> ANALYZE TABLE sales UPDATE HISTOGRAM ON product_id;
+--------------+-----------+----------+-------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+--------------+-----------+----------+-------------------------------------------------------+
| sbtest.sales | histogram | status | Histogram statistics created for column 'product_id'. |
+--------------+-----------+----------+-------------------------------------------------------+
1 row in set (0.01 sec)
With the updated histogram, MySQL now has an accurate view of the product_id
distribution.
mysql> EXPLAIN SELECT * FROM sales WHERE product_id = 101\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: sales
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6
filtered: 66.67 -- Filter percentage updated based on new distribution
Extra: Using where
1 row in set, 1 warning (0.00 sec)
After manually updating the histogram, the filtered percentage increased from 50.00 to 66.67, reflecting the optimizer's improved understanding of the new data distribution.
In MySQL 8.4, the need for manual updates, as demonstrated in step 6, has been eliminated. The histogram now automatically updates based on changes in the data distribution due to new insertions, updates, or deletions.
Adaptive Histograms
In MySQL 8.4, Adaptive Histograms were introduced to further optimize database performance, particularly with query execution. This feature builds upon regular histograms by not just updating based on data changes but also adapting dynamically to query patterns, allowing MySQL to adjust how it stores and analyzes data for faster execution.
Here's how Adaptive Histograms work:
- Adaptive Histograms monitor the types of queries being executed on the database and adjust the histogram structure accordingly.
- The database creates or adjusts buckets based on frequent query patterns. So if certain values are queried often, MySQL will create additional buckets to make those queries faster.
- This allows MySQL to better estimate cardinality and make smarter decisions when selecting data for a query. It also helps the optimizer understand which data is accessed more frequently, resulting in more accurate query planning.
Key Features:
- Query Awareness: Adaptive histograms track query patterns and adjust data distribution to improve performance over time.
- Automatic Adjustment: It updates itself based on the usage patterns without requiring manual intervention.
- New Buckets for Frequent Queries: Adaptive histograms can create new buckets specifically for values that are queried frequently, speeding up access to that data.
Example:
Imagine a scenario where you have a database table with sales data, and one of the columns is "Product_ID." Normally, the histogram would divide the product IDs into a set number of buckets. But with adaptive histograms, if certain products (like "Product_ID = 101") are queried much more frequently, MySQL will create more specific buckets for that product ID, ensuring faster retrieval for future queries.
Summary of Key Improvements in MySQL 8.4
With the release of MySQL 8.4, significant advancements have been made in the implementation and functionality of histograms, enhancing their utility for database performance optimization. These improvements address the limitations of previous versions, offering a more automated and adaptive approach to histogram management.
The following table summarizes the key improvements in histogram features from MySQL 8.0 to MySQL 8.4:
The introduction of automatic histogram updates in MySQL 8.4 significantly enhances performance optimization. Database administrators no longer need to manually maintain histogram statistics, leading to more efficient query planning and execution without additional overhead. This automation simplifies data management and optimizes database performance, especially in dynamic environments where data is frequently updated.
At Mydbops, we offer expert MySQL Consulting and Managed Services to help you leverage the full potential of MySQL 8.4 and beyond. Whether you need guidance with upgrading, performance tuning, or ongoing database management, our team of experienced professionals is here to support your business every step of the way.
Contact us today to learn how we can optimize your MySQL environment and ensure peak performance for your database infrastructure.