Evolution of Histograms in MySQL: From Manual to Automated Performance Optimization

Evolution of Histograms in MySQL 8.4: From Manual to Automated Performance Optimization

Nov 11, 2024

Mins to Read

All

Data is powerful but complex. To unlock insights, we need tools that optimize how data is interpreted and queried, especially in databases. MySQL histograms are an essential feature, helping the optimizer understand the distribution of values in a column, thus making query plans more efficient.

In this blog, we’ll look at how histograms have evolved in MySQL, particularly highlighting the advancements in MySQL 8.4, which greatly reduce manual work and enhance performance optimization.

Histograms in MySQL

Histograms in MySQL provide statistical representations of data distribution in a specific column, allowing the optimizer to make informed decisions about which indexes to use. This is especially useful for estimating the cost of queries and optimizing large datasets.

In earlier MySQL versions, creating and updating histograms was a manual process. Database administrators (DBAs) had to reanalyze tables periodically to ensure that histograms accurately reflected data distribution. Read our blog on Histogram in MySQL 8.0. MySQL 8.4, however, has transformed histogram management, making it more automated and adaptive.

Evolution of Histograms in MySQL

When discussing the evolution of histograms in MySQL, it's essential to look at the changes before and after MySQL 8.4. The new features introduced in MySQL 8.4 significantly simplified histogram management, reducing manual effort for users.

To understand how histograms behaved prior to MySQL 8.4, let's go through an example.

Step 1: Create the Table and Insert Initial Data

CREATE TABLE sales (
  id INT PRIMARY KEY AUTO_INCREMENT,
  product_id INT,
  sale_date DATE,
  quantity INT
);
Query OK, 0 rows affected (0.04 sec)


INSERT INTO sales (product_id, sale_date, quantity)
VALUES (101, '2024-10-01', 20), (101, '2024-10-02', 15),
       (5000, '2024-10-01', 5), (5000, '2024-10-02', 7);
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

Step 2: Create a Histogram on `product_id`

This command creates a histogram on the product_id column, dividing it into 100 buckets. This histogram enables MySQL to estimate the distribution of product_id values across the table.

Note: When you set WITH 100 BUCKETS, MySQL attempts to capture the data distribution in 100 segments. Each bucket represents a range of product_id values, with the bucket width adjusted so each contains a similar number of entries. A histogram with more buckets generally provides a more accurate data distribution, though it requires additional memory and processing power.

mysql> ANALYZE TABLE sales UPDATE HISTOGRAM ON product_id WITH 100 BUCKETS;
+--------------+-----------+----------+-------------------------------------------------------+
| Table        | Op        | Msg_type | Msg_text                                              |
+--------------+-----------+----------+-------------------------------------------------------+
| sbtest.sales | histogram | status   | Histogram statistics created for column 'product_id'. |
+--------------+-----------+----------+-------------------------------------------------------+
1 row in set (0.01 sec)

Step 3: Query with Histogram-Based Estimation

With the histogram in place, let’s see how MySQL estimates the rows for a query targeting product_id = 101.

mysql> EXPLAIN SELECT * FROM sales WHERE product_id = 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4 -- Accurate estimate based on histogram
     filtered: 50.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Thanks to the histogram data, the optimizer now provides a more accurate estimate, showing 4 rows for product_id = 101.

Step 4: Insert Additional Rows to Simulate a Change in Distribution

Next, we insert additional rows for product_id = 101 to simulate a change in the data distribution.

mysql> INSERT INTO sales (product_id, sale_date, quantity)
    -> VALUES (101, '2024-10-03', 30), (101, '2024-10-04', 25);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

Step 5: Check Row Estimates Without Manually Updating the Histogram

Now, let's check how MySQL estimates rows for product_id = 101 after adding the new data, without manually updating the histogram.

mysql> EXPLAIN SELECT * FROM sales WHERE product_id = 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 6 -- Estimate updated based on increased row count
     filtered: 50.00
        Extra: Using where
1 row in set, 1 warning (0.01 sec)

After adding the new rows, MySQL now estimates 6 rows for product_id = 101, reflecting the impact of the data changes on the row estimates.

Step 6: Manually Update the Histogram in MySQL 8.0

In MySQL 8.0, you need to manually update the histogram to ensure the optimizer accurately recognizes the new data distribution.

mysql> ANALYZE TABLE sales UPDATE HISTOGRAM ON product_id;
+--------------+-----------+----------+-------------------------------------------------------+
| Table        | Op        | Msg_type | Msg_text                                              |
+--------------+-----------+----------+-------------------------------------------------------+
| sbtest.sales | histogram | status   | Histogram statistics created for column 'product_id'. |
+--------------+-----------+----------+-------------------------------------------------------+
1 row in set (0.01 sec)

With the updated histogram, MySQL now has an accurate view of the product_id distribution.

mysql> EXPLAIN SELECT * FROM sales WHERE product_id = 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sales
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 6
     filtered: 66.67 -- Filter percentage updated based on new distribution
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

After manually updating the histogram, the filtered percentage increased from 50.00 to 66.67, reflecting the optimizer's improved understanding of the new data distribution.

In MySQL 8.4, the need for manual updates, as demonstrated in step 6, has been eliminated. The histogram now automatically updates based on changes in the data distribution due to new insertions, updates, or deletions.

Adaptive Histograms

In MySQL 8.4, Adaptive Histograms were introduced to further optimize database performance, particularly with query execution. This feature builds upon regular histograms by not just updating based on data changes but also adapting dynamically to query patterns, allowing MySQL to adjust how it stores and analyzes data for faster execution.

Here's how Adaptive Histograms work:

Adaptive Histograms monitor the types of queries being executed on the database and adjust the histogram structure accordingly.
The database creates or adjusts buckets based on frequent query patterns. So if certain values are queried often, MySQL will create additional buckets to make those queries faster.
This allows MySQL to better estimate cardinality and make smarter decisions when selecting data for a query. It also helps the optimizer understand which data is accessed more frequently, resulting in more accurate query planning.

Key Features:

Query Awareness: Adaptive histograms track query patterns and adjust data distribution to improve performance over time.
Automatic Adjustment: It updates itself based on the usage patterns without requiring manual intervention.
New Buckets for Frequent Queries: Adaptive histograms can create new buckets specifically for values that are queried frequently, speeding up access to that data.

Example:

Imagine a scenario where you have a database table with sales data, and one of the columns is "Product_ID." Normally, the histogram would divide the product IDs into a set number of buckets. But with adaptive histograms, if certain products (like "Product_ID = 101") are queried much more frequently, MySQL will create more specific buckets for that product ID, ensuring faster retrieval for future queries.

Listen to Our Podcast on MySQL 8.4 Histograms

In this podcast, the key improvements in MySQL 8.4 are discussed, with a focus on the automatic updates of histograms and the introduction of adaptive histograms. These features can significantly enhance MySQL’s query optimization process, allowing DBAs to save time and improve efficiency.

Summary of Key Improvements in MySQL 8.4

With the release of MySQL 8.4, significant advancements have been made in the implementation and functionality of histograms, enhancing their utility for database performance optimization. These improvements address the limitations of previous versions, offering a more automated and adaptive approach to histogram management.

The following table summarizes the key improvements in histogram features from MySQL 8.0 to MySQL 8.4:

Feature	MySQL 8.0	MySQL 8.4 LTS Enhancements
Histogram Stats Updates	Manual	Automatic and adaptive
Update Mechanism	Static (requires re-analysis)	Dynamic (automatic updates based on query patterns)
Optimizer Integration	Basic	Improved cardinality estimation and query plan efficiency
Bucket Control	Manual bucket definition	More granular control with memory optimizations
Memory Usage	Limited optimizations	Optimized memory handling for large histograms
Partitioned Table Support	Basic	Enhanced support for partitioned columns and queries

The introduction of automatic histogram updates in MySQL 8.4 significantly enhances performance optimization. Database administrators no longer need to manually maintain histogram statistics, leading to more efficient query planning and execution without additional overhead. This automation simplifies data management and optimizes database performance, especially in dynamic environments where data is frequently updated.

At Mydbops, we offer expert MySQL Consulting and Managed Services to help you leverage the full potential of MySQL 8.4 and beyond. Whether you need guidance with upgrading, performance tuning, or ongoing database management, our team of experienced professionals is here to support your business every step of the way.

Contact us today to learn how we can optimize your MySQL environment and ensure peak performance for your database infrastructure.

No items found.