Aurora MySQL Parallel query Performance and its gains

Oct 14, 2021

Mins to Read

All

As a continuation to my previous blog , I have made benchmarking to find the performance improvement using parallel query.Before starting the testing, we will go through the monitoring stats variables and query analysis (explain).

Monitoring Aurora Parallel Query stats

The above variables are status variables, many more variables are there for monitoring. Kindly have a look on. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html#aurora-mysql-parallel-query-monitoring

Performance analysis

Lab Environment :

‍

Explain plan

By using the explain plain of the query we can able to see whether query is using parallel query or not.

Without parallel query

 
mysql>  explain select count(id) from sbtest1 where id > 12345 and k < 6789;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows      | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
|  1 | SIMPLE      | sbtest1 | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 654698147 |    33.33 | Using where |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

With parallel query

 
mysql>  explain select count(id) from sbtest1 where id > 12345 and k < 6789;
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows       | filtered | Extra                                                                      |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
|  1 | SIMPLE      | sbtest1 | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL | 1309396294 |    16.66 | Using where; Using parallel query (2 columns, 2 filters, 0 exprs; 0 extra) |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Observation in Extra ( Explain )

In explain plain, extra column we can see there are 4 options along with that. We will see one by one explanation of it.

Using parallel query (2 columns, 2 filters, 0 exprs; 0 extra)

‍

Testing

Instead of testing only with single format, we are going to test the parallel query with different WHERE clause and conditions.

Using function

Without parallel query

 
mysql> select sql_no_cache sum(k) from sbtest1 where upper(k)=231212 and upper(c) is not null;
+--------+
| sum(k) |
+--------+
|   NULL |
+--------+
1 row in set (2 hours 33 min 22.40 sec)

Without parallel Query there is a spike in system resources persists for a prolonged period.

With parallel query

 
mysql> select sql_no_cache sum(k) from sbtest1 where upper(k)=231212 and upper(c) is not null;
+--------+
| sum(k) |
+--------+
|   NULL |
+--------+
1 row in set (1 min 6.61 sec)

From the above stats, we can able to understand from the above stats that without parallel query the CPU utilisation was in complete saturation state and also the read latency of the server got increased due to the query. And the query execution took around 2 hours 30 mins to complete.

We can see there is a dramatic change post enabling the parallel query. The execution just took 1 min to complete as well as the resource utilisation was less, CPU just spike to 70% to process the query and the latency is also less.

Using Eq Ref

Without parallel query

 
mysql> select sql_no_cache count(*) from sbtest1 where k=7256238746;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (2 hours 25 min 9.11 sec)

With parallel query

 
mysql> select count(*) from sbtest1 where k=7256238746;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (2 min 12.22 sec)

Using Join Condition

Without parallel query

 
mysql> select sql_no_cache count(t1.k) from sbtest1 t1 inner join sbtest3 t2 on t1.id=t2.id where t1.k=247423;
+-------------+
| count(t1.k) |
+-------------+
|           0 |
+-------------+
1 row in set (10 min 57.72 sec)

With parallel query

 
mysql> select count(t1.k) from sbtest1 t1 inner join sbtest3 t2 on t1.id=t2.id where t1.k=247423;
+-------------+
| count(t1.k) |
+-------------+
|           0 |
+-------------+
1 row in set (2 min 38.48 sec)

Performance improvement

We have done a testing based on different pattern, now it’s time to consolidate the performance report.

By the stats we can summarise that parallel query can be more gainful for MySQL analytical workload with better performance. We have gained around 150X performance in a few cases.

No items found.