Introduction to MySQL Parallel query in AWS Aurora

Mydbops

Aug 28, 2021

Mins to Read

All

Illustation representing Introduction to MySQL Parallel query in AWS Aurora

Exploring Aurora Parallel Query: Boosting Performance for Analytical Workloads

‍

Aurora has a salient feature “Parallel query“, Which will be more beneficial for analytical workload environments.

Before going to deep dive on this particular feature, let us understand the basis of Aurora.

‍

Aurora Architecture

‍

Key feature

The key feature of Aurora is simple data synchronisation among the nodes. The sync latency will be too low when compared to RDS because the synchronisation is happening on storage volumes among the nodes. Also all the server will available in different zone, even when a zone goes down we can able to maintain will other server present in other zone with auto failure.

Auto healing volume, Each volume has it’s own backup volume. So if the volume got corrupted, the instance will up with the backup volume.

The writer node will support writes and reads, But the reader node supports only the reads

We can have up to 15 reader node, where as in RDS we can have only 5 replica server.

Automated failure , when a writer node goes down for any reason, the reader node will automatically promote as a writer node. And also we can set the priority for the reader node to become a writer, if failover occurs.

Automatic storage scaling, the storage will automatically get increase and will reduce back once the need if done.

Auto-scaling of reader node, when there is a load has been increased in server, the cluster will automatically create a new reader and will add to the cluster until the load reduce back to normal. Post that , newly created instance will be auto-terminated. We are handle this by custom endpoint in aurora as well, the newly created instance will be auto joined to the cluster endpoint for query routing.

So far we have seen what is the main feature of aurora, now let’s how it differ from native MySQL.

You can read our previous blog on Amazon Aurora by @kabilesh to know more about it features.

‍

Prominent Aurora only feature

In native MySQL, hash join is enabled in MySQL 8.0.18, But in Aurora has the hash join feature is available in MySQL 5.7 .

Parallel query is not available with native MySQL.

Storage autoscaling and reader auto scaling is not available in native MySQL.

‍

Parallel query

In the same server it won’t split the query. Instead of that, it will split the query among the cluster server and it will organise the data in the same server and will give the output.

When sharing the data among the network, it won’t send whole table instead of that it will send only the required value (based on the query condition) among the among the network.

‍

Use cases of Parallel Query

It works better when queries have RANGE or IN or EQUAL (=) conditions.

It is more optimized for analytical query, since it split the query among the nodes and then merge the result set.

Due the splitting of query among the nodes, the resource utilization is uniform among the nodes and it avoids heavy utilization on a single node.

‍

Prerequisites

To enable this parallel query feature, we need to satisfy some needs.

Aurora version should be greater than 2.09 or 1.23.

Instance type should be R-series, other instance types are not supported.

The tables should be non-partitioned, If any table has partition parallel query won’t support for that.

We need to hash join optimization for the same.

‍

Implementation

To enable this, we need to enable these two variables.

aurora_parallel_query = ON

aurora_disable_hash_join = OFF

Both of the variables are dynamic variables only, So we can enable it without any downtime. By default parallel query was disabled and aurora_disable_hash_join will be ON ( enabled ).

‍

Limitations

Of course, every feature has some limitations,

Parallel query won’t work if the row format is compressed, it will work with dynamic row format only.

It will work for all the tables, if the table size is smaller it will bypass the parallel query.

Concurrent parallel query depends on the instance type, i.e if the instance type is r3.large only one parallel query is supported, if the instance type is r3.xlarge only 2 parallel query is supported.