MySQL’s group replication is a powerful feature that enables a cluster of servers to function as a single entity, ensuring high availability and fault tolerance. Within a group replication setup, the servers replicate data to one another, providing redundancy and distributed workload management.
One of the key components of a group replication arrangement is the primary node, which serves as the primary source for data updates and manages the replication process. In the event that the primary node becomes unavailable, it’s crucial to switch to a new primary node to maintain the continuity of data replication and minimize downtime.
In this blog post, I’ll explore the process of promoting a secondary node to the primary node in a MySQL group replication setup.
Common reasons for switching the primary node
There are various situations where it may be necessary to switch the primary node in a MySQL group replication setup. Here are some of the most common:
- Performing OS or MySQL maintenance tasks that require the primary node to be offline temporarily.
- Applying patches or updates to the host system that may impact the performance or availability of the primary node.
- Upgrading the MySQL server to a new version or making significant changes to the configuration that require a new primary node.
- The current primary server experiences an issue or goes down unexpectedly, making it necessary to promote a new primary node to maintain data replication and minimize downtime.
Primary Promotion and its importance
In the event of a primary node failure, it’s important to select a new primary node to ensure the continuity of data replication and minimize downtime. This process is known as primary promotion.
With primary promotion, the group replication cluster can continue to function and handle requests even when a primary node fails.
Methods for switching the primary node
There are several methods available for switching the primary node in MySQL group replication, including:
- Auto-switch
- Auto-switch by weight of the node
- Manual Primary Promotion
Setting up a three-node MySQL group replication cluster for demonstration
To illustrate the primary promotion process and its different scenarios, I have set up a practical example using a three-node MySQL group replication cluster. This setup allows us to demonstrate how the primary node can be switched in different situations, such as node failures or maintenance tasks.
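For reference, here is a minimal sketch of the statements used to bring up such a group, assuming the GTID prerequisites (server_id, gtid_mode=ON, enforce_gtid_consistency=ON) are already configured; the group name and member addresses below are placeholders to adapt to your own environment.
-- Load the plugin on every node (the SONAME may vary by platform).
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
-- Placeholder group name (any valid UUID) and member addresses.
SET GLOBAL group_replication_group_name = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa';
SET GLOBAL group_replication_local_address = 'node1:33061';
SET GLOBAL group_replication_group_seeds = 'node1:33061,node2:33061,node3:33061';
-- On the first node only: bootstrap the group.
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;
-- On Node 2 and Node 3 (each with its own local_address): join the group.
START GROUP_REPLICATION;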
Auto-switch method and its implementation
Auto-switch method: A built-in feature that automatically promotes a secondary node to primary when the existing primary fails.
When the current primary node fails, one of the secondary nodes is automatically promoted to become the new primary node. The election considers the MySQL version of each member, the member weight, and finally the server UUID.
If all group members are running MySQL 8.0.17 or higher, members are sorted by the patch version of their release. Otherwise, they are sorted by the major version of their release, with the patch version being disregarded for MySQL Server 5.7 or MySQL 8.0.16 or lower.
If multiple members are running the same version and have the same weight, the lexicographical order of their generated server UUIDs is used to choose the primary node: the member with the lowest server UUID is selected.
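To inspect the factors that drive this election, the version and UUID of every member can be read from performance_schema (the MEMBER_VERSION column is available from MySQL 8.0.16 onwards), and the local member’s weight from its system variable:
-- Version, UUID, and current role of every member (MEMBER_VERSION needs 8.0.16+).
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_VERSION, MEMBER_ROLE
FROM performance_schema.replication_group_members;
-- Election weight of the local member (default 50).
SELECT @@global.group_replication_member_weight;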
In our three-node MySQL group replication cluster setup, all nodes are in sync and capable of serving as the primary node. To check the state and role of each node, we can query the MEMBER_STATE and MEMBER_ROLE columns of performance_schema.replication_group_members.
mydbops@localhost:(none)> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+--------------------------------------+-------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+-------------+--------------+-------------+
| 07338a21-a0c2-11ed-b447-126bbf31abb9 | Node - 2    |        3306 | ONLINE       | SECONDARY   |
| 122592a9-a0c2-11ed-a451-12e73c711187 | Node - 3    |        3306 | ONLINE       | SECONDARY   |
| 9e470ac3-a0c1-11ed-b594-1253971a943d | Node - 1    |        3306 | ONLINE       | PRIMARY     |
+--------------------------------------+-------------+-------------+--------------+-------------+
3 rows in set (0.01 sec)
Below is the server_uuid for the primary node (Node 1).
mydbops@localhost:(none)> show global variables like 'server_uuid%';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 9e470ac3-a0c1-11ed-b594-1253971a943d |
+---------------+--------------------------------------+
1 row in set (0.00 sec)
To demonstrate automatic failover, I stopped group replication on Node 1. As a result, the current primary node, Node 1, went down.
mydbops@localhost:(none)> stop group_replication;
Query OK, 0 rows affected (3.07 sec)

mydbops@localhost:(none)> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+--------------------------------------+-------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+-------------+--------------+-------------+
| 9e470ac3-a0c1-11ed-b594-1253971a943d | Node - 1    |        3306 | OFFLINE      |             |
+--------------------------------------+-------------+-------------+--------------+-------------+
1 row in set (0.00 sec)
Since all members had the default weight and were running the same version, the remaining member with the lowest server UUID, Node 2, was elected as the new primary. After starting group replication on Node 1 again, it rejoins the cluster as a secondary node.
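To bring Node 1 back, it is enough to start group replication on it again, assuming its configuration is unchanged:
-- Run on Node 1 to rejoin the group as a secondary member.
START GROUP_REPLICATION;
-- Verify that the node is back ONLINE.
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;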
Auto-switch by weight of the node method and its implementation
Auto-switch by weight of the node method: A variant of auto-switch in which the administrator assigns each node a weight (for example, reflecting its capacity), and the online member with the highest weight is preferred when a new primary is elected.
The group_replication_member_weight system variable sets a weight value between 0 and 100 for each member of the cluster. A higher weight value improves the member’s ranking, while a lower weight value makes its ranking worse. By default, each cluster member has a weight value of 50.
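Note that a weight set with SET GLOBAL does not survive a server restart; on MySQL 8.0 it can also be persisted, as in this small sketch (or set in the configuration file instead):
-- Persist the weight so that it survives a restart (MySQL 8.0+).
SET PERSIST group_replication_member_weight = 100;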
In this scenario, the weight of Node 3 has been increased from the default weight value of 50 to 100.
mydbops@localhost:(none)> show global variables like 'server_uuid%';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 122592a9-a0c2-11ed-a451-12e73c711187 |
+---------------+--------------------------------------+
1 row in set (0.00 sec)

mydbops@localhost:(none)> show global variables like 'group_replication_member_weight';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| group_replication_member_weight | 50    |
+---------------------------------+-------+
1 row in set (0.00 sec)

mydbops@localhost:(none)> set global group_replication_member_weight=100;
Query OK, 0 rows affected (0.00 sec)

mydbops@localhost:(none)> show global variables like 'group_replication_member_weight';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| group_replication_member_weight | 100   |
+---------------------------------+-------+
1 row in set (0.00 sec)
I halted the group replication on the current primary node, which is Node 2.
mydbops@localhost:(none)> stop group_replication;
Query OK, 0 rows affected (3.08 sec)

mydbops@localhost:(none)> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+--------------------------------------+-------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+-------------+--------------+-------------+
| 07338a21-a0c2-11ed-b447-126bbf31abb9 | Node - 2    |        3306 | OFFLINE      |             |
+--------------------------------------+-------------+-------------+--------------+-------------+
1 row in set (0.00 sec)
When Node 2, the current primary server, went down, Node 3, the member with the highest weight, was promoted to become the new primary server.
mydbops@localhost:(none)> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+--------------------------------------+-------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+-------------+--------------+-------------+
| 122592a9-a0c2-11ed-a451-12e73c711187 | Node - 3    |        3306 | ONLINE       | PRIMARY     |
| 9e470ac3-a0c1-11ed-b594-1253971a943d | Node - 1    |        3306 | ONLINE       | SECONDARY   |
+--------------------------------------+-------------+-------------+--------------+-------------+
2 rows in set (0.00 sec)
Manual Primary Promotion method and its implementation
Manual Primary Promotion method: A method that requires manual intervention to promote a secondary node to be the new primary. Once the new primary is promoted, it will not be demoted back to a secondary node, even if the original primary becomes available again. This approach gives more control to the database administrators and is useful in scenarios where specific nodes need to be prioritized.
Currently, all the nodes are online, and Node 3 is the current primary server.
mydbops@localhost:(none)> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+--------------------------------------+-------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+-------------+--------------+-------------+
| 07338a21-a0c2-11ed-b447-126bbf31abb9 | Node - 2    |        3306 | ONLINE       | SECONDARY   |
| 122592a9-a0c2-11ed-a451-12e73c711187 | Node - 3    |        3306 | ONLINE       | PRIMARY     |
| 9e470ac3-a0c1-11ed-b594-1253971a943d | Node - 1    |        3306 | ONLINE       | SECONDARY   |
+--------------------------------------+-------------+-------------+--------------+-------------+
3 rows in set (0.00 sec)
I fetched the server_uuid of Node 1, which will be promoted as the new primary server.
mydbops@localhost:(none)> show global variables like 'server_uuid%';
+---------------+--------------------------------------+
| Variable_name | Value                                |
+---------------+--------------------------------------+
| server_uuid   | 9e470ac3-a0c1-11ed-b594-1253971a943d |
+---------------+--------------------------------------+
1 row in set (0.00 sec)
I promoted Node 1 using its server_uuid with the following command:
mydbops@localhost:(none)> select group_replication_set_as_primary('9e470ac3-a0c1-11ed-b594-1253971a943d');
+--------------------------------------------------------------------------+
| group_replication_set_as_primary('9e470ac3-a0c1-11ed-b594-1253971a943d') |
+--------------------------------------------------------------------------+
| Primary server switched to: 9e470ac3-a0c1-11ed-b594-1253971a943d         |
+--------------------------------------------------------------------------+
1 row in set (0.01 sec)
mydbops@localhost:(none)> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+--------------------------------------+-------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+-------------+--------------+-------------+
| 07338a21-a0c2-11ed-b447-126bbf31abb9 | Node - 2    |        3306 | ONLINE       | SECONDARY   |
| 122592a9-a0c2-11ed-a451-12e73c711187 | Node - 3    |        3306 | ONLINE       | SECONDARY   |
| 9e470ac3-a0c1-11ed-b594-1253971a943d | Node - 1    |        3306 | ONLINE       | PRIMARY     |
+--------------------------------------+-------------+-------------+--------------+-------------+
3 rows in set (0.00 sec)
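As a side note, from MySQL 8.0.29 onwards group_replication_set_as_primary() also accepts an optional second argument, a timeout in seconds for transactions that are still running when the switch begins; for example:
-- Wait up to 60 seconds for ongoing transactions before switching (8.0.29+).
SELECT group_replication_set_as_primary('9e470ac3-a0c1-11ed-b594-1253971a943d', 60);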
Using the methods discussed above, we can safely promote another node to act as the primary node in group replication, ensuring high availability and fault tolerance in MySQL clusters. The choice of promotion method depends on factors such as the cluster’s size and complexity, the skills of the administrators, and the application’s specific requirements. You can also read more about the limitations of group replication.
If you found this blog post insightful, we invite you to check out our other articles on our website here. We cover a variety of topics related to database consulting and support, as well as other tech-related subjects. You might find something that interests you and provides valuable insights and tips for your business or career. Check out our recent blog Prevent ProxySQL from directing traffic to a broken MySQL replica.
Optimize your MySQL Group Replication setup with Mydbops' InnoDB Cluster Consulting Services. Our experts can help you streamline your database management and improve performance. Contact us today to discuss your specific needs.