Disaster averted! Node failure throws a wrench in your Percona cluster's replication? Don't fret! This blog post unveils a 4-step method to swiftly switch your non-GTID slave to a new master, minimizing downtime and ensuring business continuity.
Recently I worked on a production issue for one of our support clients. Their architecture was a three-node Galera cluster with one asynchronous slave.
Architecture:
Node1 – 172.10.2.11
Node2 – 172.10.2.12
Node3 – 172.10.2.13
Replica – 172.10.2.14
The slave (replica) was configured with node3 as its master. Unfortunately, node3 was killed by the OOM killer. The cluster also had a small gcache, so when I tried to restart node3 it could not rejoin with an incremental state transfer (IST) and went into a full SST instead. The data size was around 2.6 TB, and completing a full SST and rejoining the cluster takes approximately 12 hours.
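Why a small gcache forces an SST can be shown with a simplified model (an assumption for illustration, not Galera's actual donor-selection code): a restarting node can catch up with IST only if a donor's gcache still holds every write-set after the joiner's last committed seqno. The sketch compares the joiner's seqno (from grastate.dat or wsrep recovery) with the donor's wsrep_local_cached_downto status variable, which reports the lowest seqno still cached. The function name needs_sst and all numbers are hypothetical.

```python
def needs_sst(joiner_seqno: int, donor_cached_downto: int) -> bool:
    # Simplified model (assumption): IST is possible only if the donor's gcache
    # still contains every write-set the joiner is missing.
    #   joiner_seqno: last committed seqno of the joining node; -1 means the
    #       position is unknown (e.g. unclean shutdown without recovery).
    #   donor_cached_downto: lowest seqno still in the donor's gcache, as
    #       reported by the wsrep_local_cached_downto status variable.
    if joiner_seqno < 0:
        # Unknown position: nothing can be sent incrementally.
        return True
    # The joiner needs write-sets starting at joiner_seqno + 1.
    return joiner_seqno + 1 < donor_cached_downto


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the incident described in this post.
    print(needs_sst(joiner_seqno=1_000_000, donor_cached_downto=950_000))    # False: IST possible
    print(needs_sst(joiner_seqno=1_000_000, donor_cached_downto=1_200_000))  # True: SST required
```

In this incident the fallback was expensive: 2.6 TB over roughly 12 hours corresponds to about 60 MB/s (2.6 × 10^12 bytes / 43,200 s ≈ 6.0 × 10^7 bytes/s), so there was no quick way to bring node3 back.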
As I mentioned earlier, the replication slave was attached to node3, and all reporting applications pointed only to that async slave. I couldn't wait up to 12 hours, as that would affect the client's entire reporting environment.
To overcome this, I planned to move the async slave under node2 (172.10.2.12). In this blog post, I explain the steps I followed to achieve this.
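The core difficulty of a non-GTID switch is that binary log file names and positions are local to each node: the slave's coordinates in node3's binlog mean nothing in node2's binlog. One commonly described approach for Galera, assumed here rather than taken from this post, maps positions through the Xid of the last applied transaction. Galera derives a transaction's binlog Xid from its cluster-wide seqno, so the same transaction should carry the same Xid on every node that binlogs it. This assumes log_slave_updates is enabled on node2 (so write-sets applied from other nodes reach its binlog) and that node3's binary logs are still readable on disk. The sketch works on mysqlbinlog text output; the function names xid_events, xid_at_position and position_for_xid are hypothetical.

```python
import re
from typing import Iterator, Optional, Tuple

# Matches the header comment of an Xid (commit) event in mysqlbinlog text output, e.g.
# "#240101 10:00:01 server id 3  end_log_pos 1520 CRC32 0x0badf00d  Xid = 98766"
XID_LINE = re.compile(r"end_log_pos\s+(\d+)\b.*\bXid\s*=\s*(\d+)")


def xid_events(binlog_text: str) -> Iterator[Tuple[int, int]]:
    # Yield (end_log_pos, xid) for every Xid event in decoded binlog text.
    for line in binlog_text.splitlines():
        match = XID_LINE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


def xid_at_position(old_master_text: str, exec_pos: int) -> Optional[int]:
    # Xid of the transaction whose commit event ends at exec_pos on the old master
    # (exec_pos being the slave's Exec_Master_Log_Pos after it was stopped).
    for end_pos, xid in xid_events(old_master_text):
        if end_pos == exec_pos:
            return xid
    return None


def position_for_xid(new_master_text: str, xid: int) -> Optional[int]:
    # end_log_pos of the same Xid in the new master's binlog; replication resumes here.
    for end_pos, found in xid_events(new_master_text):
        if found == xid:
            return end_pos
    return None


if __name__ == "__main__":
    # Hypothetical excerpts of `mysqlbinlog` output from node3 and node2.
    node3 = "#240101 10:00:01 server id 3  end_log_pos 1520 CRC32 0x0badf00d  Xid = 98766\n"
    node2 = "#240101 10:00:01 server id 3  end_log_pos 40466 CRC32 0x22222222  Xid = 98766\n"
    xid = xid_at_position(node3, 1520)
    print(xid)                           # 98766
    print(position_for_xid(node2, xid))  # 40466
```

The node2 binlog file in which the Xid is found, together with the returned end_log_pos, gives the coordinates to resume from. If the Xid is not found, the wrong file was scanned or the transaction never reached node2's binlog, and the mapping should not be trusted.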
Step 1:
Stop replication on the async slave (STOP SLAVE) so that its binary log file and position stop changing and can be recorded reliably.
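Once the slave is stopped, the position of the last transaction it executed from node3 is reported by SHOW SLAVE STATUS as Relay_Master_Log_File and Exec_Master_Log_Pos. Master_Log_File and Read_Master_Log_Pos only track what the I/O thread has downloaded and can be ahead of what was applied. A minimal sketch, assuming the vertical (\G) output has been captured as text; the helper names and sample values are hypothetical.

```python
from typing import Dict, Tuple


def parse_slave_status(vertical_output: str) -> Dict[str, str]:
    # Parse SHOW SLAVE STATUS\G output ("Key: value" per line) into a dict.
    # Only the first ':' splits key from value, so values containing ':' survive.
    status: Dict[str, str] = {}
    for line in vertical_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key and not key.startswith("*"):
            status[key.strip()] = value.strip()
    return status


def executed_coordinates(status: Dict[str, str]) -> Tuple[str, int]:
    # (binlog file, position) on the OLD master up to which the SQL thread executed.
    if status.get("Slave_SQL_Running") != "No":
        raise RuntimeError("stop the slave first so the coordinates stop moving")
    return status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"])


if __name__ == "__main__":
    # Hypothetical output captured after STOP SLAVE.
    sample = """\
*************************** 1. row ***************************
                  Master_Host: 172.10.2.13
              Master_Log_File: mysql-bin.000123
          Read_Master_Log_Pos: 2048
        Relay_Master_Log_File: mysql-bin.000123
             Slave_IO_Running: No
            Slave_SQL_Running: No
          Exec_Master_Log_Pos: 1520
"""
    print(executed_coordinates(parse_slave_status(sample)))  # ('mysql-bin.000123', 1520)
```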
After the switchover, the replica's current master is node2 (172.10.2.12).
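Once the equivalent coordinates on node2 are known, repointing the slave amounts to a CHANGE MASTER TO statement followed by START SLAVE, and SHOW SLAVE STATUS confirms the result. The sketch below only builds the statement string and checks a status dictionary; change_master_sql, switch_is_healthy, the binlog file name and the position are hypothetical. It assumes the replication user already configured on the slave also exists on node2: MySQL keeps unspecified CHANGE MASTER TO options such as MASTER_USER, but changing MASTER_HOST resets the log coordinates unless they are given, so the file and position are always included.

```python
import ipaddress
from typing import Dict


def change_master_sql(host: str, log_file: str, log_pos: int, port: int = 3306) -> str:
    # Basic sanity checks so the values cannot break the quoted SQL string.
    ipaddress.ip_address(host)  # raises ValueError if host is not a valid IP address
    if "'" in log_file or "\\" in log_file:
        raise ValueError(f"unexpected characters in binlog file name: {log_file!r}")
    if log_pos < 4:
        # The first 4 bytes of a binlog are its magic header; events start at 4.
        raise ValueError("binlog positions start at 4")
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )


def switch_is_healthy(status: Dict[str, str], expected_host: str) -> bool:
    # True when SHOW SLAVE STATUS reports the new master and both threads running.
    return (
        status.get("Master_Host") == expected_host
        and status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


if __name__ == "__main__":
    # Hypothetical coordinates on node2; substitute the ones found for the real switch.
    print("STOP SLAVE;")
    print(change_master_sql("172.10.2.12", "mysql-bin.000045", 40466))
    print("START SLAVE;")
```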
Conclusion:
With these four simple steps, I was able to bring the slave back and make it production-ready, reducing the downtime impact of a switchover without GTID.
From replication to performance optimization, Mydbops empowers you to streamline tasks and ensure database health.