PXC (Percona XtraDB Cluster), aka Galera Cluster, is one of my favorite clustering technologies for MySQL. It comes with real multi-master capabilities, write-set-based, nearly real-time replication, and many more features.
I can proudly say that Mydbops is one of the very few companies that have been supporting mission-critical clusters and advocating PXC (i.e., Galera Cluster) for the last 4+ years. Here are a few of our talks, presentations, and blogs on PXC:
- Percona Xtradb cluster (Presentation)
- Handling long duration SST(timeout) in PXC with systemd
- New GTID functions in Galera 4
- Percona Cluster/MariaDB Cluster (Galera) Read-write Split
- Fulfilled Tablespace Encryption (TDE) in Percona Cluster
- Maxscale for PXC
In this blog, I am going to share my recent experience in troubleshooting Percona XtraDB Cluster (Galera).
We had a request from the client to do a rolling shutdown of the cluster for hardware maintenance. This is a large, busy 4-node cluster with a 3.3 TB dataset running PXC 5.6.46, with ProxySQL used as the load balancer in single-writer mode, handling more than 60K QPS.
As requested, we brought MySQL down cleanly, and the node was then taken for maintenance, which lasted about 1.5 hours (90 minutes).
An IST (Incremental State Transfer) after maintenance is the easiest and fastest way to rejoin the node to the cluster, compared with an SST (State Snapshot Transfer), which is a full data copy.
To make an IST possible, we had calculated and set the gcache (the ring-buffer file) to 16 GB so that it could hold at least 3 hours of write sets.
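One common way to arrive at such a size is to sample the replication traffic and multiply the rate by the outage window you want to cover. The Python sketch below assumes you have read the sum of the `wsrep_replicated_bytes` and `wsrep_received_bytes` status counters twice, some seconds apart. The function name and the sample numbers are hypothetical, and this is a rough estimate rather than the exact calculation used for this cluster.

```python
import math


def required_gcache_bytes(bytes_at_start, bytes_at_end, sample_seconds,
                          outage_seconds, headroom=1.0):
    """Estimate the gcache size needed to cover an outage window.

    bytes_at_start / bytes_at_end: sum of wsrep_replicated_bytes and
    wsrep_received_bytes at the start and end of the sampling period.
    headroom: safety multiplier (assumption; 1.0 means no extra margin).
    """
    if sample_seconds <= 0:
        raise ValueError("sample_seconds must be positive")
    # Average write-set traffic in bytes per second over the sample.
    rate = (bytes_at_end - bytes_at_start) / sample_seconds
    return math.ceil(rate * outage_seconds * headroom)


# Hypothetical sample: 60 MB of write sets observed in 60 seconds,
# sized for a 3-hour maintenance window.
size = required_gcache_bytes(0, 60_000_000, 60, 3 * 3600)
print(f"gcache.size should be at least {size} bytes")
```

Sample during peak write load where possible, since a window sized from a quiet period will be too small when traffic picks up.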
When the node came back online, it went for an IST as expected, as below
As you can see, the local state is 77346382323 (the state of this node) and the group state is 77347108222 (the current cluster state). The missing 725899 write sets have to be received via IST. The node should have joined the cluster automatically after receiving those write sets, but it did not.
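The number of missing write sets is simply the difference between the two sequence numbers. A tiny Python check, with a hypothetical helper name, makes the arithmetic explicit:

```python
def missing_writesets(local_seqno, group_seqno):
    """Return how many write sets the joiner is behind the group."""
    if local_seqno < 0:
        # A negative seqno means the local position is unknown.
        raise ValueError("local seqno is undefined")
    return max(group_seqno - local_seqno, 0)


# Values taken from the local and group states quoted above.
print(missing_writesets(77346382323, 77347108222))  # 725899
```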
While checking the error log, we found the note below stating that the node had been rejected from joining the cluster because the group state had changed relative to the node's local state. This can happen in edge-case scenarios under highly concurrent writes.
Do not attempt to start or restart MySQL at this point, since it will go for a time-consuming SST to sync the data.
Reason for the SST trigger:
Since the grastate.dat file was reset (shown below) after the initial IST failed, the cluster would trigger an SST on the next start.
What is the grastate.dat file?
This file is located in the data directory of each node in the cluster and maintains the cluster group UUID and the local state seqno. From Galera 3.19, an additional bootstrap protection, the “safe_to_bootstrap” flag, has been added.
The grastate.dat file also helps in identifying the most advanced node in the cluster for bootstrap after a complete outage of all nodes, for example a power outage.
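To make the file's role concrete, here is a minimal Python sketch that parses grastate.dat content and flags a reset state. The sample text follows the usual layout of the file but is illustrative, not copied from the affected node, and the function names are hypothetical.

```python
NIL_UUID = "00000000-0000-0000-0000-000000000000"

# Illustrative content of a reset grastate.dat (assumed layout).
SAMPLE = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    00000000-0000-0000-0000-000000000000\n"
    "seqno:   -1\n"
    "safe_to_bootstrap: 0\n"
)


def parse_grastate(text):
    """Parse 'key: value' lines, skipping comments and blank lines."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def is_reset(state):
    """True when the node no longer knows which cluster state it holds."""
    return state.get("uuid") == NIL_UUID


state = parse_grastate(SAMPLE)
print(state["seqno"], is_reset(state))
```

A seqno of -1 on its own is not alarming, because the file also shows -1 while mysqld is running; the all-zero UUID is what indicates the saved position has been lost.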
How to join this node with IST?
Again, as you can see, the local state seqno is 77347109141, i.e., the node is consistent up to this point, and the group state has a more advanced seqno of 77347116858, i.e., the cluster has moved forward with new write sets.
Local State : 77347109141
Group State : 77347116858
Now we can induce an IST by manually updating the local state seqno in grastate.dat.
Let's update the cluster UUID (if it was overwritten) and the seqno to 77347109141, and start the node. The node will then read the seqno from the file and proceed to get the IST as below,
I edited the grastate.dat file as below
Now proceed to start the cluster node
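The manual edit of grastate.dat described above, done while MySQL is still stopped, could also be scripted. The following Python sketch rewrites only the `uuid` and `seqno` lines and leaves everything else untouched. The UUID is a placeholder (use the real cluster UUID from a healthy node), the function name is hypothetical, and the file layout is assumed to be the usual one.

```python
import re

# Placeholder only; NOT the real cluster UUID.
PLACEHOLDER_UUID = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

# Illustrative content of a reset grastate.dat (assumed layout).
RESET_STATE = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    00000000-0000-0000-0000-000000000000\n"
    "seqno:   -1\n"
    "safe_to_bootstrap: 0\n"
)


def set_grastate(text, uuid, seqno):
    """Return grastate.dat content with uuid and seqno replaced."""
    out = []
    for line in text.splitlines():
        if re.match(r"\s*uuid\s*:", line):
            out.append(f"uuid:    {uuid}")
        elif re.match(r"\s*seqno\s*:", line):
            out.append(f"seqno:   {seqno}")
        else:
            out.append(line)  # comments and other keys stay as they are
    return "\n".join(out) + "\n"


# 77347109141 is the last consistent local seqno reported by the node.
print(set_grastate(RESET_STATE, PLACEHOLDER_UUID, 77347109141), end="")
```

Keep a copy of the original file before overwriting it, and only set a seqno that the node itself reported as its consistent position; a wrong value can leave the node silently inconsistent with the cluster.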
What happens during the cluster start?
Step 1: The grastate.dat file is read
When the node joins the cluster, it reads its state from the grastate.dat file.
Step 2: Assign and update the UUID and seqno
Step 3: Calculate the write sets to be applied
The node calculates the write-set difference between its local state and the current group state, and receives those write sets from the donor's gcache via IST.
Step 4: Apply the IST and join the node
The received write sets are applied in transaction order.
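The decision behind these steps can be sketched as a small check: an IST is only possible when the joiner's UUID matches the cluster's and the donor's gcache still holds the first write set the joiner needs. The Python sketch below is a simplified model, not Galera's actual code. The function name is hypothetical, the UUID and the `donor_cached_downto` value are made up for illustration, and on a real donor the lowest cached seqno is exposed as the `wsrep_local_cached_downto` status variable.

```python
NIL_UUID = "00000000-0000-0000-0000-000000000000"


def can_use_ist(local_uuid, local_seqno, group_uuid, donor_cached_downto):
    """Simplified model of the IST-versus-SST decision for a joiner."""
    if local_uuid == NIL_UUID or local_uuid != group_uuid:
        return False  # node does not belong to this cluster history
    if local_seqno < 0:
        return False  # local position unknown, full SST required
    # The first write set the joiner needs is local_seqno + 1; it must
    # still be present in the donor's gcache.
    return local_seqno + 1 >= donor_cached_downto


CLUSTER_UUID = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"  # placeholder
local, group = 77347109141, 77347116858  # states quoted in this post

print(can_use_ist(CLUSTER_UUID, local, CLUSTER_UUID, 77347000000))
print(group - local)  # write sets to receive via IST
```

With a reset grastate.dat (all-zero UUID, seqno -1) the same check returns False, which is why restarting the node without fixing the file would have led to an SST.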
The node joined back to the cluster as expected after re-initiating the IST, and it is now ready for connections. It saved my day from a full SST (rebuild). I hope this blog provides a solution to many who face a similar issue.
Happy Troubleshooting!!!!!