Percona XtraDB Cluster: Rejoining a Rejected Node with Incremental State Transfer (IST)

XtraDB Cluster troubleshooting a rejected node.

Jul 11, 2020

Mins to Read

All

PXC(Percona XtraDB Cluster), aka Galera cluster, is one of my favorite clustering technology for MySQL that comes with real Multi-master capabilities, Write-set based nearly real-time replication and many more features.

I could proudly say that Mydbops is one of the very few companies who have been supporting mission-critical clusters and preaching PXC ie., Galera cluster for the last 4+ years. Here are few of our talks, presentations, and blogs on PXC

In this blog am going to share my recent experience in troubleshooting the Percona XtarDB Cluster(Galera).

We had a request from the client to do a rolling shutdown of cluster for a hardware maintenance. This is a large and busy cluster of size 4 with a dataset of size 3.3TB running on version PXC-5.6.46, and ProxySQL being used as a balancer for handling more than 60K QPS running in single-writer mode.

As requested we had brought down MySQL cleanly, and then the node was taken for maintenance., which lasted for ~1:30 hrs ( 90 Min )

 
2020-04-14 04:37:45 3300 [Note] Shutting down plugin 'binlog'
2020-04-14 04:37:45 3300 [Note] /usr/sbin/mysqld: Shutdown complete

Having an IST(Incremental State Transfer) after maintenance is an easy and fastest way to join back the node to cluster rather than SST (State Snapshot Transfer) which is a full-data copy.

In order to have IST, we have calculated and set Gcache aka ring-buffer file to 16GB to hold write set for at least 3hrs

When the nodes come back online, as expected it was going for an IST as below

 
2020-04-14 05:05:04 4506 [Note] WSREP: IST receiver addr using tcp://100.11.20.151:4568
2020-04-14 05:05:04 4506 [Note] WSREP: GCache history reset: eaad8819-e6a3-11e6-84d2-3e87f6331c69:77346382323 -> eaad8819-e6a3-11e6-84d2-3e87f6331c69:77347108222
2020-04-14 05:07:21 4506 [Note] WSREP: SST received: eaad8819-e6a3-11e6-84d2-3e87f6331c69:77346382323
2020-04-14 05:07:21 4506 [Note] WSREP: Receiving IST: 725899 writesets, seqnos 77346382323-77347108222
2020-04-14 05:07:21 4506 [Note] WSREP: Receiving IST... 0.0% ( 0/725899 events) complete.

As you can see the local state as 77346382323 ie., state of this node and the group state as 77347108222 ie., current cluster state. The missing write sets “725899” has to be received via IST. The node should have joined the cluster automatically post receiving those writeset’s, but it doesn’t.

While checking on the error log we got the below note stating the node has been rejected to join the cluster stating there is a change in the group state comparing the local state of the node. This might happen on edge-case scenarios due to high concurrency of writes.

 
2020-04-14 05:08:42 4506 [Note] WSREP: State transfer required:
    Group state: eaad8819-e6a3-11e6-84d2-3e87f6331c69:77347116858
    Local state: eaad8819-e6a3-11e6-84d2-3e87f6331c69:77347109141
2020-04-14 05:08:42 4506 [Note] WSREP: New cluster view: global state: eaad8819-e6a3-11e6-84d2-3e87f6331c69:77347116858, view# 562: Primary, number of nodes: 4, my index: 1, protocol version 3
2020-04-14 05:08:56 4311 [Warning] WSREP: Rejecting JOIN message from 0.0 (node3): new State Transfer required.
2020-04-14 05:08:56 4506 [Note] WSREP: Setting wsrep_ready to false

“Don’t ever attempt to restart/start MySQL” at this point, since it will go for a time-consuming SST process to sync the data.

Reason for SST Trigger:

Since the “GRASTATE.dat” file has been reset ( shown below) after the failure with initial IST the cluster will trigger an SST

 
# GALERA saved state version: 2.1 
uuid: 00000-000000000000-00000000 
seqno: -1 
safe_to_bootstrap: 0

What is grastate.dat file?

This file is located under the data-dir of each node in the cluster, maintains cluster group UUID, and local state seq no. From Galera 3.19 additional protection for bootstrap “safe_to_bootstrap” flag has been added.

Grastate.dat file also helps in identifying the most advanced node in the cluster for bootstrap after a complete outage w.r.t all the nodes in the cluster. (Ex) in case of a power outage

How to join this node with IST?

Again as you can see the local state seq no as 77347109141 ie the node is consistent till this point, and the group state has an advanced seq no as 77347116858, ie the cluster has moved forward with the write sets.

Local State : 77347109141

Group State : 77347116858

Now we can proceed to induce an IST transfer by manually updating local state seq.no in “grastate.dat”.

Now let’s proceed to update the cluster UUID(if overwritten) and Seq no to 77347109141, and start the cluster. The node will now proceed to read the seq no from file and proceed to get the IST as below,

I have edited the grastate.dat file as below

 
# GALERA saved state
version: 2.1
uuid: ea5d8819-e8a3-11e6-84d2-3e85f6331c89
seqno: 77347109141
safe_to_bootstrap: 0

now proceed to start the cluster node

 
#service mysql start

What happens during the Cluster start ?

Step 1: Grastet.dat file has been read

When the node joins the cluster, it reads its state from the “grastate.dat” file

 
2020-04-14 05:21:00 12030 [Note] WSREP: Found saved state: ea5d8819-e8a3-11e6-84d2-3e85f6331c89:77347109141, safe_to_bootstrap: 0
2020-04-14 05:21:00 12030 [Note] WSREP: Skipped GCache ring buffer recovery: could not determine history UUID.

Step2: Assign and update the UUID and Seq number

 
2020-04-14 05:21:00 12030 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> ea5d8819-e8a3-11e6-84d2-3e85f6331c89:77347109141
2020-04-14 05:21:00 12030 [Note] WSREP: Assign initial position for certification: 77347109141, protocol version: -1

Step3: Calculate the write-set to be applied

Calculates the write set difference with the local state and current group state, and receives it from the Gcache via IST.

 
2020-04-14 05:21:08 12030 [Note] WSREP: SST received: ea5d8819-e8a3-11e6-84d2-3e85f6331c89:77347109141
2020-04-14 05:21:08 12030 [Note] WSREP: Receiving IST: 452991 writesets, seqnos 77347109141-77347562132
2020-04-14 05:21:08 12030 [Note] WSREP: Receiving IST... 0.0% ( 0/452991 events) complete.

Step4: Apply of IST and node joining

Apply the received write sets in the Order of transaction .

 
2020-04-14 05:21:08 12030 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.46-86.2-56-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel86.2, Revision 5c5f58d, WSREP version 28.38, wsrep_28.38
2020-04-14 05:21:19 12030 [Note] WSREP: Receiving IST... 19.7% ( 89440/452991 events) complete.
2020-04-14 05:21:29 12030 [Note] WSREP: Receiving IST... 36.7% (166064/452991 events) complete.
2020-04-14 05:21:39 12030 [Note] WSREP: Receiving IST... 60.8% (275392/452991 events) complete.
2020-04-14 05:21:49 12030 [Note] WSREP: Receiving IST... 83.2% (376688/452991 events) complete.
2020-04-14 05:21:57 12030 [Note] WSREP: Receiving IST...100.0% (452991/452991 events) complete.
2020-04-14 05:21:57 12030 [Note] WSREP: IST received: ea5d8819-e8a3-11e6-84d2-3e85f6331c89:77347562132
2020-04-14 05:21:57 12030 [Note] WSREP: 2.0 (node1): State transfer from 3.0 (node2) complete.
2020-04-14 05:21:57 12030 [Note] WSREP: Shifting JOINER -> JOINED (TO: 77347588293)
2020-04-14 05:21:58 12030 [Note] WSREP: Member 2.0 (node1) synced with group.
2020-04-14 05:21:58 12030 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 77347588752)
2020-04-14 05:21:59 12030 [Note] WSREP: Synchronized with group, ready for connections

The node has joined back to cluster as expected by re-initiating IST and the node is now ready for connections.It saved my day from a full SST ( rebuild ). I hope this blog will provide solutions to many who face a similar issue.

Happy Troubleshooting!!!!!

Facing a rejected node in your PXC cluster? Let Mydbops experts help! Contact us today for a consultation and ensure smooth database operations.

No items found.