In this blog post, I am going to explain an interesting issue which I faced in one of our client project . Few weeks back , I got an requirement from our support client to construct a new MariaDB Galera Cluster ( 10.2.21 ) and an async slave with GTID. The Complete requirement is as below.
Need to construct the Galera cluster with 37 GB of data ( Using a given MySQL dump file )
Have to configure an asynchronous slave under the 3 node Galera cluster
The asynchronous slave was proposed with GTID to ease master node switchover.
So, as per the requirement the architecture will be,
Architecture –
How to achieve this ?
A high level steps
Install MariaDB 10.2.21 server on all four nodes ( 3 Galera node with wsrep_on = ON & 1 async slave with wsrep_on = OFF ).
Bring up ( join to primary node ) other two Galera nodes through SST
Configure the asynchronous slave ( node4 ) from one of the Galera node backup ( Here I planned to use the PXB streaming / but unfortunately it was not happened )
SST issues with PXB –
As per the plan, I had installed the MariaDB Server ( version –10.2.21-MariaDB-log ) in all four nodes . After the installation, I bootstrapped the first node as primary . Then successfully imported the MySQL dump file ( 37 GB ) into primary node.
Then kick started the MariaDB on node2 to stream the backup from bootstrapped node ( node1 ) through SST . I used xtrabackup-v2 as the SST method and the sample config on node2 is below
I have cleared the MySQL data directory on second node ( node2 ) and started the MariaDB server for SST .
[root@s2 mydbops]# service mysql stop
Stopping mysql (via systemctl): [ OK ]
[root@s2 mydbops]# less /etc/my.cnf | grep -i datadir
datadir = /var/lib/mysql
[root@s2 mydbops]# rm -rf /var/lib/mysql/*
[root@s2 mydbops]# service mysql start
But, the things were not going well as expected and node2 was keep failing on SST . Below is the error I found from the error logs .
2019-02-02 9:59:16 139728621905664 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
2019-02-02 9:59:16 139728621905664 [Note] WSREP: gcomm: terminating thread
2019-02-02 9:59:16 139728621905664 [Note] WSREP: gcomm: joining thread
2019-02-02 9:59:16 139728621905664 [Note] WSREP: gcomm: closing backend
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20190202 09:59:16.743)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20190202 09:59:16.748)
2019-02-02 9:59:16 139728613512960 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.33.13' --datadir '/var/lib/mysql/' --parent '5644' --binlog 'mydbopslabs13-bin' '': 2 (No such file or directory)
2019-02-02 9:59:16 139728613512960 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2019-02-02 9:59:16 139728879081600 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2019-02-02 9:59:16 139728879081600 [ERROR] Aborting
2019-02-02 9:59:29 139716069972096 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used; This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=mydbopslabs13' or '--log-bin=mydbopslabs13-bin' to avoid this problem.
2019-02-02 9:59:29 139716069972096 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-02-02 9:59:29 139716069972096 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2019-02-02 9:59:29 139716069972096 [Note] WSREP: wsrep_load(): Galera 25.3.25(r3836) by Coders
I got confused about this behaviour , because my data directory was already empty and there is no port issue ( all the Galera ports were enabled ) and the selinux also disabled .
I kept on investigating this issue “WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script”
GTID implementation has major difference between MySQL & MariaDB ( Galera ) . i.e. wsrep_gtid_domain_idvariable is available only on MariaDB Galera clusters not in Percona XtraDB Cluster .
Percona Xtradb Cluster -
mysql> select @@version_comment, @@version\G
*************************** 1. row ***************************
@@version_comment: Percona XtraDB Cluster (GPL), Release rel18, Revision 4a4da7e, WSREP version 29.24, wsrep_29.24
@@version: 5.7.20-18-57-log
1 row in set (0.00 sec)
mysql> show global variables like 'wsrep_gtid_domain_id'\G
Empty set (0.01 sec)
MariaDB Galera -
root@localhost:(none)>select @@version_comment, @@version\G
*************************** 1. row ***************************
@@version_comment: MariaDB Server
@@version: 10.2.23-MariaDB-log
1 row in set (0.00 sec)
root@localhost:(none)>show global variables like 'wsrep_gtid_domain_id'\G
*************************** 1. row ***************************
Variable_name: wsrep_gtid_domain_id
Value: 0
1 row in set (0.00 sec)
As per the below error my SST script is getting failed without reading the uuid:seqno and wsrep_gtid_domain_id .
2019-02-02 9:59:16 139728613512960 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script. 2019-02-02 9:59:16 139728879081600 [ERROR] WSREP: SST failed: 2 (No such file or directory) 2019-02-02 9:59:16 139728879081600 [ERROR] Aborting
I started to look into the SST scripts , is the SST scripts are really capable for get those details ?
with Percona xtrabackup-v2 SST script -
[root@s2 mydbops]# which wsrep_sst_xtrabackup-v2
/bin/wsrep_sst_xtrabackup-v2
[root@s2 mydbops]# less /bin/wsrep_sst_xtrabackup-v2 | grep -i wsrep_gtid
[root@s2 mydbops]#
with mariabackup SST script -
[root@s2 mydbops]# which wsrep_sst_mariabackup
/usr/bin/wsrep_sst_mariabackup
[root@s2 mydbops]# less /usr/bin/wsrep_sst_mariabackup | grep -i wsrep_gtid
# Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
# Store donor's wsrep GTID (state ID) and wsrep_gtid_domain_id
cat "${MAGIC_FILE}" # Output : UUID:seqno wsrep_gtid_domain_id
[root@s2 mydbops]#
Yes, only Mariabackup SST script has that . I finally identified that I had the problem with my SST method . Below I am sharing more information from Mariabackup SST script .
if [[ ! -r ${MAGIC_FILE} ]];then
wsrep_log_error "SST magic file ${MAGIC_FILE} not found/readable"
exit 2
fi
wsrep_log_info "Galera co-ords from recovery: $(cat ${MAGIC_FILE})"
cat "${MAGIC_FILE}" # Output : UUID:seqno wsrep_gtid_domain_id
wsrep_log_info "Total time on joiner: $totime seconds"
Finally I was able to fix the issue by changing the SST method from Percona xtrabackup-v2 to Mariabackup .
MySQL config file -
wsrep_sst_method = mariabackup
SST logs with Mariabackup -
WSREP_SST: [INFO] Proceeding with SST (20190202 10:25:48.760) WSREP_SST: [INFO] Waiting for SST streaming to complete! (20190202 10:25:48.804)
2019-02-02 10:26:02 140653493200640 [Note] WSREP: 1.0 (n1): State transfer to 0.0 (n2) complete.
2019-02-02 10:26:03 140653493200640 [Note] WSREP: Member 1.0 (n1) synced with group. WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20190202 10:26:03.163) &>${DATA}/innobackup.move.log (20190202 10:26:03.922) WSREP_SST: [INFO] Removing the sst_in_progress file (20190202 10:26:04.125)
2019-02-02 10:26:04 140653751167104 [Note] WSREP: SST complete, seqno: 3
Finally the SST went smoothly and my cluster is in sync and successfully constructed the async slave as well using Mariabackup .
Additionally ,
Percona Xtrabackup can be used with MariaDB 10.2.x version but it is not compatible with 10.3 and higher versions . The MariaDB Official Page explains the limitation of PXC with MariaDB.
Beware of the SST methods in MariaDB and make a right choice of your SST. PXB cant be good choice for SST method with MariaDB as the features deviates in the way it is designed between MySQL and MariaDB . Mariabackup will be the right SST method with MariaDB cluster from 10.2 and higher releases.