Observability in MySQL Group Replication Monitoring with Performance_schema

Mydbops
Mar 7, 2024
15 Mins to Read

Spot Hiccups Before They Happen! In the context of Group Replication, a robust monitoring strategy is indispensable to ensure the health and performance of individual servers and the entire cluster. This blog post takes you on an exhaustive journey into server monitoring, leveraging the full potential of the Performance Schema, and unveiling a comprehensive array of additional monitoring tools at your disposal.

The Essentiality of Monitoring

Data integrity is non-negotiable. Monitoring MySQL Group Replication helps ensure that data remains consistent across all nodes, providing a robust foundation for applications. It acts as the guardian angel of your database, vigilantly watching day and night so that any discrepancies are spotted and resolved early. With monitoring in place, you can rest easy knowing your data is in safe hands, ready to fuel your operations without a hitch. That is why it's crucial to make monitoring a cornerstone of your MySQL setup.

The Importance of Monitoring Group Replication

Monitoring plays a crucial role for several reasons:

  • It can alert you to any inconsistencies or delays in replication, allowing for prompt corrective action.
  • It allows for proactive maintenance, helping to avoid unexpected downtime.
  • It keeps a check on each replication group node's performance, providing valuable insights.
  • It offers insights into the replication group's state, facilitating the rapid identification of root causes for any issues.
  • It enables historical data analysis, allowing for trend analysis and the identification of long-term performance patterns or potential issues.

Monitoring Strategies

Monitoring strategies for MySQL Group Replication include utilizing the Performance Schema and Grafana. These tools provide in-depth insights and visualization of the replication process, enabling proactive monitoring and management of your replication cluster.

Performance_schema

We can monitor Group Replication status directly from the command line by extracting information from the Performance Schema. This valuable data source offers in-depth insight into the Group Replication setup and its real-time status, empowering us to track and manage our replication cluster effectively.

MySQL Group Replication: Performance Schema

Configuration

For insights into the fundamental configuration of MySQL Group Replication, a key resource to consult is the performance_schema.replication_connection_configuration table. It offers a comprehensive overview of the configuration settings and behavior that underpin our Group Replication setup.

mysql> select CHANNEL_NAME,SSL_ALLOWED,SSL_VERIFY_SERVER_CERTIFICATE,CONNECTION_RETRY_INTERVAL,CONNECTION_RETRY_COUNT,HEARTBEAT_INTERVAL,COMPRESSION_ALGORITHM,GTID_ONLY from performance_schema.replication_connection_configuration;
+---------------------------+-------------+-------------------------------+---------------------------+------------------------+--------------------+-----------------------+-----------+
| CHANNEL_NAME              | SSL_ALLOWED | SSL_VERIFY_SERVER_CERTIFICATE | CONNECTION_RETRY_INTERVAL | CONNECTION_RETRY_COUNT | HEARTBEAT_INTERVAL | COMPRESSION_ALGORITHM | GTID_ONLY |
+---------------------------+-------------+-------------------------------+---------------------------+------------------------+--------------------+-----------------------+-----------+
| group_replication_applier | NO          | NO                            |                        60 |                  86400 |             30.000 | uncompressed          | 1         |
+---------------------------+-------------+-------------------------------+---------------------------+------------------------+--------------------+-----------------------+-----------+
1 row in set (0.00 sec)
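
Beyond the channel configuration, the group_replication_* server variables that govern the plugin can also be read from the Performance Schema. A minimal sketch (not part of the capture above):

-- List every group_replication_* setting currently in effect
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM performance_schema.global_variables
WHERE VARIABLE_NAME LIKE 'group_replication%';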

Transaction time

By querying the performance_schema.replication_connection_status table, we can access valuable insights into the committed and retrieved transaction timestamps, enabling us to precisely gauge the latency and delays within the replication process.

mysql> select CHANNEL_NAME,SOURCE_UUID,SERVICE_STATE,RECEIVED_TRANSACTION_SET,LAST_QUEUED_TRANSACTION,LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP,LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP,LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP,QUEUEING_TRANSACTION from performance_schema.replication_connection_status;
+----------------------------+--------------------------------------+---------------+------------------------------------------------+-------------------------------------------+---------------------------------------------------+----------------------------------------------------+-----------------------------------------------+---------------------------------------------+-------------------------------------------+
| CHANNEL_NAME               | SOURCE_UUID                          | SERVICE_STATE | RECEIVED_TRANSACTION_SET                       | LAST_QUEUED_TRANSACTION                   | LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP | LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP | LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP | LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP | QUEUEING_TRANSACTION                      |
+----------------------------+--------------------------------------+---------------+------------------------------------------------+-------------------------------------------+---------------------------------------------------+----------------------------------------------------+-----------------------------------------------+---------------------------------------------+-------------------------------------------+
| group_replication_applier  | b8192f4b-a054-32d6-ac0b-88b32af3ba1d | ON            | b8192f4b-a054-32d6-ac0b-88b32af3ba1d:1-4451    | b8192f4b-a054-32d6-ac0b-88b32af3ba1d:4124 | 2023-10-23 17:42:34.208890                        | 2023-10-23 17:42:34.208890                         | 2023-10-23 17:42:34.632100                    | 2023-10-23 17:42:34.633476                  |                                           |
| group_replication_recovery | 96bfd3d5-6b68-11ee-a925-06e6380e502d | ON            | b8192f4b-a054-32d6-ac0b-88b32af3ba1d:4125-7112 | b8192f4b-a054-32d6-ac0b-88b32af3ba1d:7112 | 2023-10-23 17:46:46.671429                        | 2023-10-23 17:46:46.680319                         | 2023-10-23 17:46:56.509062                    | 2023-10-23 17:46:56.512040                  | b8192f4b-a054-32d6-ac0b-88b32af3ba1d:7113 |
+----------------------------+--------------------------------------+---------------+------------------------------------------------+-------------------------------------------+---------------------------------------------------+----------------------------------------------------+-----------------------------------------------+---------------------------------------------+-------------------------------------------+
2 rows in set (0.00 sec)

By comparing QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP and QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP, we can see when the transaction currently being queued was committed on the original source and on the immediate source it was received from. Setting these against the queue timestamps on the current server lets us determine how long the transaction took to be transmitted and queued here.
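
To turn these timestamps into a single delay figure, they can be subtracted directly in SQL. The query below is a minimal sketch rather than a capture from the setup above; it assumes the LAST_QUEUED_* columns are populated (a freshly started channel reports zero dates, which makes the arithmetic meaningless):

-- Approximate seconds between the commit on the original source and the
-- moment the transaction finished queueing on this node
SELECT CHANNEL_NAME,
       TIMESTAMPDIFF(MICROSECOND,
                     LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
                     LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP) / 1000000
         AS transit_seconds
FROM performance_schema.replication_connection_status;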

Current and last transaction

To keep a watchful eye on the most recently processed transaction and the transaction currently being handled by the coordinator thread on our server, direct your attention to the performance_schema.replication_applier_status_by_coordinator table for the most up-to-date status information.

mysql> select CHANNEL_NAME,THREAD_ID,SERVICE_STATE,LAST_PROCESSED_TRANSACTION,LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP,PROCESSING_TRANSACTION from performance_schema.replication_applier_status_by_coordinator;
+----------------------------+-----------+---------------+-------------------------------------------+------------------------------------------------------+-------------------------------------------------------+-------------------------------------------+
| CHANNEL_NAME               | THREAD_ID | SERVICE_STATE | LAST_PROCESSED_TRANSACTION                | LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP | LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP | PROCESSING_TRANSACTION                    |
+----------------------------+-----------+---------------+-------------------------------------------+------------------------------------------------------+-------------------------------------------------------+-------------------------------------------+
| group_replication_applier  |       131 | ON            |                                           | 0000-00-00 00:00:00.000000                           | 0000-00-00 00:00:00.000000                            |                                           |
| group_replication_recovery |       142 | ON            | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:3475 | 2023-10-23 19:15:46.134228                           | 2023-10-23 19:15:46.138997                            | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:3476 |
+----------------------------+-----------+---------------+-------------------------------------------+------------------------------------------------------+-------------------------------------------------------+-------------------------------------------+
2 rows in set (0.00 sec)

By examining the PROCESSING_TRANSACTION, we can extract the GTID (Global Transaction ID) of the current transaction being processed by the coordinating thread. The LAST_PROCESSED_% columns furnish information about the most recently processed transaction, while the PROCESSING_TRANSACTION_% columns offer details about the ongoing transaction.

To gain insight into both the most recently applied transaction and the transaction currently being applied by each worker thread on our server, direct your focus to the performance_schema.replication_applier_status_by_worker table. This invaluable resource provides real-time status information for our monitoring needs.

mysql> select CHANNEL_NAME,WORKER_ID,THREAD_ID,SERVICE_STATE,LAST_APPLIED_TRANSACTION,APPLYING_TRANSACTION from performance_schema.replication_applier_status_by_worker;
+----------------------------+-----------+-----------+---------------+-------------------------------------------+-------------------------------------------+
| CHANNEL_NAME               | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_APPLIED_TRANSACTION                  | APPLYING_TRANSACTION                      |
+----------------------------+-----------+-----------+---------------+-------------------------------------------+-------------------------------------------+
| group_replication_applier  |         1 |       154 | ON            |                                           |                                           |
| group_replication_applier  |         2 |       155 | ON            |                                           |                                           |
| group_replication_applier  |         3 |       156 | ON            |                                           |                                           |
| group_replication_applier  |         4 |       157 | ON            |                                           |                                           |
| group_replication_recovery |         1 |       165 | ON            | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4535 | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4537 |
| group_replication_recovery |         2 |       166 | ON            | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4532 | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4536 |
| group_replication_recovery |         3 |       167 | ON            | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4533 | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4538 |
| group_replication_recovery |         4 |       168 | ON            | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4534 | 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:4539 |
+----------------------------+-----------+-----------+---------------+-------------------------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

The CHANNEL_NAME and SERVICE_STATE columns show whether each applier thread on the server is active. Additionally, when a worker halts because of a problem, LAST_ERROR_MESSAGE records the error that stopped it, while APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE shows the most recent transient error encountered while retrying the transaction currently being applied.
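
As an illustrative sketch (not from the capture above), the error columns can be combined into a single check that returns only the workers that have stopped with an error or are currently retrying a transaction:

-- Show only workers that hit an error or are retrying the current transaction
SELECT CHANNEL_NAME, WORKER_ID, SERVICE_STATE,
       LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP,
       APPLYING_TRANSACTION_RETRIES_COUNT,
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker
WHERE LAST_ERROR_NUMBER <> 0
   OR APPLYING_TRANSACTION_RETRIES_COUNT > 0;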

Replication filter

Closely inspecting the performance_schema.replication_applier_global_filters table lets you determine whether any replication filters have been configured for Group Replication. If no replication filter has been set, the query returns an empty result, indicating that no filtering is in place.

mysql> select * from performance_schema.replication_applier_global_filters;
+-----------------+-------------+---------------------------+----------------------------+
| FILTER_NAME     | FILTER_RULE | CONFIGURED_BY             | ACTIVE_SINCE               |
+-----------------+-------------+---------------------------+----------------------------+
| REPLICATE_DO_DB | sbtest      | CHANGE_REPLICATION_FILTER | 2023-10-24 01:22:36.372820 |
+-----------------+-------------+---------------------------+----------------------------+
1 row in set (0.00 sec)
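
The table above holds global filters; filters configured for a specific channel, if any, appear in the companion performance_schema.replication_applier_filters table. A quick check:

-- Channel-specific replication filters (empty if none are configured)
SELECT CHANNEL_NAME, FILTER_NAME, FILTER_RULE, CONFIGURED_BY, ACTIVE_SINCE
FROM performance_schema.replication_applier_filters;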

Members and status list

The performance_schema.replication_group_members table provides a comprehensive overview of the members within a MySQL Group Replication cluster, a robust solution for high availability and multi-master replication. This table serves as a vital tool for administrators, facilitating the monitoring and management of each member's status, and ensuring the continued health and synchronization of the entire replication group.

mysql> SELECT MEMBER_HOST,MEMBER_STATE,MEMBER_ROLE FROM performance_schema.replication_group_members;
+---------------+--------------+-------------+
| MEMBER_HOST   | MEMBER_STATE | MEMBER_ROLE |
+---------------+--------------+-------------+
| Node1         | ONLINE       | PRIMARY     |
| Node2         | ONLINE       | SECONDARY   |
+---------------+--------------+-------------+
2 rows in set (0.00 sec)
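
For alerting, it often helps to reduce this view to a single number. A minimal sketch that counts the members currently ONLINE (raise an alert if it drops below the expected group size):

-- Number of members the group currently reports as ONLINE
SELECT COUNT(*) AS online_members
FROM performance_schema.replication_group_members
WHERE MEMBER_STATE = 'ONLINE';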

Replication lag

We have two effective methods for monitoring replication lag in our setup. The first is to inspect the lag directly with SHOW REPLICA STATUS (SHOW SLAVE STATUS on releases before MySQL 8.0.22) on a specific node; the group_replication_recovery channel shown below is active while a member is catching up through distributed recovery. Alternatively, we can use the performance_schema.replication_group_member_stats table to examine the replication queues, which offers a more comprehensive view of the backlog across the entire Group Replication cluster.

mysql>  show slave status for channel 'group_replication_recovery'\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing source event to the relay log
                  Master_Host: Node1
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000011
          Read_Master_Log_Pos: 968694512
               Relay_Log_File: ip-172-31-53-9-relay-bin-group_replication_recovery.000003
                Relay_Log_Pos: 8212181
        Relay_Master_Log_File: binlog.000011
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 843587795
              Relay_Log_Space: 133319512
              Until_Condition: SQL_VIEW_ID
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 199
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: a0bf2766-71d4-11ee-816c-06e6380e502d
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for replica workers to process their queues
           Master_Retry_Count: 1
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:18240-18501
            Executed_Gtid_Set: 24bbde8d-6b6b-11ee-9c5f-06e65eab1927:1-18265
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: group_replication_recovery
           Master_TLS_Version: TLSv1.2,TLSv1.3
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0.00 sec)

The Seconds_Behind_Master value shows how many seconds this node is behind the source it is replicating from on this channel, i.e. how much it still has to catch up.
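
Note that SHOW SLAVE STATUS is deprecated; on MySQL 8.0.22 and later the same information is available with Source_*/Replica_* field names (Seconds_Behind_Source instead of Seconds_Behind_Master) via:

SHOW REPLICA STATUS FOR CHANNEL 'group_replication_recovery'\G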

mysql> select MEMBER_ID,COUNT_TRANSACTIONS_IN_QUEUE from performance_schema.replication_group_member_stats;
+--------------------------------------+-----------------------------+
| MEMBER_ID                            | COUNT_TRANSACTIONS_IN_QUEUE |
+--------------------------------------+-----------------------------+
| a0bf2766-71d4-11ee-816c-06e6380e502d |                           0 |
| f48df98b-71d5-11ee-bf62-06e65eab1927 |                         311 |
+--------------------------------------+-----------------------------+
2 rows in set (0.00 sec)

The COUNT_TRANSACTIONS_IN_QUEUE metric shows the number of transactions waiting in the certification (conflict detection) queue on each member; a steadily growing value indicates that the member is falling behind the rest of the group.
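
The same table also reports how many transactions received from the group are still waiting to be applied, as opposed to waiting for certification. A simple extension of the query above:

-- Certification queue vs. applier queue, per member
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;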

In this blog, we've laid the groundwork for monitoring group replication using the performance_schema tables. Our next installment will delve into advanced monitoring techniques with Grafana, providing a comprehensive view of your group replication setup.

Implementing a robust monitoring strategy is essential for ensuring the health and performance of your MySQL Group Replication setup. By harnessing the power of the Performance Schema and other monitoring tools, you can proactively detect and resolve replication delays and inconsistencies, minimizing downtime and safeguarding data integrity.

For a comprehensive exploration of recovery strategies and best practices, refer to our in-depth guide: Mastering MySQL Group Replication Recovery: Strategies for High Availability and Data Integrity. This resource offers valuable insights into strengthening your group replication setup for optimal performance and data protection.

Optimize your MySQL Group Replication setup with Mydbops' InnoDB Cluster Consulting Services. Our experts can help you streamline your database management and improve performance. Contact us today to discuss your specific needs.


About the Author

Mydbops

