How to Troubleshoot a MySQL Replica IO Thread that is Stuck in a Connecting State

Mydbops
May 15, 2023
10
Mins to Read
All

Discover how to troubleshoot a MySQL replica IO thread stuck in a connecting state. Learn about the replication architecture, security group rules for AWS EC2 instances, and how to address common issues like network restrictions and bind address configuration.

MySQL is a powerful database management and a widely used cloud database service. One of its key features is the ability to create replicas of a master database to improve its availability and scalability. However, at times the IO thread in a MySQL replica may get stuck in a connecting state, which can cause replication issues and affect the overall data consistency of the replicas.

  1. Replication Architecture and the Connecting State
  2. Configuring Security Group Rules for AWS EC2 Instances
    1. Inbound rule of the source server
    2. Outbound rule of the replica server
  3. Restricting TCP/IP Connections in MySQL Using Bind Address:

Recently, we were attempting to set up replication between two MySQL servers in AWS EC2, but the replica IO thread remained in the connecting state.

Replication Architecture and the Connecting State

MySQL replication operates on the pull concept. When the replication thread on the replica server connects to the source server, it pulls the data from the source server. The replica IO thread often remains in the connecting state due to connectivity or permission issues.

Configuring Security Group Rules for AWS EC2 Instances

We have used AWS EC2 servers, so I made sure that the source and replica could communicate through their incoming and outgoing ports. The server’s incoming and outgoing connections are controlled by inbound and outbound rules.

Since replication architecture relies on the pull concept, the replica server must whitelist the source server IP in its outbound rule, and the source server must whitelist the replica server IP in its inbound rule in the security group.

The security group’s inbound rule for the source server accepts all connections through the MySQL default port 3306, and the replica server’s outbound rule is open.

Inbound rule of the source server

Outbound rule of the replica server

Despite the port being available, MySQL was unable to connect to the remote server. Replication was still just in the connecting condition.

 
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: ********
                  Master_User: ****
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: ip-*-*-*-*-bin.000003
          Read_Master_Log_Pos: 796
               Relay_Log_File: ip-*-*-*-*-bin.000002
                Relay_Log_Pos: 907
        Relay_Master_Log_File: ip-*-*-*-*-bin.000003
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 796
              Relay_Log_Space: 1125
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'replica@*.*.*.*:3306' - retry-time: 60  retries: 1
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID:
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 230214 16:50:42
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
	

We realized that in addition to network restrictions, the bind_address system variable in MySQL was also restricting connectivity. The bind_address variable determines how the server listens for TCP/IP connections. We found that the bind_address variable was set to 127.0.0.1, which mapped to the local address on the source server, declining connections from remote hosts.

 
mysql> show global variables like 'bind_address';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| bind_address  | 127.0.0.1 |
+---------------+-----------+
1 row in set (0.001 sec)
	

Restricting TCP/IP Connections in MySQL Using Bind Address:

The bind_address system variable is an essential aspect of MySQL connectivity, and it should be set correctly to ensure successful replication. We set the bind_address variable to 0.0.0.0 to allow the server to accept TCP/IP connections from all server host IPv4 interfaces.

 
mysql> show global variables like 'bind_address';
+---------------+---------+
| Variable_name | Value   |
+---------------+---------+
| bind_address  | 0.0.0.0 |
+---------------+---------+
1 row in set (0.002 sec)
	

After the bind address adjustment on the source server, it began accepting remote connections, and replication was successful.

 
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: *.*.*.*
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: ip-*-*-*-*-bin.000005
          Read_Master_Log_Pos: 352
               Relay_Log_File: ip-*-*-*-*-relay-bin.000005
                Relay_Log_Pos: 559
        Relay_Master_Log_File: ip-*-*-*-*-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 352
              Relay_Log_Space: 1182
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID:
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
	

It’s essential to keep in mind that issues with the replica IO thread being stuck in the connecting state may not always be due to network-level limitations. It’s important to also examine the variables at the MySQL level to identify any potential configuration issues. By doing so, we can identify and resolve the root cause of the problem and get the replication up and running smoothly.

Feel free to browse our website for more informative blog posts on topics related to database consulting, support, and other technology-related issues. Our blog may offer valuable insights and advice for your business or career, so you may discover something that piques your interest. Additionally, to gain further insights, you can refer to Mastering MySQL Group Replication Primary Promotion Techniques.

If you require expert assistance with managing your database, do not hesitate to contact us today. Our team of skilled professionals can provide customized solutions to meet your specific needs, ensuring the security, optimization, and accessibility of your data at all times. Reach out to us now to learn more about how we can assist you in streamlining your operations and maximizing the value of your MySQL Performance and Operations.

No items found.

About the Author

Mydbops

Subscribe Now!

Subscribe here to get exclusive updates on upcoming webinars, meetups, and to receive instant updates on new database technologies.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.