It was a clear day and traffic started to increase as we are in the biggest sale event of the year for one of our Managed services clients.
This is one of the interesting troubleshooting let’s dive in. Let me provide some background. We were running the Percona version of MySQL (5.7.30), on a 96 core VM with a 512GB RAM dedicated DB node. This is the biggest server we get in the self-hosted data center.
We started checking the performance graphs, and we use PMM for metric analysis. Meanwhile, we started getting calls out from the application team on the latency and slowness
Below is the CPU utilization graph for the overall issue period, We will discuss what caused the sudden spike of the utilization ( Around 10:50 ) and the variable that was applied to fix (dip in the graph)
Usually, it would be the ‘USER’ CPU would be high due to a bad query or due to high concurrency but here it’s the other way around the system CPU is used as high as 60%.
We wanted to see what is causing this, so we ran a stack trace on a simulated system by using percona toolkit’s pt-pmp is a read-only tool that helps in collecting GDB stack-traces for the attached program and printing stack traces for all threads
Caution: Run this tool in production with absolute care since it can make MySQL be unresponsive(freeze) for some period of time ranging from seconds to longer based on traffic pattern
Sorry for the image, pls zoom in to have a closer look
As highlighted in the above image we can see “__tz_convert,my_system_gmt_sec,Time_zone_system::TIME_to_gmt_sec” calls being struck which are clearly related to time zone conversion.
What’s really happening?
To know more we had a touch base with the dev Team, This is one of the exceptional tracking applications which is heavily dependent on the timestamp column for most of its operations with columns such as “created_at, updated_at…”
MySQL is dependent on OS for time zone conversion ie., with time_zone variable set to ‘SYSTEM’ by default, every time a row is being inserted or updated with timestamp MySQL obtains the OS session time_zone and then makes a subsequent OS API call to convert the corresponding conversion function Time_zone_system::TIME_to_gmt_sec,
In our case, this OS call is causing the high system CPU utilization as high as 60%, since this is being called at high concurrency. Thus causing the low-cost queries to be latent.
How did we fix this issue?
Now let’s see how we nailed the problem, We made the MySQL handle time zone conversion on its own by changing the “time_zone” parameter in MySQL from default ‘SYSTEM’ to “+05:30”(IST) this is a dynamic change with MySQL and it doesn’t require of mysqld process
Global setting:
mysql> set global time_zone=’+05:30′;
Query OK, 0 rows affected (0.00 sec)
Config file:
[mysqld]
default-time-zone = “+05:30”
After applying the changes we made a ‘connection refresh’ by doing a rolling restart for the application, And we started witnessing an instantaneous dip (around 13:50 ) in system CPU utilization, sharing the image again
With the managed DBaaS like AWS RDS, you have the provision to choose the named time zone in the parameter group as below.
We can also load the named timezone like “Asia/Calcutta” with self-hosted MySQL by converting and loading zone info with “mysql_tzinfo_to_sql” utility which comes by default as below
Key takeaway:
With Modern heavily timestamp dependent applications would be heavily affected with the default setting of time_zone but would be unrecognized in most cases, we would advise having this variable set with MySQL for better performance.
Happy Troubleshooting !!