Unexpected IO Spikes in MongoDB: Diagnosing and Resolving TTL Index Issues During Standalone to Replica Conversions

Mydbops
May 23, 2024

In our recent small project to convert a standalone MongoDB setup into a replica set, a process we've successfully completed several times for various clients, we faced an unexpected challenge. After enabling the replica set, the database node experienced significant increases in CPU usage and disk I/O activity.

This unforeseen issue prompted a thorough investigation to understand its underlying causes and suggest effective solutions. Drawing upon my experience in MongoDB administration and troubleshooting, I'll walk you through the steps I took to identify and address the issue. Finally, we'll discover that a surprising factor contributed to this challenge.

Unforeseen Hiccups: A Long-Past Data Migration Mishap

During an earlier data migration (node replacement) operation, the following happened:

  • The replica set was enabled and data was synchronized across the cluster.
  • The old node was then removed, and the activity team inadvertently reverted the newly added member to a standalone instance.
  • In the final stage, a crucial step was overlooked: removing the local database from the standalone instance.

As a result of this oversight, unforeseen complications arose.

Anatomy of the local Database

  • Each mongod instance maintains its own local database, which serves as a repository for data utilized in the replication process and other instance-specific data.
  • Notably, the local database remains invisible to replication, meaning collections within it are not replicated.

The notable collections in the local database are listed below:

Collection Name: Details

local.system.replset
  • Holds the replica set's configuration object as its single document.
  • View configuration information using rs.conf() in mongosh or query the collection.

local.oplog.rs
  • The capped collection that stores the oplog. Size is set using oplogSizeMB at creation.
  • Resize using the Change the Size of the Oplog procedure.

local.replset.minvalid
  • Contains an object used internally by replica sets to track replication status.

local.startup_log
  • A capped collection where each mongod instance inserts a document on startup with diagnostic information about the instance and host.

Note: The collections local.oplog.rs and local.system.replset, along with other system collections in the local database, cannot be dropped individually.

Attempting to drop them returns the following errors:

 
local> db.getCollection('system.replset').drop()

MongoServerError[IllegalOperation]: can't drop system collection local.system.replset

local> db.getCollection('oplog.rs').drop()

MongoServerError[Location5255001]: can't drop oplog on storage engines that support replSetResizeOplog command
	

Standalone to Replica Set Transition

Recap of Steps

  • In the current project, the node discussed above was being converted back into a replica set.
  • During pre-validation, the leftover local database was noticed.
  • Because retaining the old oplog while transitioning to a replica set would cause it to be reapplied, the decision was made to drop the local database and then start mongod as a replica set.

Issue Arises

  • After the restart, the database load rose sharply and disk write I/O became saturated.
    [Figures: spikes in CPU usage, cache usage, and disk usage during the standalone to replica set transition]
  • To contain the issue, the mongod process was stopped and the node was restarted as a standalone, after which the load returned to normal.
  • Inspection of the new oplog showed a large volume of delete entries. However, neither the mongod logs nor the opcounter metrics showed any corresponding delete activity.
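
The oplog inspection described above can be approximated with a small helper. This is a minimal sketch, not the exact procedure used in the investigation: the function name, the sample size, and the namespaces in the usage note are hypothetical. It counts only top-level delete entries (op: "d"), so deletes wrapped inside applyOps entries would not be counted.

```javascript
// Count top-level delete entries (op === 'd') per namespace
// in an array of oplog documents, busiest namespace first.
function countDeletesByNamespace(entries) {
  const counts = new Map();
  for (const entry of entries) {
    if (entry.op !== 'd') continue; // skip inserts, updates, no-ops, commands
    counts.set(entry.ns, (counts.get(entry.ns) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([ns, deletes]) => ({ ns, deletes }))
    .sort((a, b) => b.deletes - a.deletes);
}

// Possible mongosh usage (assumption: 10000 is an arbitrary sample size):
// const sample = db.getSiblingDB('local').getCollection('oplog.rs')
//   .find().sort({ $natural: -1 }).limit(10000).toArray();
// countDeletesByNamespace(sample);
```

If one or two TTL-indexed collections dominate the result, that points to the TTL monitor rather than to application traffic as the source of the deletes.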

Implications of Disabled TTL and Warning Messages

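  • On a standalone deployment, the TTL monitor is disabled when the system.replset collection is present in the local database. mongod reports this as a startup warning, which is displayed when connecting with the mongo shell.
  • Because the TTL monitor had been disabled since the earlier migration, expired documents had accumulated in the TTL-indexed collections. Once the node was started as a replica set, the TTL monitor resumed and began deleting this backlog, which explains the delete entries in the oplog and the spike in load.

Related startup warning: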

2024-04-09T12:45:09.536+00:00: Document(s) exist in 'system.replset', but started without --replSet. Database contents may appear inconsistent with the writes that were visible when this node was running as part of a replica set. Restart with --replSet unless you are doing maintenance and no other clients are connected. The TTL collection monitor will not start because of this. For more info see http://dochub.mongodb.org/core/ttlcollections
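
As a pre-validation aid, the startup warnings can be scanned for this message. The sketch below assumes the warning text contains the phrase shown above; the helper name is hypothetical. The getLog admin command with the "startupWarnings" argument is used in the commented usage to fetch the lines.

```javascript
// Return the startup warning lines stating that the TTL monitor was not started.
function findTtlDisabledWarnings(logLines) {
  const marker = 'TTL collection monitor will not start';
  return logLines.filter((line) => String(line).includes(marker));
}

// Possible mongosh usage:
// const res = db.adminCommand({ getLog: 'startupWarnings' });
// findTtlDisabledWarnings(res.log);
```

A non-empty result on a standalone node suggests that TTL deletions have been paused since the last restart.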
	

Note: Checking the ttlMonitorEnabled parameter does not reveal this condition. The parameter still reports true even though the TTL monitor is not actually running.

 
test> db.adminCommand({ getParameter: 1, ttlMonitorEnabled: 1 })

{ ttlMonitorEnabled: true, ok: 1 }
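
Since the parameter is not a reliable indicator, one alternative is to compare the TTL counters that serverStatus reports under metrics.ttl (passes and deletedDocuments) across two snapshots. This is a sketch under the assumption that the snapshots are taken more than 60 seconds apart, the default interval between TTL monitor passes. The helper name is hypothetical.

```javascript
// Compare two metrics.ttl snapshots and report whether the TTL monitor advanced.
function ttlMonitorProgress(before, after) {
  // Counters may arrive as plain numbers or as Long-like objects; normalize both.
  const toNum = (v) => (typeof v === 'number' ? v : Number(String(v)));
  const passes = toNum(after.passes) - toNum(before.passes);
  const deletedDocuments =
    toNum(after.deletedDocuments) - toNum(before.deletedDocuments);
  return { passes, deletedDocuments, running: passes > 0 };
}

// Possible mongosh usage:
// const before = db.serverStatus().metrics.ttl;
// sleep(70 * 1000); // wait longer than one TTL monitor interval
// const after = db.serverStatus().metrics.ttl;
// ttlMonitorProgress(before, after);
```

If passes does not increase between the snapshots, the TTL monitor is not running, whatever ttlMonitorEnabled reports.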
	

Mitigating Obstacles

  • To avoid this load spike during the transition, purge the data that matches the TTL condition in the affected collections before enabling the replica set.
  • First, identify the collections that have TTL indexes.
  • Prioritize the collections and choose a purging approach based on the data size of each collection.
  • Based on the documents to be purged, set a batch size and formulate the purge query.
  • Once all the expired data is removed, cross-verify the result, then disable authentication on the standalone deployment and drop the local database.
  • Finally, observe the database health and resource utilization before performing the replica set transition.
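
The steps above can be sketched as three small helpers. This is an illustrative outline under stated assumptions, not the exact procedure used in this case: all function names, the batch size, and the pause are hypothetical. purgeInBatches is written against mongosh's synchronous-looking collection API; with the Node.js driver the calls would need await.

```javascript
// Pick the TTL indexes out of the array returned by db.collection.getIndexes().
function findTtlIndexes(indexes) {
  return indexes.filter((idx) => idx.expireAfterSeconds !== undefined);
}

// Build a filter matching the documents the TTL monitor would consider expired.
function buildExpiredFilter(ttlIndex, now = new Date()) {
  const field = Object.keys(ttlIndex.key)[0]; // TTL indexes are single-field
  const cutoff = new Date(
    now.getTime() - Number(ttlIndex.expireAfterSeconds) * 1000
  );
  const expired = { [field]: { $lt: cutoff } };
  // Respect a partial TTL index, which only covers matching documents.
  return ttlIndex.partialFilterExpression
    ? { $and: [ttlIndex.partialFilterExpression, expired] }
    : expired;
}

// Delete matching documents in small batches and return the total deleted.
// `pause` is an optional callback, e.g. () => sleep(500) in mongosh.
function purgeInBatches(coll, filter, batchSize, pause = () => {}) {
  let total = 0;
  for (;;) {
    const ids = coll
      .find(filter, { _id: 1 })
      .limit(batchSize)
      .toArray()
      .map((doc) => doc._id);
    if (ids.length === 0) return total;
    total += coll.deleteMany({ _id: { $in: ids } }).deletedCount;
    pause(); // give the disk time to absorb the writes
  }
}

// Possible mongosh usage (assumption: 1000 and 500 ms are arbitrary values):
// const coll = db.getCollection('sessions');
// for (const idx of findTtlIndexes(coll.getIndexes())) {
//   purgeInBatches(coll, buildExpiredFilter(idx), 1000, () => sleep(500));
// }
```

Deleting by _id in bounded batches keeps each write small, so the batch size and pause can be tuned while watching disk I/O.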

Through meticulous analysis and collaboration with my team, we were able to identify the root cause of the problem and implement preventative measures to safeguard against similar incidents in the future. This experience underscored the importance of proactive planning and vigilant monitoring in ensuring the stability and integrity of MongoDB deployments.

Migrate Your MongoDB Without Resource Woes! Contact Mydbops today to leverage our MongoDB Managed Services and Consulting expertise. We'll help you avoid resource saturation and ensure a successful transition to a replica set for your remote databases.
