MongoDB 7.0 Cluster-to-Cluster Sync: Simplifying Data Synchronization

Oct 16, 2023


In the fast-paced world of modern applications, where data availability and scalability are paramount, MongoDB remains a popular database choice for developers and organizations worldwide. Cluster-to-Cluster Sync, supported in MongoDB 7.0, is designed to simplify data synchronization between clusters in distributed environments.

Cluster-to-Cluster Sync in MongoDB 7.0

Cluster-to-Cluster Sync is a game-changing feature in MongoDB 7.0 that empowers you to seamlessly synchronize data between two separate MongoDB clusters. This feature caters to various use cases, including disaster recovery, global data distribution, and smooth data migration. In this blog post, we'll explore the key aspects of Cluster-to-Cluster Sync and its practical applications.

Streamlining Data Synchronization and Migration with Mongosync

Data consistency between your production and UAT environments is vital for accurate testing, and Mongosync helps by keeping the two in sync and minimizing discrepancies. It automates synchronization, which reduces manual effort and the errors that come with it. Whether you are refreshing a UAT environment or moving to a new platform, Mongosync fits a range of data management needs and eases transitions between platforms or environments.

Data Migration

Data migration between MongoDB clusters is a common requirement, especially when upgrading to a new version or switching cloud providers. Cluster-to-Cluster Sync streamlines this process, ensuring that data consistency is maintained throughout the migration.

How Does Cluster-to-Cluster Sync Work?

Cluster-to-Cluster Sync leverages MongoDB's Change Streams, a feature that enables real-time monitoring of changes in a database. Here's a simplified overview of how it operates:

Change Stream Capture:

Mongosync opens Change Streams on the source cluster to capture data changes as they happen. These changes include inserts, updates, and deletes.
Mongosync relies on change streams to sync data between the source and destination clusters. It does not read the oplog directly, but change streams are themselves backed by the oplog, so any past event that Mongosync needs to resume from must still fall within the oplog's time range.
If operations are removed from the source cluster's oplog before Mongosync has applied them to the destination cluster, the sync fails and Mongosync stops.

Change Stream Transmission: Mongosync reads the captured changes from the source and sends them to the target cluster. The traffic is encrypted when TLS is enabled on the connections to both clusters.
Data Application: On the target cluster, the received changes are applied to the corresponding collections, keeping the clusters in sync.
Conflict Handling: Mongosync applies changes in one direction, from source to destination. It does not merge concurrent writes made to the same document on both clusters, so avoid writing to the destination while a sync is running.

Benefits of Cluster-to-Cluster Sync

Cluster-to-Cluster Sync offers numerous benefits, including:

High Availability: Data synchronization enables seamless failover to a secondary cluster in the event of a primary cluster failure, ensuring high availability.
Global Reach: Distribute data globally to reduce latency for users in different regions, enhancing the overall user experience.
Data Mobility: Easily migrate data when upgrading MongoDB versions or transitioning to a new cloud provider.
Consistency: After the initial copy, Mongosync continuously applies changes from the source, so the destination stays consistent with it as long as the destination is not written to during the sync.
Security: MongoDB 7.0 provides robust security features, safeguarding data confidentiality and integrity during synchronization.

Getting Started with Cluster-to-Cluster Sync

To get started with Cluster-to-Cluster Sync in MongoDB 7.0, follow these high-level steps:

Check Your MongoDB Version: Ensure both clusters run MongoDB 7.0 or another version that Cluster-to-Cluster Sync supports (6.0 or later).
Set Up Source and Target Clusters: Configure the source and target MongoDB clusters based on your use case. Ensure network connectivity and appropriate access controls.
Confirm Change Streams Are Available: Change streams require a replica set or sharded cluster, so make sure the source is not a standalone instance.
Configure Sync Settings: Decide which data to sync and in which direction. Mongosync applies changes continuously rather than on a schedule.
Monitor and Maintain: Continuously monitor the synchronization process and maintain the clusters to ensure data consistency.

Practical presentation

In this section, we will walk through configuring cluster synchronization between two distinct replica sets.

Cluster-to-cluster sync provides synchronization between

Here I showcase the configuration steps for running Mongosync between two self-managed clusters, each running MongoDB 6.0 or later.

Prerequisites

Before setting up Cluster-to-Cluster Sync, there are some prerequisites to keep in mind:

MongoDB Version: Ensure that both the source and destination clusters are running at least MongoDB 6.0 or later.
Server Version Compatibility: The source and destination clusters should be running the same MongoDB server version.
Feature Compatibility: Both clusters should have a Feature Compatibility Version of at least 6.0, and they should be set to the same Feature Compatibility Version.
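
These checks can be scripted before installing anything. The sketch below is a hypothetical helper, not part of Mongosync: versions_compatible takes two version strings, such as the output of db.version() on each cluster, and succeeds only when both are 6.0 or later and share the same major.minor series. Reading "same server version" as "same major.minor series" is an assumption; tighten the comparison if your Mongosync release is stricter.

```shell
# Hypothetical helper: succeed only if both versions are 6.0+ and in the same major.minor series.
versions_compatible() {
  src_series=$(printf '%s' "$1" | cut -d. -f1,2)   # e.g. 7.0.2 -> 7.0
  dst_series=$(printf '%s' "$2" | cut -d. -f1,2)
  src_major=${src_series%%.*}                      # e.g. 7.0 -> 7
  # Reject anything older than 6.0 (or a non-numeric major version).
  [ "$src_major" -ge 6 ] 2>/dev/null || return 1
  [ "$src_series" = "$dst_series" ]
}

# Gathering the inputs needs mongosh and reachable clusters; SRC_URI and DST_URI are placeholders:
#   src=$(mongosh "$SRC_URI" --quiet --eval 'db.version()')
#   dst=$(mongosh "$DST_URI" --quiet --eval 'db.version()')
#   versions_compatible "$src" "$dst" && echo "versions OK"
```

The Feature Compatibility Version can be read on each cluster with db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 }) and compared in the same way.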

Installation

To get started, you need to install the Cluster-to-Cluster Sync tool. Here are the steps:

Download the Cluster-to-Cluster Sync tool as a .tgz tarball from the official MongoDB website: Download Mongosync.
Follow the installation instructions provided in the official MongoDB documentation for your specific platform: Mongosync Installation Guide

Setting Up User Permissions

To connect two clusters with Mongosync, you must create a database user with the appropriate permissions in both clusters. The user specified in the Mongosync connection string should have the required permissions on both the source and destination clusters. The permissions may vary depending on your environment and whether you intend to perform write-blocking or reverse synchronization.

Note: Mongosync syncs collection data between clusters but does not synchronize users or roles. Therefore, you can create users with different access permissions on each cluster.

To determine the correct user permissions for your use case, see User Permissions.
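
As a rough illustration, the helper below prints a createUser statement that can be pasted into a mongosh session opened by an administrator. The helper name and the role lists are assumptions: the roles reflect one reading of the User Permissions page for a default self-managed sync (no write-blocking, no reverse sync), so verify them against that page before use.

```shell
# Hypothetical helper: print a mongosh createUser statement for a user name and comma-separated roles.
# passwordPrompt() makes mongosh ask for the password instead of embedding it in the statement.
create_user_js() {
  # Turn role1,role2 into "role1", "role2"
  roles=$(printf '%s' "$2" | sed 's/[^,][^,]*/"&"/g; s/,/, /g')
  printf 'db.getSiblingDB("admin").createUser({ user: "%s", pwd: passwordPrompt(), roles: [%s] })\n' "$1" "$roles"
}

# Assumed role lists for a default sync; confirm them in the User Permissions documentation.
SOURCE_ROLES="backup,clusterMonitor,readAnyDatabase"
DEST_ROLES="clusterManager,clusterMonitor,readWriteAnyDatabase,restore"

# Example:
#   create_user_js mydbops "$SOURCE_ROLES"   # paste the output into mongosh on the source
#   create_user_js mydbops "$DEST_ROLES"     # paste the output into mongosh on the destination
```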

Configuring Mongosync

Once you have installed Mongosync on a server, you need to set up the connection strings for each replica set. Specify each node in the replica set in the connection string.

 
"mongodb://mydbops:vel123@172.31.47.72:27017,172.31.47.72:27018"
"mongodb://mydbops:vel123@172.31.38.84:27017,172.31.38.84:27018"

After validating the connection URIs for both clusters, you can set up the source and destination clusters for data synchronization using Mongosync.
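
One way to validate a URI is to ping the cluster with mongosh. The sketch below assumes mongosh is installed on the Mongosync host; both function names are hypothetical. redact_uri masks the password so the URI can be echoed or logged without exposing credentials.

```shell
# Hypothetical helper: mask the password in a MongoDB URI so it can be logged safely.
redact_uri() {
  printf '%s\n' "$1" | sed 's#\(//[^:/@]*\):[^@]*@#\1:****@#'
}

# Hypothetical helper: ping a cluster with mongosh; prints 1 when the server answers.
ping_cluster() {
  mongosh "$1" --quiet --eval 'db.runCommand({ ping: 1 }).ok'
}

# Example with a placeholder URI:
#   uri="mongodb://USER:PASSWORD@HOST1:27017,HOST2:27018"
#   echo "Checking $(redact_uri "$uri")"
#   ping_cluster "$uri" || echo "connection failed"
```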

Initialization

In the initialization process of Mongosync, we will explore two methods for setting it up and getting it running. These methods provide flexibility and cater to various preferences and use cases. Let's delve into each of these methods in more detail.

Method 1 (Command-Line Initialization)

 
mongosync \
  	--cluster0 "mongodb://mydbops:vel123@172.31.47.72:27017,172.31.47.72:27018" \
  	--cluster1 "mongodb://mydbops:vel123@172.31.38.84:27017,172.31.38.84:27018"

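In this command, cluster0 and cluster1 are labels for the two replica sets. In this walkthrough, cluster0 is the source and cluster1 is the destination; the direction itself is set later in the start request.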

Method 2 (Configuration File Initialization)

You can also set up the initialization of Mongosync using a YAML config file.

I have utilized a configuration file setup method to configure Mongosync.

The configuration file should resemble the following:

 
vi /etc/mongosync.conf

cluster0: "mongodb://192.0.2.10:27017"
cluster1: "mongodb://192.0.2.20:27017"
logPath: "/var/log/mongosync"
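
If you prefer to generate the file from a script, a minimal sketch follows. The function name is hypothetical, and the verbosity and port keys are included on the assumption that they match the option names in the Mongosync configuration reference (27182 is the port shown in the log output of this walkthrough).

```shell
# Hypothetical helper: write a Mongosync YAML config to the path given as $1.
# $2 and $3 are the connection strings for cluster0 and cluster1.
write_mongosync_conf() {
  cat > "$1" <<EOF
cluster0: "$2"
cluster1: "$3"
logPath: "/var/log/mongosync"
verbosity: "INFO"
port: 27182
EOF
}

# Example (SRC_URI and DST_URI are placeholders):
#   write_mongosync_conf /etc/mongosync.conf "$SRC_URI" "$DST_URI"
```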

See the Mongosync documentation for the other configuration file options that can be tailored to your use case.

After saving the config file, start Mongosync with the following command:

mongosync --config /etc/mongosync.conf

Once the command runs, Mongosync initializes and connects to the source and destination clusters. Initially, it will be in an IDLE state.

We can check whether Mongosync has been properly initialized by inspecting the log file. Use the tail command to view the Mongosync log:

 
 tail -10 /var/log/mongosync/mongosync.log

{"level":"info","serverID":"320af6fb","mongosyncID":"coordinator","port":27182,"time":"2023-09-10T15:55:15.123837512Z","message":"Running webserver."}

Look for confirmation messages indicating that Mongosync is running and in the IDLE state.
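
Because each log entry is a single JSON line, the messages can be pulled out with a small filter. This is a sketch with a hypothetical function name; it assumes the "message" field contains no escaped quotes, which holds for the sample line above.

```shell
# Hypothetical helper: print the "message" field of each JSON log line read from stdin.
log_messages() {
  sed -n 's/.*"message":"\([^"]*\)".*/\1/p'
}

# Example:
#   tail -10 /var/log/mongosync/mongosync.log | log_messages
```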

Starting Data Synchronization

To initiate data synchronization, you need to start Mongosync.

Use the following curl command:

 
curl localhost:27182/api/v1/start -X POST \
  	--data '{ "source": "cluster0", "destination": "cluster1" }'

If curl is not installed on the host, install it first. POST here is the HTTP method passed to curl with -X, not a separate command.

At this stage, Mongosync will begin syncing data from the source replica set to the destination cluster.

During the initial sync, Mongosync can be slower because it's busy copying documents concurrently, causing some load. However, once the initial sync is done, it speeds up and stays close to real-time changes on the source cluster.

Check the progress of Mongosync

Run the following command on the host where Mongosync is running to check its current state.

 
curl localhost:27182/api/v1/progress  -XGET
{
  _id: ObjectId("64fdf5d92bb050f2a536c71a"),
  progress: {
	state: 'RUNNING',
	canCommit: true,
	canWrite: false,
	info: 'change event application',
	lagTimeSeconds: 0,
	collectionCopy: { estimatedTotalBytes: 359, estimatedCopiedBytes: 359 },
	directionMapping: {
  	Source: 'cluster0: 172.31.47.72:27017,172.31.47.72:27018',
  	Destination: 'cluster1: 172.31.38.84:27017,172.31.38.84:27018'
	},
	mongosyncID: 'coordinator',
	coordinatorID: 'coordinator'
  }
}

Tracking the progress of your Mongosync operation is easy using the state field of the response. For more detailed progress information, see Mongosync Progress Details.
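
For scripting, the fields of interest can be reduced to a single line. The sketch below uses a hypothetical function name and assumes the endpoint returns JSON with the field names shown above (state, canCommit, lagTimeSeconds); it relies on sed rather than a JSON parser, so treat it as a convenience, not a robust client.

```shell
# Hypothetical helper: read a progress response on stdin and print "state canCommit lagTimeSeconds".
progress_summary() {
  body=$(cat)
  state=$(printf '%s\n' "$body" | sed -n 's/.*"state" *: *"\([^"]*\)".*/\1/p')
  can_commit=$(printf '%s\n' "$body" | sed -n 's/.*"canCommit" *: *\([a-z]*\).*/\1/p')
  lag=$(printf '%s\n' "$body" | sed -n 's/.*"lagTimeSeconds" *: *\([0-9]*\).*/\1/p')
  printf '%s %s %s\n' "$state" "$can_commit" "$lag"
}

# Example:
#   curl -s localhost:27182/api/v1/progress | progress_summary
```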

Pausing the Mongosync

Sometimes you may need to put a Mongosync operation on hold. To do that, issue the following command:

 
curl localhost:27182/api/v1/pause -XPOST --data '{ }'

{"success":true}

If the pause request is successful, Mongosync enters the PAUSED state.

Planning for Extended Pauses: If you anticipate pausing synchronization for an extended period, it's a good idea to increase the size of the replica set oplog in the source cluster. This helps ensure that you won't run into oplog space issues during prolonged pauses.
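
A minimal sketch of that resize follows. The helper name and the 16000 MB figure are illustrative assumptions; replSetResizeOplog takes its size in megabytes and is run against each replica set member in turn, so check the command's documentation for the procedure that applies to your version.

```shell
# Hypothetical helper: print the mongosh command that resizes the oplog to $1 megabytes.
oplog_resize_cmd() {
  printf 'db.adminCommand({ replSetResizeOplog: 1, size: Double(%s) })\n' "$1"
}

# Example (MEMBER_URI is a placeholder for a direct connection to one member):
#   mongosh "$MEMBER_URI" --quiet --eval 'rs.printReplicationInfo()'      # current oplog size and window
#   mongosh "$MEMBER_URI" --quiet --eval "$(oplog_resize_cmd 16000)"      # resize to roughly 16 GB
```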

Resuming Mongosync

When you're ready to restart a paused Mongosync session and continue the synchronization process, resume it as follows.

Resume Synchronization: To resume a paused synchronization session using data stored on the destination cluster, issue the following command:

 
curl localhost:27182/api/v1/resume -XPOST --data '{ }'

{"success":true}

If the resume request is successful, Mongosync transitions into the RUNNING state, ensuring your data synchronization is back on track.

Ensuring a Smooth Synchronization Commit

As the synchronization from one cluster to the other nears its end, the final step is to commit the Mongosync operation. Committing finalizes the sync: Mongosync applies the remaining changes to the destination cluster and then stops synchronizing. This step is required to complete the process successfully.

Check Synchronization Status: Use the progress endpoint to verify that the synchronization is ready to commit. Send the following request:

 
curl localhost:27182/api/v1/progress  -XGET


{
  _id: ObjectId("64fdf5d92bb050f2a536c71a"),
  progress: {
	state: 'RUNNING',
	canCommit: true,
	canWrite: false,
	info: 'change event application',
	lagTimeSeconds: 0,
	collectionCopy: { estimatedTotalBytes: 359, estimatedCopiedBytes: 359 },
	directionMapping: {
  	Source: 'cluster0: 172.31.47.72:27017,172.31.47.72:27018',
  	Destination: 'cluster1: 172.31.38.84:27017,172.31.38.84:27018'
	},
	mongosyncID: 'coordinator',
	coordinatorID: 'coordinator'
  }
}

Review the Response: The response will provide important details about the synchronization progress, including whether it's ready for a commit. Look for canCommit: true in the response, which indicates that the commit request can proceed successfully. Once you've confirmed that everything is set, you can proceed with sending the commit request.
Send the Commit Request: Use the following command to send a request to the commit endpoint:

 
curl localhost:27182/api/v1/commit -XPOST --data '{ }'

{"success":true}

Review the Response: If the commit request is successful, Mongosync transitions into the COMMITTING state and then automatically moves to the COMMITTED state.
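
The check-then-commit sequence above can be sketched as a small guard. The function name and the lag threshold are assumptions, and the parsing expects JSON with the canCommit and lagTimeSeconds fields shown in the progress output.

```shell
# Hypothetical helper: read a progress response on stdin; succeed only if it looks safe to commit.
# $1 is the maximum acceptable lagTimeSeconds (defaults to 0).
ready_to_commit() {
  max_lag=${1:-0}
  body=$(cat)
  # canCommit must be true.
  printf '%s\n' "$body" | grep -q '"canCommit" *: *true' || return 1
  # lagTimeSeconds must be present and within the threshold.
  lag=$(printf '%s\n' "$body" | sed -n 's/.*"lagTimeSeconds" *: *\([0-9][0-9]*\).*/\1/p')
  [ -n "$lag" ] && [ "$lag" -le "$max_lag" ]
}

# Example:
#   if curl -s localhost:27182/api/v1/progress | ready_to_commit 5; then
#     curl localhost:27182/api/v1/commit -XPOST --data '{ }'
#   fi
```

Stopping application writes on the source before committing is advisable, so that no changes arrive after the final state has been applied to the destination.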

In this blog post, we've walked through the process of configuring cluster synchronization between two distinct replica sets using MongoDB 7.0's Cluster-to-Cluster Sync feature. This powerful feature opens up possibilities for disaster recovery, global data distribution, and data migration.

Our journey doesn't end here! Stay tuned for future blog posts where we'll dive even deeper. We'll cover topics like reverse sync, how to make the most of Mongosync with sharding, and discover the full potential of MongoDB 7.0 for your production needs.

Exciting things are on the horizon, and we can't wait to share them with you.
