Mastering Shard Removal in MongoDB: A Step-by-Step Guide

Mydbops
Oct 11, 2023
10
Mins to Read
All

Sharding is a fundamental technique in database management that empowers MongoDB users to achieve horizontal scaling, providing the means to handle massive datasets and high traffic loads. It's a powerful tool that enhances performance, scalability, and availability. Yet, there are instances when the art of removing a shard becomes a necessity.

In this comprehensive guide, we will delve into the intricate process of Shard Removal in MongoDB. We will explore the why, when, and how of this crucial operation, providing you with the knowledge and skills needed to navigate the removal of shards from your MongoDB cluster confidently. Let's embark on this journey to ensure the seamless transition of your database while safeguarding its stability and integrity.

Why does Shard need to be removed?

Sharding is a fundamental technique in database management that enables horizontal scaling, providing several notable advantages. At its core, sharding involves partitioning a large database into smaller, more manageable pieces called shards. Each shard is essentially a separate database that contains a subset of the overall data. This approach enhances the system's ability to handle growing datasets and high traffic loads.

However, there are situations where removing a shard becomes necessary. Below are the primary reasons for removing shards from a sharded cluster:

  • Reduced costs: While sharding brings advantages, it can also result in increased hardware and maintenance costs. Removing shards can help reduce these expenses, making it a cost-effective solution.
  • Improved performance: Sometimes, certain shards in a sharded MongoDB cluster become overloaded, adversely affecting performance. Removing these overloaded shards can lead to notable performance improvements for the remaining shards.
  • Increased availability: Removing shards can help to improve the availability of your cluster by reducing the number of points of failure.
  • Uneven Chunk Distribution: This occurs when chunks of data are distributed unevenly among the shards, potentially causing performance imbalances. Removing a shard and redistributing its data can rectify this issue, resulting in a more evenly distributed workload across the cluster.
  • Data Corruption: In instances of data corruption within a shard, shard removal is a viable method for rebalancing and recovering from the corruption, ensuring data integrity and system stability.

Pre-Removal Assessment

Consideration

Before embarking on the shard removal process, it is essential to carefully consider several key factors to ensure a smooth and secure transition. These considerations are crucial to minimizing downtime, maintaining data redundancy, and safeguarding the overall integrity of your MongoDB cluster:

  • Schedule maintenance: Plan the shard removal activity during non-production and it requires downtime based on the data size.
  • Balancer State: Ensure that the balancer is active and operational. You can check its state using the following commands:
 
sh.getBalancerState() 

sh.startBalancer() 

sh.getBalancerState() 
	
  • Backup:
    • For smaller datasets, consider taking a standard data backup.
    • For larger datasets, it is recommended to take a disk snapshot from one member of each shard and a configuration member.
    • Before proceeding with shard removal, it's advisable to remove one secondary node from the shard's replica set. This step helps ensure data redundancy and availability during the shard removal process.
      • If any unexpected issues arise during the shard removal activity, having a removed member as the primary allows for quick reversion and the recreation of the shard. This approach minimizes downtime and facilitates efficient recovery.
  • Monitor: Throughout the shard removal process, it is essential to continuously monitor the cluster's performance and availability. This proactive approach helps detect and address any issues promptly, ensuring a smoother and safer transition.

Validate Chunks Status

Before initiating the shard removal process, it is of paramount importance to verify that all data chunks are evenly and appropriately distributed across the cluster. This validation step is critical to ensuring the seamless execution of the shard removal process. If any issues arise during the migration of data chunks, they should be promptly identified, addressed, and resolved before progressing with the shard removal.

This validation step ensures that the data distribution is balanced and that there are no outstanding problems that might disrupt the shard removal. Addressing migration issues beforehand helps ensure a smoother and more successful shard removal activity.

Note: Make sure that the data on the removing shard is not needed for any active queries or applications.

By considering these factors, you can prepare for a smoother and more controlled shard removal process.

Shard Removal Process

To achieve the successful removal of a shard from a MongoDB cluster, it is absolutely crucial to ensure the smooth migration of the shard's data to the remaining shards within the cluster. This procedure offers a detailed, step-by-step roadmap for safely transferring data and executing the shard removal process.

Note: It should not be employed for migrating an entire cluster to new hardware. When migrating entire clusters, treat individual shards as independent replica sets and follow the appropriate migration procedures.

Determine the shard name to remove

To determine the name of the shard, connect to a mongos instance with the mongo shell and either:

 
mongos> db.adminCommand( { listShards: 1 } )
{
	"shards" : [
		{
			"_id" : "shardA",
			"host" : "shardA/172.31.44.90:27018",
			"state" : 1
		},
		{
			"_id" : "shardB",
			"host" : "shardB/172.31.42.216:27018",
			"state" : 1
		}
	],
	"ok" : 1,
	"operationTime" : Timestamp(1694276520, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694276520, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Data re-balancing

  • To initiate the removal of data chunks from the shard that you intend to remove, you should execute the removeShard command. This command initiates the process of draining the data from the shard, preparing it for removal.
 
db.adminCommand( { removeShard: "shardA" } )
	

Here is the output

 
mongos> db.adminCommand( { removeShard: 'shardA'})
{
	"msg" : "draining started successfully",
	"state" : "started",
	"shard" : "shardB",
	"note" : "you need to drop or movePrimary these databases",
	"dbsToMove" : [
		"Information"
	],
	"ok" : 1,
	"operationTime" : Timestamp(1694276582, 2),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694276582, 2),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

The balancer begins migrating chunks from the shardA to other shardB in the cluster. These migrations happen slowly to avoid placing undue load on the overall cluster. Depending on the network capacity and the amount of data, this operation can take from a few minutes to several days to complete.

Check the status of the chunk migration

When monitoring the status of chunk migration, the output will provide essential details.

 
mongos> db.adminCommand( { removeShard: 'shardB'})
{
	"msg" : "draining ongoing",
	"state" : "ongoing",
	"remaining" : {
		"chunks" : NumberLong(515),
		"dbs" : NumberLong(1),
		"jumboChunks" : NumberLong(0)
	},
	"note" : "you need to drop or movePrimary these databases",
	"dbsToMove" : [
		"Information"
	],
	"ok" : 1,
	"operationTime" : Timestamp(1694276590, 5),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694276590, 5),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	
  • Remaining Chunks: This indicates the number of chunks that MongoDB still needs to migrate to other shards.
  • Primary Databases: The number of MongoDB databases that currently have a primary status on the shard being removed.
    • The dbsToMove array specifies the list of databases for which you should execute the movePrimary command. This is particularly relevant for unshared data on the primary shard.

In sh.status() command also the draining information is projected.

 
sh.status()
  shards:
        {  "_id" : "shardA",  "host" : "shardA/172.31.44.90:27018",  "state" : 1,  "draining" : true }
        {  "_id" : "shardB",  "host" : "shardB/172.31.42.216:27018",  "state" : 1 }
	

Continuously monitor the status of the removeShard command until the count of remaining chunks reaches zero. Only when this condition is met should you proceed to the next step in the shard removal process. This diligent monitoring ensures that the shard removal progresses smoothly and that all data is appropriately transferred before advancing to the finalization step.

Dealing with Jumbo Chunks

Note: For testing purposes, we have intentionally inserted a 5 lac document with the same shard key value to create a Jumbo chunk.

Once all data chunks, except for the jumbo chunks, have been successfully migrated, you may encounter the following error message in the primary log file of the config server.

 
2023-09-12T10:14:25.524+0000 I  SHARDING [Balancer] Performing a split because migration Information.movies: [{ released: -223573437414101916 }, { released: 0 }), from shardA, to shardB failed for size reasons :: caused by :: ChunkTooBig: Cannot move chunk: the maximum number of documents for a chunk is 119410, the maximum chunk size is 67108864, average document size is 1124. Found 535838 documents in chunk  ns: Information.movies { released: -223573437414101916 } -> { released: 0 }
	

For instance, if a jumbo chunk exists on the shard, the balancer is unable to transfer that chunk to another shard automatically. In such cases, the moveChunk command needs to be manually aborted.

 
mongos> db.adminCommand( { moveChunk : "Information.movies",
...  bounds : [{ "released" : NumberLong("-223573437414101916") }, { "released" : NumberLong("0") }] ,
...  to : 'shardB',
...  forceJumbo: true,
... })
{
	"ok" : 0,
	"errmsg" : "Cannot move chunk: the maximum number of documents for a chunk is 119410, the maximum chunk size is 67108864, average document size is 1124. Found 535838 documents in chunk  ns: Information.movies { released: -223573437414101916 } -> { released: 0 }",
	"code" : 153,
	"codeName" : "ChunkTooBig",
	"operationTime" : Timestamp(1694513849, 4),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694513849, 4),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Increase the chunk size

The default chunk size is 64 MB. But the above Jumbo chunk has approximately 564 MB. So temporarily, we increased the chunk size to 600 MB.

 
mongos> db.getSiblingDB("config").settings.updateOne({ _id: "chunksize" },{ $set: { _id: "chunksize", "value" : 600 }},{ upsert: true } )

{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
	

Migrate the chunk

After resizing the chunk size, we need to manually trigger the moveChunk command as follows,

 
mongos> db.adminCommand( { moveChunk : "Information.movies",
...   bounds : [{ "released" : NumberLong("-223573437414101916") }, { "released" : NumberLong("0") }] ,
...    to : 'shardB',
...    forceJumbo: true,
})
{
	"millis" : 94428,
	"ok" : 1,
	"operationTime" : Timestamp(1694514008, 4086),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694514008, 4086),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Decrease the chunk size to default

After the chunk is migrated the chunk size is reverted back to 64 MB.

 
mongos> db.getSiblingDB("config").settings.updateOne({ _id: "chunksize" },{ $set: { _id: "chunksize", "value" : 64 }},{ upsert: true } )

{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
	

Handling Unshared Data

If the shard is the primary shard for one or more databases in the cluster, then the shard will have unsharded data. If the shard is not the primary shard for any databases, skip to the next task, Finalize the Migration.

Understanding the Primary Shard

In a cluster, when you have a database with unsharded collections, that database stores its collections on a single shard, which is referred to as the primary shard for that database. This primary shard is responsible for housing all unsharded data within the cluster.

Identifying the Primary Shard

To determine whether the shard you are planning to remove serves as the primary shard for any of the cluster's databases, you can employ either of the following methods:

By utilizing these commands, you can access a document that contains a databases field. This field provides a comprehensive list of databases within the cluster, along with their respective primary shards. This information is instrumental in identifying the primary shard for each database, ensuring a clear understanding of the data distribution within your MongoDB cluster.

 
sh.status()

  databases:
        {  "_id" : "Information",  "primary" : "shardA",  "partitioned" : true,  "version" : {  "uuid" : UUID("e4999523-772d-4d21-af01-2c4f38efbdc2"),  "lastMod" : 1 } }
	

Note: After confirming that all the data chunks have been successfully migrated, it's important to proceed with the following step.

 
mongos> db.adminCommand( { removeShard: 'shardA'})
{
	"msg" : "draining ongoing",
	"state" : "ongoing",
	"remaining" : {
		"chunks" : NumberLong(0),
		"dbs" : NumberLong(1),
		"jumboChunks" : NumberLong(0)
	},
	"note" : "you need to drop or movePrimary these databases",
	"dbsToMove" : [
		"Information"
	],
	"ok" : 1,
	"operationTime" : Timestamp(1694279259, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694279259, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Migrate unsharded data

To move a database to another shard, use the movePrimary command.

 
db.adminCommand( { movePrimary: "", to: "destination Shard Name" })
	

Output:

 
mongos> db.adminCommand( { movePrimary: "Information", to: "shardB" })
{
	"ok" : 1,
	"operationTime" : Timestamp(1694279336, 1577),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694279336, 1577),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Finalize the Migration

To clean up all metadata information and finalize the removal, run removeShard again.

 
mongos> db.adminCommand( { removeShard: 'shardA'})
{
	"msg" : "removeshard completed successfully",
	"state" : "completed",
	"shard" : "shardB",
	"ok" : 1,
	"operationTime" : Timestamp(1694279387, 2),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694279387, 2),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Once the value of the stage field is completed, you may safely stop the processes comprising the shardA.

Validate

 
mongos> db.adminCommand( { listShards: 1 } )
{
	"shards" : [
		{
			"_id" : "shardB",
			"host" : "shardB/172.31.42.216:27018",
			"state" : 1
		}
	],
	"ok" : 1,
	"operationTime" : Timestamp(1694514147, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1694514147, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
	

Additional Considerations for Shard Removal

When removing shards from a MongoDB sharded cluster, there are some important additional points to keep in mind:

  • Empty Shard Requirement: Shards can only be removed if they are empty. If a shard contains any data, you must first migrate that data to another shard in the cluster.
  • Minimum One Shard Requirement: It's essential to retain at least one shard in the cluster. You cannot remove all of the shards; at least one must remain to ensure the cluster's functionality.
  • Temporary Performance Impact: The process of removing a shard can lead to temporary performance degradation while the activity is ongoing. Be prepared for potential slowdowns during the removal.
  • Balancer Process Time: The balancer process, responsible for redistributing data chunks on the remaining shards, may take some time to complete its task. This process ensures an even distribution of data.
  • Log Details: For troubleshooting and monitoring purposes during migration, all the relevant details are logged in the primary log file of the config server.
  • Shard Reintroduction: If you plan to reintroduce a previously removed shard to the sharded cluster, it's crucial to drop any associated databases, even if they are empty. This step is essential to ensure a clean reintegration of the shard and prevent potential conflicts or data integrity issues.

We trust that this blog has provided you with comprehensive insights into the shard removal process, including detailed action times.

Stay connected for more valuable MongoDB insights and best practices!

If you have any questions or need further assistance, please feel free to share them in the comments section below.

Also read: Troubleshooting MongoDB Shard Upgrades: Resolving Index Discrepancies

No items found.

About the Author

Mydbops

Subscribe Now!

Subscribe here to get exclusive updates on upcoming webinars, meetups, and to receive instant updates on new database technologies.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.