In any database system, managing storage costs efficiently is crucial, especially for growing data-intensive applications. MongoDB, being a flexible and scalable NoSQL database, offers several features that can help you reduce storage costs while maintaining performance. In this blog, we will explore three key strategies for minimizing storage costs in MongoDB: compression, indexing, and TTL (Time-to-Live) indexes.
Compression: Optimize Data Storage
The Advantages of Compression in MongoDB
Compression reduces the size of your data on disk, which not only lowers storage costs but can also improve query performance.
MongoDB offers several compression methods for both block storage (data written to disk) and network transmission. Choosing the right compression option depends on your specific use case, whether your priority is high throughput or maximum storage savings.
Types of compression methods
Snappy Compression
- Introduced in: MongoDB 3.0 (with the WiredTiger storage engine)
- Description: Snappy is MongoDB's default compression algorithm. It is optimized for speed, offering a balance between compression efficiency and low CPU overhead, making it suitable for real-time applications that demand high throughput.
- Use Case: Ideal for applications where performance is critical, and storage savings are not the primary concern.
Zlib Compression
- Introduced in: MongoDB 3.0 (with the WiredTiger storage engine)
- Description: Zlib offers higher compression ratios than Snappy but at the cost of more CPU resources. It's akin to gzip compression and is best suited for scenarios where storage savings are more important than read/write performance.
- Use Case: Use Zlib for applications that prioritize strong compression ratios over speed, making it suitable for archival storage (not frequently accessed but needs to be retained for long periods) and scenarios where bandwidth is limited. It offers good compression efficiency but may introduce higher CPU overhead compared to Snappy and Zstd.
Zstd Compression
- Introduced in: MongoDB 4.2
- Description: Zstd provides a balance between high compression ratios and moderate CPU usage. It’s highly customizable, allowing you to tune compression levels (from 1 to 22) based on your needs.
- Use Case: Best for applications where saving storage space is important, but with manageable CPU overhead.
Measuring Compression Effectiveness
The most effective way to measure the impact of compression is to load the same dataset under each compression method and compare the resulting storage statistics. The db.stats() command reports both data size and index size, allowing a direct comparison.
This script will print the total compressed size (including index size) and the uncompressed size for each database and as an overall total for all databases.
var totalCompressedSize = 0;
var totalUncompressedSize = 0;

db.adminCommand({ listDatabases: 1 }).databases.forEach(function(database) {
    var dbName = database.name;
    var dbStats = db.getSiblingDB(dbName).stats(1024 * 1024 * 1024); // Scale sizes to GB

    // Compressed size on disk (storageSize + indexSize)
    var compressedSize = dbStats.storageSize + dbStats.indexSize;
    totalCompressedSize += compressedSize;

    // Uncompressed (logical) size (dataSize only)
    var uncompressedSize = dbStats.dataSize;
    totalUncompressedSize += uncompressedSize;

    print("Database: " + dbName + ", Compressed Size: " + compressedSize.toFixed(2) +
          " GB, Uncompressed Size: " + uncompressedSize.toFixed(2) + " GB");
});

print("\nTotal Compressed Size of all DBs: " + totalCompressedSize.toFixed(2) + " GB");
print("Total Uncompressed Size of all DBs: " + totalUncompressedSize.toFixed(2) + " GB");
Sample Output:
Database: admin, Compressed Size: 0.00 GB, Uncompressed Size: 0.01 GB
Database: config, Compressed Size: 0.00 GB, Uncompressed Size: 0.00 GB
Database: local, Compressed Size: 4.72 GB, Uncompressed Size: 9.52 GB
Database: shop, Compressed Size: 11.69 GB, Uncompressed Size: 33.85 GB
Database: users, Compressed Size: 105.72 GB, Uncompressed Size: 275.05 GB
Total Compressed Size of all DBs: 122.12 GB
Total Uncompressed Size of all DBs: 318.42 GB
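From the totals above you can derive an effective compression ratio and the percentage of disk space saved. A minimal sketch in plain JavaScript, using the illustrative totals from the sample output (the helper name is ours, not a MongoDB command):

```javascript
// Summarize compression savings from the totals printed by the script above.
// compressionSummary is a hypothetical helper, not a MongoDB API.
function compressionSummary(uncompressedGB, compressedGB) {
  const ratio = uncompressedGB / compressedGB;                // e.g. ~2.6x
  const savedPct = (1 - compressedGB / uncompressedGB) * 100; // % of disk saved
  return { ratio: ratio.toFixed(2), savedPct: savedPct.toFixed(1) };
}

// Using the sample totals above (318.42 GB uncompressed, 122.12 GB compressed):
compressionSummary(318.42, 122.12); // { ratio: "2.61", savedPct: "61.6" }
```

A ratio far below what your compressor typically achieves on text-heavy data can hint that documents are dominated by already-compressed content such as images or encrypted blobs.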
How to Enable Compression
MongoDB allows you to set the compression type when creating a collection or through the configuration file for database-wide compression.
Enable Compression for a Specific Collection:
You can specify the compression type when creating a collection:
db.createCollection('myCollection', {
    storageEngine: {
        wiredTiger: {
            configString: 'block_compressor=zstd'
        }
    }
});
Note: Collection-level compression options were introduced with the WiredTiger storage engine, which shipped in MongoDB 3.0. By default, WiredTiger uses Snappy block compression for collection data, with zlib and zstd also available.
For more details, refer to MongoDB Documentation
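To verify which compressor an existing collection actually uses, you can inspect the wiredTiger.creationString field returned by db.collection.stats() in mongosh. A sketch of pulling out the block_compressor value (the sample string is illustrative, and the helper is ours):

```javascript
// Extract the block_compressor setting from a WiredTiger creationString,
// e.g. from db.myCollection.stats().wiredTiger.creationString in mongosh.
function blockCompressorOf(creationString) {
  const m = creationString.match(/block_compressor=([a-z]*)/);
  if (!m) return null;      // setting not present in the string
  return m[1] || "none";    // an empty value means block compression is disabled
}

// Illustrative fragment of a creationString:
const sample = "allocation_size=4KB,block_compressor=zstd,checksum=on";
blockCompressorOf(sample); // "zstd"
```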
Enable Compression in the MongoDB Config File (mongod.conf):
You can configure a default compressor for all collections on an instance using the mongod.conf file. Here’s how to enable Zstd compression:
storage:
  wiredTiger:
    collectionConfig:
      blockCompressor: zstd
Once configured, MongoDB applies the chosen compression method to all collections created after the change; existing collections retain the compressor they were created with.
Benefits of Compression
- Lower Storage Costs: Compressed data uses less disk space, which translates into lower cloud storage or hardware costs.
- Improved Query Performance: Smaller datasets mean less data to read from disk, improving I/O efficiency and reducing query times.
Choosing the Right Compression: Guidelines for Your Application Needs
Zlib:
- High CPU Usage: Zlib provides the best compression, but it’s CPU-intensive, which can slow down both write and read operations.
- Good for Archival Data: Suitable for data that doesn't need frequent access due to its high CPU requirements.
Snappy:
- Lower Compression Ratio: Snappy is faster and uses less CPU but offers lower storage savings compared to Zlib and Zstd.
- Best for Speed: Ideal for applications where speed matters more than saving storage space.
Zstandard (Zstd):
- Moderate CPU Usage: Zstd balances compression ratio and speed but uses more CPU than Snappy, especially at higher compression levels.
- Customizable: It can be configured for either better compression or faster performance, but higher compression can increase CPU load.
Efficient Indexing: Balancing Performance and Cost
Indexes play a critical role in improving query performance, but they also consume significant storage and memory. Understanding which indexes to create and which to avoid is essential for reducing storage costs.
Avoid Over-Indexing
While indexes significantly speed up queries, over-indexing (creating more indexes than necessary) leads to excessive storage consumption. MongoDB limits each collection to 64 indexes, so it's crucial to manage carefully which indexes are created, both to stay under that limit and to avoid unnecessary storage costs.
Use Compound Indexes
To avoid creating multiple single-field indexes, consider using compound indexes. Compound indexes can handle queries involving multiple fields, reducing the number of individual indexes required.
For example:
db.collection.createIndex({ field1: 1, field2: 1 });
This index serves queries involving both field1 and field2, optimizing performance while reducing the overhead of maintaining separate indexes. When designing compound indexes, applying the ESR rule (Equality, Sort, Range) can help ensure optimal performance.
Tip: Before MongoDB 4.2, it was important to pass the background: true option when creating an index so the build would run in the background rather than blocking reads and writes on the collection. Starting with MongoDB 4.2, all index builds use an optimized process that holds an exclusive lock only briefly at the beginning and end of the build, and the background option is ignored.
For more details, refer to MongoDB Documentation
Partial and Sparse Indexes
To further minimize index size and reduce storage costs, consider using partial or sparse indexes:
Partial Indexes: These indexes only include documents that meet a specified filter condition, reducing the size of the index.
Sparse Indexes: These indexes only include documents where the indexed field exists, excluding documents with missing fields.
For example, to create a sparse index:
db.collection.createIndex({ field1: 1 }, { sparse: true });
This strategy helps minimize storage usage by ensuring only relevant documents are indexed.
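The sparse example above has a partial counterpart. For instance, a hypothetical users collection might index lastLogin only for active accounts via db.users.createIndex({ lastLogin: 1 }, { partialFilterExpression: { status: "active" } }) (collection and field names are illustrative). Which documents such an index would cover can be sketched as:

```javascript
// Mirrors partialFilterExpression: { status: "active" } — only documents
// matching the filter are added to the index, keeping it small.
function coveredByPartialIndex(doc) {
  return doc.status === "active";
}

coveredByPartialIndex({ _id: 1, status: "active", lastLogin: new Date() }); // true
coveredByPartialIndex({ _id: 2, status: "inactive" });                      // false
```

Note that the query planner only uses a partial index when the query's filter guarantees a subset of the partialFilterExpression (here, queries that include status: "active").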
Monitor and Optimize Indexes
MongoDB provides tools to monitor index usage and identify unused or underutilized indexes. By periodically reviewing index performance, you can decide whether to keep or drop an index.
Use the following command to check index usage statistics and identify unused indexes:
db.<collection_name>.aggregate([{ $indexStats: {} }]).pretty();
If an index has not been used for a long period, it may be prudent to drop it to free up storage space and optimize memory usage. You can remove an index using:
db.collection.dropIndex({ field: 1 });
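Putting the two commands above together, a sketch that flags drop candidates from $indexStats-style documents — the sample documents below are illustrative; in practice you would feed in the real aggregation results:

```javascript
// Flag candidate indexes to drop: zero accesses since the stats were last
// reset (e.g. since the last mongod restart). Never drop the mandatory _id_
// index. Note: in real $indexStats output, accesses.ops is a NumberLong.
function unusedIndexes(indexStats) {
  return indexStats
    .filter(s => s.name !== "_id_" && s.accesses.ops === 0)
    .map(s => s.name);
}

// Illustrative documents shaped like db.orders.aggregate([{ $indexStats: {} }]) output:
const stats = [
  { name: "_id_",        accesses: { ops: 524301 } },
  { name: "status_1",    accesses: { ops: 0 } },
  { name: "user_1_ts_1", accesses: { ops: 8812 } },
];
unusedIndexes(stats); // [ "status_1" ]
```

Before acting on the result, remember that the counters reset on restart, so check over a period that covers your full workload (including periodic reports).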
Duplicate Indexes
Duplicate indexes in MongoDB occur when multiple indexes are created on the same fields, often with identical or similar configurations. These can arise unintentionally during schema changes, migrations, or when different team members independently create indexes. Duplicate indexes are generally undesirable because they increase storage overhead and can degrade performance due to the extra maintenance required during write operations.
Dropping Duplicate Indexes
Duplicate indexes, such as a single-field index and a compound index that includes the same field, can often be removed to save storage. For example:
- Index 1: { name: 1 }
- Index 2: { name: 1, age: 1 }
If the compound index { name: 1, age: 1 } is sufficient for your queries, you can drop the single-field index { name: 1 } to reduce storage overhead and optimize memory usage.
However, before dropping a duplicate index, consider the following:
- Unique Index: Ensure the index isn’t enforcing uniqueness on the field, which could impact data integrity.
- Collation: Check whether the index has a specific collation, as dropping it could change the results of queries that rely on that collation.
- TTL Index: If the index is used for TTL (Time-To-Live) expiration, removing it may impact the automatic removal of old documents.
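The "duplicate by prefix" situation above can be checked mechanically: a single-field index is redundant when its key pattern is a prefix of a compound index's key pattern and it carries none of the special options listed above. A sketch (the helper name is ours):

```javascript
// Is shortKeys a strict prefix of longKeys? Key order and sort direction
// both matter, so { name: 1 } is a prefix of { name: 1, age: 1 },
// but { age: 1 } and { name: -1 } are not.
function isPrefixOf(shortKeys, longKeys) {
  const a = Object.entries(shortKeys);
  const b = Object.entries(longKeys);
  return a.length < b.length &&
    a.every(([field, dir], i) => b[i][0] === field && b[i][1] === dir);
}

isPrefixOf({ name: 1 }, { name: 1, age: 1 }); // true  -> drop candidate
isPrefixOf({ age: 1 },  { name: 1, age: 1 }); // false -> keep
```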
Efficient indexing in MongoDB requires a careful balance between performance and storage costs. By avoiding over-indexing, using compound indexes, and regularly reviewing and dropping unused or duplicate indexes, you can optimize both query performance and storage usage in your database environment.
We have detailed blogs on indexing, including Blog 1, Blog 2, and Blog 3. Take a moment to explore these resources for valuable insights.
TTL (Time-To-Live) Indexes: Automated Data Expiry
Time-to-Live (TTL) indexes allow MongoDB to automatically remove documents from a collection after a set period. This is particularly useful for managing storage costs by automatically clearing out outdated or obsolete data, eliminating the need for manual deletion. A TTL index is applied to a field containing a date or timestamp, and MongoDB will automatically delete any document where the indexed field exceeds the specified expiration duration.
Benefits of TTL Indexes
- Automated Data Management: TTL indexes reduce the need for manual data deletion scripts, simplifying data lifecycle management.
- Cost Savings: By automatically removing outdated data, TTL indexes help control storage costs, especially for collections where data naturally expires (e.g., session logs, temporary data).
We've covered the topic in depth in our blog, The Ultimate Guide to MongoDB TTL Indexes. Check it out for a comprehensive understanding of TTL indexes and how to effectively use them in your applications.
Final Thoughts
Reducing MongoDB storage costs involves a combination of thoughtful compression settings, efficient indexing strategies and automated data lifecycle management with TTL indexes. By leveraging these features, you can optimize both performance and storage, ensuring that your MongoDB deployment remains cost-effective while meeting your application’s demands.
- Utilize Snappy or Zstd compression to decrease disk usage and enhance performance, choosing the appropriate compression type based on your specific use cases.
- Avoid over-indexing by creating compound indexes. Implement partial or sparse indexing strategies, and regularly monitor and drop unused or duplicate indexes to save storage and memory.
- Utilize TTL indexes to automatically expire and delete outdated data.
By following these strategies, you can effectively manage MongoDB storage costs while maintaining a high-performance database.
Improve your MongoDB performance and reduce costs with Mydbops' expertise in MongoDB storage optimization, compression techniques, and index tuning. Let us help you get the most out of your MongoDB database.