Enhancing Data Security: A Deep Dive into MongoDB's Queryable Encryption

Mydbops
Oct 24, 2023

Queryable Encryption is one of MongoDB's newest security features. It lets you encrypt sensitive fields on the client side, so they remain confidential throughout their lifecycle. The server stores these fields as randomized ciphertext and never sees the plaintext, yet it can still run queries against them.

This approach keeps your sensitive information encrypted in transit, at rest, in use, in backups, and even in logs. Only you, holding the encryption keys, can decrypt and access the data. In this article, we'll look at how MongoDB's Queryable Encryption works, how to set it up, and what it costs in performance.

Setting up Queryable Encryption

Setting up Queryable Encryption involves two encryption approaches, automatic and explicit, both of which rely on envelope encryption to protect the keys:

Automatic Encryption

This streamlined method eliminates the need for manual code creation. Encrypted read and write operations occur seamlessly, without requiring explicit encryption code. This automation ensures data security without additional effort.

Note that Automatic Encryption is available only in MongoDB Enterprise Edition and MongoDB Atlas; MongoDB Community Edition does not support this feature.

Explicit Encryption

For those who prefer a hands-on approach, explicit encryption offers customization. This approach involves utilizing MongoDB's encryption library through your driver to define encryption logic.

Envelope Encryption

First, your data is protected with a Data Encryption Key (DEK). Then, this DEK is further secured by encrypting it with a Customer Master Key (CMK). The CMK, your ultimate key protector, is created using tools like a cloud Key Management Service (KMS). MongoDB keeps the encrypted DEKs safe in the Key Vault collection. Deleting a DEK makes the associated data unreadable, and deleting a CMK makes data tied to its DEKs permanently inaccessible. Envelope Encryption is like a double lock for your data, ensuring its safety and confidentiality.
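The key hierarchy described above can be sketched in a few lines. This is a toy illustration only: the cipher here is a stand-in (an HMAC-SHA256 keystream XORed with the data), not the algorithm MongoDB uses and not safe for real data. The helper names toy_encrypt and toy_decrypt are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream. Toy cipher, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        stream.extend(block.digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random nonce and prepend the nonce to the output."""
    nonce = os.urandom(16)
    return nonce + _keystream_xor(key, nonce, plaintext)


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and reverse the XOR."""
    nonce, body = blob[:16], blob[16:]
    return _keystream_xor(key, nonce, body)


cmk = os.urandom(32)  # Customer Master Key, normally held by a KMS
dek = os.urandom(32)  # Data Encryption Key that protects the field values

wrapped_dek = toy_encrypt(cmk, dek)            # what the Key Vault would store
ciphertext = toy_encrypt(dek, b"987-65-4320")  # what the data collection would store

# Reading: unwrap the DEK with the CMK, then decrypt the field with the DEK.
recovered_dek = toy_decrypt(cmk, wrapped_dek)
recovered = toy_decrypt(recovered_dek, ciphertext)
print(recovered)
```

Without the CMK the wrapped DEK cannot be recovered, and without the DEK the field cannot be read, which is why deleting either key makes the data inaccessible.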


Exploring the Automatic Encryption Shared Library for Queryable Encryption

The Automatic Encryption Shared Library is a dynamic library that your driver loads to manage automatic Queryable Encryption. It works out which fields need to be encrypted or decrypted and ensures that your application handles encrypted data correctly.

To get it, simply head over to the MongoDB Download Center. Select your version and platform, and then download the library.

Installation

Packages

 
pip3 install 'pymongo[encryption]'
	

Code Example

 
from pymongo.mongo_client import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts
from pymongo.encryption import ClientEncryption
from bson.codec_options import CodecOptions
from bson.binary import STANDARD

# KMS provider name should be one of: "aws", "gcp", "azure", "kmip" or "local"
kms_provider_name = "local"
uri = "<CONNECTION_STRING>"  # Replace with your connection URI
key_vault_database_name = "encryption"
key_vault_collection_name = "__keyVault"
key_vault_namespace = f"{key_vault_database_name}.{key_vault_collection_name}"
encrypted_database_name = "medicalRecords"
encrypted_collection_name = "patients"

# Load the local Customer Master Key from disk
path = "./customer-master-key.txt"
with open(path, "rb") as f:
    local_master_key = f.read()
kms_provider_credentials = {
    "local": {
        "key": local_master_key
    },
}

# Enable automatic encryption through the shared library
auto_encryption_options = AutoEncryptionOpts(
    kms_provider_credentials,
    key_vault_namespace,
    crypt_shared_lib_path="PATH/mongo_crypt_v1.so"
)
encrypted_client = MongoClient(
    uri, auto_encryption_opts=auto_encryption_options
)

# Fields to encrypt; only ssn is queryable (equality)
encrypted_fields_map = {
    "fields": [
        {
            "path": "patientRecord.ssn",
            "bsonType": "string",
            "queries": [{"queryType": "equality"}]
        },
        {
            "path": "patientRecord.billing",
            "bsonType": "object",
        }
    ]
}

# Helper that creates the Data Encryption Keys in the Key Vault
client_encryption = ClientEncryption(
    kms_providers=kms_provider_credentials,
    key_vault_namespace=key_vault_namespace,
    key_vault_client=encrypted_client,
    codec_options=CodecOptions(uuid_representation=STANDARD)
)

# The local provider needs no additional master key credentials
customer_master_key_credentials = {}
client_encryption.create_encrypted_collection(
    encrypted_client[encrypted_database_name],
    encrypted_collection_name,
    encrypted_fields_map,
    kms_provider_name,
    customer_master_key_credentials,
)

print("Collection created successfully")
	

In the provided code, we established the patients collection within the medicalRecords database. For security reasons in production, it's essential to avoid using a local key file. Instead, consider using key providers like AWS, GCP, or Azure.
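The code above reads customer-master-key.txt but does not show how that file is created. For local testing only, a minimal sketch could generate it as follows; it assumes the 96-byte key length that the local KMS provider expects, and the function names are hypothetical.

```python
import os


def write_local_master_key(path: str) -> bytes:
    """Generate a random 96-byte key and save it to `path` (testing only)."""
    key = os.urandom(96)
    with open(path, "wb") as f:
        f.write(key)
    return key


def read_local_master_key(path: str) -> bytes:
    """Load the key and check its length before handing it to the driver."""
    with open(path, "rb") as f:
        key = f.read()
    if len(key) != 96:
        raise ValueError(f"expected a 96-byte key, got {len(key)} bytes")
    return key
```

Calling write_local_master_key("./customer-master-key.txt") once produces the file that the setup script reads. Anyone who can read this file can decrypt the data, which is one reason a local key file is unsuitable for production.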

The shared library plays a crucial role in identifying encrypted fields and preventing unsupported actions on encrypted data. If, for any reason, the Automatic Encryption Shared Library isn't accessible, the driver will attempt to connect to mongocryptd for encryption.
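If you would rather fail fast than fall back silently, PyMongo's AutoEncryptionOpts accepts a crypt_shared_lib_required flag alongside crypt_shared_lib_path. The sketch below builds those keyword arguments; the helper name crypt_shared_kwargs is hypothetical, and checking for the file first is a design choice of this example rather than something the driver requires.

```python
import os


def crypt_shared_kwargs(lib_path: str) -> dict:
    """Return AutoEncryptionOpts keyword arguments for the shared library.

    If the file exists, require it, so a load failure raises an error instead
    of falling back. Otherwise return no shared-library options, which leaves
    the driver free to fall back to mongocryptd.
    """
    if os.path.isfile(lib_path):
        return {
            "crypt_shared_lib_path": lib_path,
            "crypt_shared_lib_required": True,
        }
    return {}
```

The result can be unpacked into the constructor, for example AutoEncryptionOpts(kms_provider_credentials, key_vault_namespace, **crypt_shared_kwargs("PATH/mongo_crypt_v1.so")).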

Note: Queryable Encryption is only available for new collections. You cannot add or remove Queryable Encryption from existing collections.

Key Vault Collection

MongoDB stores Data Encryption Keys (DEKs) in a dedicated Key Vault collection, which in this example is __keyVault in the encryption database. These keys encrypt and decrypt the fields within your encrypted collections. A stored key document looks like this:

 
Enterprise red [direct: primary] encryption> db.getCollection("__keyVault").findOne()
{
  _id: new UUID("a4347575-f59a-4b52-906a-1649a5da3def"),
  keyMaterial: Binary.createFromBase64("bKWlcexCrUrZs73VF/hInQa9w4Llkw/8QIGwnHlGBOj/GKOtjdp00c3QSBQ2mfRFr/hoi8AcbM3pIfJksklGnTPvjkNxbgH7EKyf/5QcuJafSOkCE2Hyta06VzJZetLipWxQznlo2f5o3+bQRP7uGV0zFgGKFH0NppMCwj/r5qdUR2xz15ghvFvDwCaBkYNBTKTxUh+4e68gpkSVtvSqmA==", 0),
  creationDate: ISODate("2023-09-11T14:27:53.439Z"),
  updateDate: ISODate("2023-09-11T14:27:53.439Z"),
  status: 0,
  masterKey: { provider: 'local' }
}
	

Additionally, when you create an encrypted collection, MongoDB generates two metadata collections, referred to as ESC and ECOC (named enxcol_.patients.esc and enxcol_.patients.ecoc for the patients collection in this example). As you insert documents with encrypted fields that are queryable, MongoDB updates these auxiliary collections so that encrypted values can be searched efficiently. This bookkeeping consumes extra storage and slows write operations, so enable queries only on the fields that actually need them.

Document Insertion

Inserting documents with encrypted fields requires passing automatic encryption options (auto_encryption_opts in PyMongo) when you create the database connection. Here's a code snippet that demonstrates this procedure:

 
from pymongo.mongo_client import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

# KMS provider name should be one of: "aws", "gcp", "azure", "kmip" or "local"
kms_provider_name = "local"
uri = "<CONNECTION_STRING>"  # Replace with your connection URI
key_vault_database_name = "encryption"
key_vault_collection_name = "__keyVault"
key_vault_namespace = f"{key_vault_database_name}.{key_vault_collection_name}"
encrypted_database_name = "medicalRecords"
encrypted_collection_name = "patients"

# Load the local Customer Master Key from disk
path = "./customer-master-key.txt"
with open(path, "rb") as f:
    local_master_key = f.read()
kms_provider_credentials = {
    "local": {
        "key": local_master_key
    },
}

# Enable automatic encryption on the connection
auto_encryption_options = AutoEncryptionOpts(
    kms_provider_credentials,
    key_vault_namespace,
    crypt_shared_lib_path="PATH/mongo_crypt_v1.so"
)
encrypted_client = MongoClient(
    uri, auto_encryption_opts=auto_encryption_options
)

# The collection already exists with its encrypted fields defined,
# so the field map from the setup script is not repeated here.
patient_document = {
    "patientName": "Jon Doe",
    "patientId": 12345678,
    "patientRecord": {
        "ssn": "987-65-4320",
        "billing": {
            "type": "Visa",
            "number": "4111111111111111",
        },
    },
}

# ssn and billing are encrypted by the driver before the document is sent
encrypted_collection = encrypted_client[encrypted_database_name][encrypted_collection_name]
encrypted_collection.insert_one(patient_document)
	

Encrypted vs. Normal Document Insertion

Inserting 100,000 documents into an encrypted collection with two encrypted fields took 18 minutes and 27.816 seconds. The same operation on a non-encrypted collection took only about 3.6 seconds, which makes the encrypted run roughly 300 times slower on this host. Encryption significantly enhances data security, but it comes with a substantial trade-off in insertion speed.

Host information: Memory: 3.8 GB, CPU: 2 cores (x86_64), Linux (Ubuntu 22.04)
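A comparison like this can be reproduced with a small timing harness. The sketch below is an assumption about how such a measurement might be taken, not the script used for the figures above: time_inserts is a hypothetical helper that accepts any insert callable, such as a collection's insert_one method.

```python
import time
from typing import Callable, Iterable


def time_inserts(insert_one: Callable[[dict], object], docs: Iterable[dict]) -> float:
    """Return the wall-clock seconds spent calling insert_one on every document."""
    start = time.perf_counter()
    for doc in docs:
        insert_one(doc)
    return time.perf_counter() - start


# Figures reported above, converted to seconds.
encrypted_seconds = 18 * 60 + 27.816
plain_seconds = 3.6
slowdown = encrypted_seconds / plain_seconds
print(f"slowdown: about {slowdown:.0f}x")
```

Running time_inserts once against the encrypted collection and once against a plain one, with the same documents, gives the two numbers to compare.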

[Figure: Encrypted Insertion]

[Figure: Normal Document Insertion]

Query Flexibility with Automatic Encryption

  • Automatic encryption supports specific equality query operators: $eq, $ne, $in, $nin, $and, $or, $not, and $nor.
  • To enable clients to run read and write queries on an encrypted field, add the queries property to that field's entry in the encrypted fields map. This property specifies which fields are queryable.
  • If you omit the queries property, querying for that field will be restricted, providing a balance between data security and query flexibility using automatic encryption.
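Because only the operators listed above are supported, it can help to check a filter before sending it. The following is a minimal sketch under a simplifying assumption: it treats the whole filter as targeting encrypted fields, whereas a real check would only need to look at the encrypted paths. The function name is hypothetical.

```python
SUPPORTED_OPERATORS = {"$eq", "$ne", "$in", "$nin", "$and", "$or", "$not", "$nor"}


def unsupported_operators(query) -> set:
    """Collect every $-prefixed key in a filter that is not in the supported set."""
    found = set()
    if isinstance(query, dict):
        for key, value in query.items():
            if isinstance(key, str) and key.startswith("$") and key not in SUPPORTED_OPERATORS:
                found.add(key)
            # Recurse into operator arguments and nested sub-filters.
            found |= unsupported_operators(value)
    elif isinstance(query, (list, tuple)):
        for item in query:
            found |= unsupported_operators(item)
    return found


ok_filter = {"patientRecord.ssn": {"$in": ["987-65-4320", "123-45-6789"]}}
bad_filter = {"$or": [{"patientRecord.ssn": {"$gt": "5"}},
                      {"patientRecord.ssn": {"$regex": "^9"}}]}
print(unsupported_operators(ok_filter))
print(sorted(unsupported_operators(bad_filter)))
```

An empty result means the filter uses only equality-style operators; anything else names the operators to remove or move to unencrypted fields.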

Controlling Contention Factor:

  • The contention factor defaults to 8 and can be set per field. Increasing it speeds up insert and update operations, especially for low-cardinality fields.
  • However, it's important to note that higher contention might impact find performance. Finding the right balance between these factors is crucial for the effective management of encrypted fields.
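Since the contention factor is part of the field definition, one way to keep it explicit is to build each entry through a small helper. This is a sketch only: equality_field is a hypothetical name, and the default of 8 simply mirrors the default mentioned above.

```python
def equality_field(path: str, bson_type: str, contention: int = 8) -> dict:
    """Build an encrypted-field entry with an equality query and contention factor."""
    if isinstance(contention, bool) or not isinstance(contention, int) or contention < 0:
        raise ValueError("contention must be a non-negative integer")
    return {
        "path": path,
        "bsonType": bson_type,
        "queries": [{"queryType": "equality", "contention": contention}],
    }


# A low-cardinality field might get a higher value than a near-unique one.
fields = [
    equality_field("patientRecord.ssn", "string", contention=0),
    equality_field("patientRecord.bloodType", "string"),  # hypothetical field
]
print(fields)
```

The resulting list can be used as the "fields" value of the encrypted fields map when the collection is created.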

Enabling Queryability:

  • Enabling queryability for encrypted fields involves creating an index for each field.
  • This process can slightly slow down write operations on those fields. Whenever a write operation modifies an indexed field, MongoDB updates the associated index.

Usage of $or Operator:

  • When using the $or operator, remember that only encrypted equality queries can be executed with an encrypted equality index.

Usage of $ne Operator:

  • When using the $ne operator, the totalKeysExamined value will be 0, and the query will scan all the documents.
 
# Field definition that sets the contention factor explicitly
"path": "patientRecord.ssn",
"bsonType": "string",
"queries": [{"queryType": "equality", "contention": 0}]
	
 
from pymongo.mongo_client import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

kms_provider_name = "local"
uri = "<CONNECTION_STRING>"  # Replace with your connection URI
key_vault_database_name = "encryption"
key_vault_collection_name = "__keyVault"
key_vault_namespace = f"{key_vault_database_name}.{key_vault_collection_name}"
encrypted_database_name = "medicalRecords"
encrypted_collection_name = "patients"

# Load the local Customer Master Key from disk
path = "./customer-master-key.txt"
with open(path, "rb") as f:
    local_master_key = f.read()
kms_provider_credentials = {
    "local": {
        "key": local_master_key
    },
}

# Enable automatic encryption on the connection
auto_encryption_options = AutoEncryptionOpts(
    kms_provider_credentials,
    key_vault_namespace,
    crypt_shared_lib_path="PATH/mongo_crypt_v1.so"
)
encrypted_client = MongoClient(
    uri, auto_encryption_opts=auto_encryption_options)

# Equality query on an encrypted field; the driver encrypts the
# search value and decrypts the returned document automatically.
encrypted_collection = encrypted_client[encrypted_database_name][encrypted_collection_name]
find_result = encrypted_collection.find_one({
    "patientRecord.ssn": "987-65-4320"
})
print(find_result)
	

Data Backup with Queryable Encryption

Backing up data that uses Queryable Encryption is not straightforward. Tools such as mongodump, mongorestore, mongoimport, and mongoexport do not currently support Queryable Encryption. When you attempt a restore, you may encounter an error message such as: bulk write exception: write errors: [Cannot insert a document with field name __safeContent__]. The available documentation offers little guidance on how to perform backups of collections that use Queryable Encryption.

If you have concerns or questions regarding data backup in such scenarios, please share them in the comment box.

Limitations of MongoDB's Queryable Encryption

  1. Contention Factor: The contention factor can only be set when defining a field for encryption. Once a field is designated for encryption, the contention factor remains unchangeable.
  2. Restoration of Encrypted Collections: Restoring an encrypted collection fails, because inserting a document that contains the field name __safeContent__ results in an error.
  3. Manual Index Compaction: If the metadata collections exceed a size of 1 GB, manual index compaction is required.
  4. Excluded CRUD Operations: Certain CRUD (Create, Read, Update, Delete) operations are excluded from being recorded in the slow operations query log and the Database Profiler's system.profile collection when performed on an encrypted collection.
  5. Standalone Deployments: Queryable encryption is not supported in standalone deployment configurations.
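The standalone limitation in the list above can be checked up front. As a rough sketch, the helper below inspects the response of the server's hello command, which an application could obtain with client.admin.command("hello"); it relies on replica set members reporting a setName and mongos routers reporting msg: "isdbgrid". The function name is hypothetical, and the check covers topology only, not server version or edition.

```python
def supports_queryable_encryption(hello: dict) -> bool:
    """Return True if a hello response describes a replica set member or a mongos."""
    is_replica_set = "setName" in hello
    is_mongos = hello.get("msg") == "isdbgrid"
    return is_replica_set or is_mongos


# Trimmed-down example responses for the three deployment types.
standalone = {"isWritablePrimary": True, "maxWireVersion": 21}
replica_set = {"isWritablePrimary": True, "setName": "rs0"}
mongos = {"isWritablePrimary": True, "msg": "isdbgrid"}

for name, response in [("standalone", standalone),
                       ("replica set", replica_set),
                       ("mongos", mongos)]:
    print(name, supports_queryable_encryption(response))
```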

For more detailed information, you can refer to the source link: MongoDB Queryable Encryption Limitations.

In summary, MongoDB's queryable encryption is a robust security feature that safeguards sensitive data on the server, ensuring its confidentiality both at rest and during transit. However, it's essential to acknowledge certain limitations. For example, the contention factor can only be defined during field encryption, and once set, it remains unchanged. Additionally, specific CRUD operations are excluded from certain logs and collections when executed on encrypted data. Despite these limitations, MongoDB's queryable encryption represents a significant enhancement to data protection in database deployments.

Stay connected for more valuable MongoDB insights.

Also read: MongoDB 7.0 Cluster-to-Cluster Sync: Simplifying Data Synchronization

Mastering Shard Removal in MongoDB: A Step-by-Step Guide
