Advanced Data Analysis using MongoDB Custom Aggregation Expressions

Mydbops
Jul 3, 2024
12
Mins to Read
All

MongoDB, a leading NoSQL database, just got even more powerful with the introduction of custom aggregation expressions. By harnessing the $function and $accumulator operators, you can write your own JavaScript functions to perform complex data manipulations directly within your database. These tools can enhance performance, streamline workflows, and empower developers to tackle unique data challenges with ease. Let's unlock the full potential of MongoDB’s custom aggregation capabilities.

Custom Aggregation Expressions in MongoDB 4.4

Custom aggregation expressions in MongoDB 4.4 allow you to write JavaScript functions that seamlessly integrate into your aggregation pipelines. Whether you need to perform complex transformations or implement specific logic, these expressions give you the flexibility to tailor MongoDB to your needs.

Why Use Custom Aggregation Expressions?

  • Using custom aggregation expressions can lead to better performance and increased developer productivity.
  • By processing data directly within MongoDB, you can streamline your workflow and deliver faster results to your users.
  • Plus, with the familiarity of JavaScript, you'll feel right at home crafting custom functions to tackle unique challenges.

Getting Started

Before we jump into coding, make sure you have MongoDB 4.4 installed. Familiarize yourself with the MongoDB Aggregation Framework if you haven't already, as it forms the foundation for custom expressions.

 
Consideration Note:

Using the $function and $accumulator operators too often can slow down your MongoDB database. These operators run JavaScript code, which takes up a lot of resources and can make your queries slower. It's best to use built-in MongoDB functions whenever possible, as they're already optimized for performance. Only use $function and $accumulator if you really need them and other options won't work for your specific situation.
	

Understanding $function and $accumulator Operators

These operators are your go-to when you need to implement custom aggregation functions or expressions in MongoDB. Let's break down what they do:

$function: This operator lets you define custom JavaScript functions to implement behaviors not supported by MongoDB's Query Language. It's a handy tool when you need to tackle complex tasks that standard MongoDB functions can't handle.

$accumulator: This operator allows you to define custom accumulator functions with JavaScript. These functions maintain state as documents progress through the aggregation pipeline, offering flexibility for advanced data manipulation.

Considerations

Javascript Enablement

MongoDB provides server-side scripting capabilities using JavaScript, allowing you to run custom functions within aggregation queries. By default, this feature is enabled. However, if you're not using $function, $accumulator, $where, or MapReduce operators, you can enhance security and potentially improve performance by disabling server-side scripting.

Note:

For detailed information see security.javascriptEnabled configuration option or --noscripting command-line option.

Unsupported Array and String Functions

MongoDB 6.0 upgrades its internal JavaScript engine, which is responsible for executing server-side JavaScript code including $function, $accumulator, $where expressions and mapReduce operations. In this upgrade, several deprecated and non-standard array and string functions that existed in earlier versions are removed. It's essential to review the compatibility notes for MongoDB 6.0 to ensure that any custom JavaScript code you've written or dependencies you rely on are compatible with the new JavaScript engine.

Starting in MongoDB 6.0, the following array and string functions are removed and cannot be used in server-side JavaScript with $accumulator, $function, and $where expressions.

Array Function

String Function

Array.concat

Array.shift

String.charAt

String.slice

Array.every

Array.slice

String.charCodeAt

String.split

Array.filter

Array.some

String.concat

String.startsWith

Array.forEach

Array.sort

String.contains

String.substr

Array.indexOf

Array.splice

String.endsWith

String.substring

Array.join

Array.unshift

String.includes

String.toLocaleLowerCase

Array.lastIndexOf

String.indexOf

String.toLocaleUpperCase

Array.map

String.lastIndexOf

String.toLowerCase

Array.pop

String.localeCompare

String.toUpperCase

Array.push

String.match

String.trim

Array.reduce

String.normalize

String.trimLeft

Array.reduceRight

String.replace

String.trimRight

Array.reverse

String.search

$function Operator

The $function operator empowers you to create custom JavaScript functions within your aggregation pipelines. Simply define your function body, specify any arguments, and let MongoDB handle the rest. From simple transformations to advanced computations, the possibilities are endless.

Syntax

 
{
  $function: {
    body: ,
    args: ,
    lang: "js"
  }
}
	

Field

Type

Description

body

String or Code

  • The function definition. You can specify the function definition as either a BSON-type Code or a String.

  • The function should be defined as function(arg1, arg2, ...) { ... } or "function(arg1, arg2, ...) { ... }".

args

Array

  • Arguments passed to the function body. If the function body does not take any arguments, you can specify an empty array [].

  • The array elements can be any BSON type, including Code.

lang

String

  • The language used in the function body. You must specify lang: "js" to indicate that the function is written in JavaScript (JS).

Note:

Schema Validation Restriction: MongoDB allows you to define a schema for your collections to enforce data consistency and integrity. However, the $function operator cannot be used as part of schema validation query expressions. This means that you cannot use custom JavaScript functions to validate your schema.

Examples of “$function” Usage

Here is an example showcasing $function in a MongoDB aggregation pipeline, focusing on real-time product discount calculations:

Insert test data

Let's say we have two collections named "products" and “discounts” with documents that represent individual products, each containing their name, price  and category:

 
db.getSiblingDB(shop).products.insertMany([
  { "_id": 1, "name": "T-Shirt", "price": 20, "category": "Clothing" },
  { "_id":2, "name": "Laptop", "price": 800, "category": "Electronics" },
  { "_id": 3, "name": "Coffee Mug", "price": 5, "category": "Kitchenware" }
])
db.getSiblingDB(shop).discounts.insertMany([
  { "_id": 1, "category": "Clothing", "discount": 10 }, 
  { "_id": 2, "category": "Electronics", "discount": 5 }
])
	

Scenario: Calculate the discounted price for each product in an e-commerce store based on a dynamic discount percentage stored in a separate collection.

Aggregation pipeline

This example utilises $function to access the discount information from another collection and apply it to the product price in the current collection.

 
function discountCal(price, discount) {
{
  if (!discount) return price; // Handle products without discounts
    return price * (1 - discount / 100); // Apply discount as a percentage
  }
}
db.products.aggregate([
  {
    $lookup: { // Join with discounts collection based on category
      from: "discounts",
      localField: "category",
      foreignField: "category",
      as: "discountInfo"
    }
  },
  {
    $unwind: { // Unwind the discountInfo array (potentially multiple discounts per category)
      path: "$discountInfo",
    }
  },
  {
    $project: {
      name: 1,
      price: 1,
      category: 1,
      discount: { // Use $function to calculate discounted price (consider edge cases)
        $function: { 
          body: discountCal,
          args: ["$price", "$discountInfo.discount"],
          lang: "js"
        }
      }
    }
  }
])
	

Explanation

  • In this example, the $function operator is used within the $project stage to calculate the discounted price for each product. It takes two arguments:
    • price: The original price of the product from the products collection.
    • discount: The discount percentage obtained from the discountInfo array after the $lookup and $unwind stages (may be null if no matching discount exists).
  • The function checks if a discount is available. If not, it returns the original price. Otherwise, it calculates the discounted price by multiplying the original price by (1 - discount / 100), effectively applying the discount as a percentage.

This is just one example of how $function can be used for complex calculations within MongoDB aggregations. You can adapt this concept to various scenarios where you need to perform dynamic data manipulation based on custom logic.

After running this aggregation pipeline, the resulting documents will have an additional field called "discount", containing the average score calculated for each product.

 
[
  {
    _id: 1,
    name: 'T-Shirt',
    price: 20,
    category: 'Clothing',
    discount: 18
  },
  {
    _id: 2,
    name: 'Laptop',
    price: 800,
    category: 'Electronics',
    discount: 760
  }
]
	

$accumulator Operator

For more complex scenarios requiring stateful processing, the $accumulator operator comes to the rescue. This operator allows you to define custom accumulator functions that maintain state as documents flow through the pipeline. Whether you're aggregating data over time or performing intricate calculations, the $accumulator operator has you covered.

Syntax

 
{
  $accumulator: {
    init: ,
    initArgs: ,        // Optional
    accumulate: ,
    accumulateArgs: ,
    merge: ,
    finalize: ,                    // Optional
    lang: 
  }
}
	

Field

Type

Description

$accumulator

Object

  • Defines a custom accumulator operator. Accumulators maintain state as documents progress through the pipeline.

init

String or Code

  • Function to initialize the state. Receives arguments from initArgs array expression.

initArgs

Array

  • Optional. Arguments are passed to the init function.

accumulate

String or Code

Function to accumulate documents. Receives arguments from state and accumulateArgs array.

accumulateArgs

Array

Arguments passed to accumulate function.

merge

String or Code

Function to merge two internal states.

finalize

String or Code

Optional. Function to update result of accumulation.

lang

String

The language used in $accumulator code. Currently, only "js" is supported.

Note:

Usage Restriction: $accumulator is available in these stages:

Examples of “$accumulator” Usage

Insert test data

Let's say we have a collection named "students" with documents that represent individual students, each containing their user name and an array of their exam marks:

 
db.getSiblingDB('web').stokes.insertMany([
  { "_id": 1, "name": "A", "WeeklyRate": [2, 4, 6, 8, 10, 9, 7] },
  { "_id":2, "name": "B", "WeeklyRate": [100, 40, 60, 20, 10, 20, 50] },
  { "_id": 3, "name": "C", "WeeklyRate": [200, 210, 190, 120, 320, 190, 400] }
])
	

Scenario: Calculate the statistical data of a Stoke website for analysis of the Stokes.

Aggregation pipeline

 
db.getSiblingDB('web').stokes.aggregate([
  {
    $project: {
      WeeklyRate: 1, 
      name:1 
    }
  },
  {
    $unwind: "$WeeklyRate" // Deconstruct the array
  },
  {
    $group: {
      _id: "$name",
      stats: {
        $accumulator: {
          init: function() {
            return {
              count: 0,
              sum: 0,
              sumSquared: 0,
              min: Number.MAX_VALUE,
              max: -Number.MAX_VALUE
            };
          }, // Initialize accumulator to hold count, sum, sum of squares, min, and max
          accumulate: function(result, num) {
            result.count++; // Increment count
            result.sum += num; // Add number to sum
            result.sumSquared += num * num; // Add square of number to sum of squares
            result.min = Math.min(result.min, num); // Update min
            result.max = Math.max(result.max, num); // Update max
            return result;
          }, // Accumulation function
          accumulateArgs: ["$WeeklyRate"], // Accumulation arguments: Use the "WeeklyRate" field from documents
          merge: function(result1, result2) {
            return {
              count: result1.count + result2.count,
              sum: result1.sum + result2.sum,
              sumSquared: result1.sumSquared + result2.sumSquared,
              min: Math.min(result1.min, result2.min),
              max: Math.max(result1.max, result2.max)
            };
          }, // Merge function
          finalize: function(result) {
            // Finalize function to calculate mean and standard deviation
            result.mean = result.sum / result.count; // Calculate mean
            result.variance =
              (result.sumSquared - result.sum * result.mean) / result.count; // Calculate variance
            result.stdDev = Math.sqrt(result.variance); // Calculate standard deviation
            return result;
          }
        }
      }
    }
  }
])
	

Explanation

  • $project: This stage projects only the "WeeklyRate" and "name" fields from each document, discarding other fields.
  • $unwind: This stage deconstructs the "WeeklyRate" array field, creating a new document for each element in the array. This allows for easier aggregation and analysis of individual weekly rates.
  • $group: This stage groups the documents by the "name" field, creating groups of documents with the same name. Within each group, it applies the $accumulator operator to calculate various statistics for the weekly rates.
    • $accumulator: This custom accumulator operator is used to calculate statistics such as count, sum, sum of squares, minimum, maximum, mean, variance, and standard deviation for the weekly rates within each group.
      • init: Initializes the accumulator with initial values for count, sum, sum of squares, minimum, and maximum.
      • accumulate: Accumulates values from the "WeeklyRate" field into the accumulator, updating count, sum, sum of squares, minimum, and maximum.
      • accumulateArgs: Specifies the field from which to extract values for accumulation, in this case, the "WeeklyRate" field.
      • merge: Merges two accumulators together during the aggregation process, combining their counts, sums, sum of squares, minimums, and maximums.
      • finalize: Finalizes the accumulator by calculating the mean, variance, and standard deviation based on the accumulated values.

This pipeline outputs statistics such as count, sum, sum of squares, minimum, maximum, mean, variance, and standard deviation for each group of weekly rates, grouped by the "name" field. These statistics provide insights into the distribution and variability of weekly rates within each group.

 
[
  {
    _id: 'A',
    stats: {
      count: 7,
      sum: 46,
      sumSquared: 350,
      min: 2,
      max: 10,
      mean: 6.571428571428571,
      variance: 6.816326530612246,
      stdDev: 2.610809554642438
    }
  },
  {
    _id: 'C',
    stats: {
      count: 7,
      sum: 1630,
      sumSquared: 433100,
      min: 120,
      max: 400,
      mean: 232.85714285714286,
      variance: 7648.979591836737,
      stdDev: 87.45844494293696
    }
  },
  {
    _id: 'B',
    stats: {
      count: 7,
      sum: 300,
      sumSquared: 18600,
      min: 10,
      max: 100,
      mean: 42.857142857142854,
      variance: 820.4081632653061,
      stdDev: 28.64276807966203
    }
  }
]
	

With MongoDB's custom aggregation expressions, you have the power to unlock new possibilities for data manipulation and analysis. Whether you're a seasoned MongoDB developer or just getting started, these features provide a powerful toolkit to elevate your projects to the next level.

Ready to unlock the full potential of MongoDB for your data manipulation and analysis needs? Partner with Mydbops for expert MongoDB Managed, Consulting, and Remote DBA Services. Our team of experienced DBAs can help you leverage MongoDB's advanced features and optimize your database performance. Contact Mydbops today.

{{cta}}

No items found.

About the Author

Mydbops

Subscribe Now!

Subscribe here to get exclusive updates on upcoming webinars, meetups, and to receive instant updates on new database technologies.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.