Advanced Data Analysis using MongoDB Custom Aggregation Expressions

Jul 3, 2024

Mins to Read

All

MongoDB, a leading NoSQL database, just got even more powerful with the introduction of custom aggregation expressions. By harnessing the $function and $accumulator operators, you can write your own JavaScript functions to perform complex data manipulations directly within your database. These tools can enhance performance, streamline workflows, and empower developers to tackle unique data challenges with ease. Let's unlock the full potential of MongoDB’s custom aggregation capabilities.

Custom Aggregation Expressions in MongoDB 4.4

Custom aggregation expressions in MongoDB 4.4 allow you to write JavaScript functions that seamlessly integrate into your aggregation pipelines. Whether you need to perform complex transformations or implement specific logic, these expressions give you the flexibility to tailor MongoDB to your needs.

Why Use Custom Aggregation Expressions?

Using custom aggregation expressions can lead to better performance and increased developer productivity.
By processing data directly within MongoDB, you can streamline your workflow and deliver faster results to your users.
Plus, with the familiarity of JavaScript, you'll feel right at home crafting custom functions to tackle unique challenges.

Getting Started

Before we jump into coding, make sure you have MongoDB 4.4 installed. Familiarize yourself with the MongoDB Aggregation Framework if you haven't already, as it forms the foundation for custom expressions.

 
Consideration Note:

Using the $function and $accumulator operators too often can slow down your MongoDB database. These operators run JavaScript code, which takes up a lot of resources and can make your queries slower. It's best to use built-in MongoDB functions whenever possible, as they're already optimized for performance. Only use $function and $accumulator if you really need them and other options won't work for your specific situation.

Understanding $function and $accumulator Operators

These operators are your go-to when you need to implement custom aggregation functions or expressions in MongoDB. Let's break down what they do:

$function: This operator lets you define custom JavaScript functions to implement behaviors not supported by MongoDB's Query Language. It's a handy tool when you need to tackle complex tasks that standard MongoDB functions can't handle.

$accumulator: This operator allows you to define custom accumulator functions with JavaScript. These functions maintain state as documents progress through the aggregation pipeline, offering flexibility for advanced data manipulation.

Considerations

Javascript Enablement

MongoDB provides server-side scripting capabilities using JavaScript, allowing you to run custom functions within aggregation queries. By default, this feature is enabled. However, if you're not using $function, $accumulator, $where, or MapReduce operators, you can enhance security and potentially improve performance by disabling server-side scripting.

Note:

For detailed information see security.javascriptEnabled configuration option or --noscripting command-line option.

Unsupported Array and String Functions

MongoDB 6.0 upgrades its internal JavaScript engine, which is responsible for executing server-side JavaScript code including $function, $accumulator, $where expressions and mapReduce operations. In this upgrade, several deprecated and non-standard array and string functions that existed in earlier versions are removed. It's essential to review the compatibility notes for MongoDB 6.0 to ensure that any custom JavaScript code you've written or dependencies you rely on are compatible with the new JavaScript engine.

Starting in MongoDB 6.0, the following array and string functions are removed and cannot be used in server-side JavaScript with $accumulator, $function, and $where expressions.

Array Function		String Function
Array.concat	Array.shift	String.charAt	String.slice
Array.every	Array.slice	String.charCodeAt	String.split
Array.filter	Array.some	String.concat	String.startsWith
Array.forEach	Array.sort	String.contains	String.substr
Array.indexOf	Array.splice	String.endsWith	String.substring
Array.join	Array.unshift	String.includes	String.toLocaleLowerCase
Array.lastIndexOf		String.indexOf	String.toLocaleUpperCase
Array.map		String.lastIndexOf	String.toLowerCase
Array.pop		String.localeCompare	String.toUpperCase
Array.push		String.match	String.trim
Array.reduce		String.normalize	String.trimLeft
Array.reduceRight		String.replace	String.trimRight
Array.reverse		String.search

$function Operator

The $function operator empowers you to create custom JavaScript functions within your aggregation pipelines. Simply define your function body, specify any arguments, and let MongoDB handle the rest. From simple transformations to advanced computations, the possibilities are endless.

Syntax

 
{
  $function: {
    body: ,
    args: ,
    lang: "js"
  }
}

Field	Type	Description
body	String or Code	The function definition. You can specify the function definition as either a BSON-type Code or a String. The function should be defined as function(arg1, arg2, ...) { ... } or "function(arg1, arg2, ...) { ... }".
args	Array	Arguments passed to the function body. If the function body does not take any arguments, you can specify an empty array []. The array elements can be any BSON type, including Code.
lang	String	The language used in the function body. You must specify lang: "js" to indicate that the function is written in JavaScript (JS).

Note:

Schema Validation Restriction: MongoDB allows you to define a schema for your collections to enforce data consistency and integrity. However, the $function operator cannot be used as part of schema validation query expressions. This means that you cannot use custom JavaScript functions to validate your schema.

`Examples of “$function” Usage`

Here is an example showcasing $function in a MongoDB aggregation pipeline, focusing on real-time product discount calculations:

Insert test data

Let's say we have two collections named "products" and “discounts” with documents that represent individual products, each containing their name, price and category:

 
db.getSiblingDB(shop).products.insertMany([
  { "_id": 1, "name": "T-Shirt", "price": 20, "category": "Clothing" },
  { "_id":2, "name": "Laptop", "price": 800, "category": "Electronics" },
  { "_id": 3, "name": "Coffee Mug", "price": 5, "category": "Kitchenware" }
])
db.getSiblingDB(shop).discounts.insertMany([
  { "_id": 1, "category": "Clothing", "discount": 10 }, 
  { "_id": 2, "category": "Electronics", "discount": 5 }
])

Scenario: Calculate the discounted price for each product in an e-commerce store based on a dynamic discount percentage stored in a separate collection.

Aggregation pipeline

This example utilises $function to access the discount information from another collection and apply it to the product price in the current collection.

 
function discountCal(price, discount) {
{
  if (!discount) return price; // Handle products without discounts
    return price * (1 - discount / 100); // Apply discount as a percentage
  }
}
db.products.aggregate([
  {
    $lookup: { // Join with discounts collection based on category
      from: "discounts",
      localField: "category",
      foreignField: "category",
      as: "discountInfo"
    }
  },
  {
    $unwind: { // Unwind the discountInfo array (potentially multiple discounts per category)
      path: "$discountInfo",
    }
  },
  {
    $project: {
      name: 1,
      price: 1,
      category: 1,
      discount: { // Use $function to calculate discounted price (consider edge cases)
        $function: { 
          body: discountCal,
          args: ["$price", "$discountInfo.discount"],
          lang: "js"
        }
      }
    }
  }
])

Explanation

In this example, the $function operator is used within the $project stage to calculate the discounted price for each product. It takes two arguments:price: The original price of the product from the products collection. discount: The discount percentage obtained from the discountInfo array after the $lookup and $unwind stages (may be null if no matching discount exists).
The function checks if a discount is available. If not, it returns the original price. Otherwise, it calculates the discounted price by multiplying the original price by (1 - discount / 100), effectively applying the discount as a percentage.

This is just one example of how $function can be used for complex calculations within MongoDB aggregations. You can adapt this concept to various scenarios where you need to perform dynamic data manipulation based on custom logic.

After running this aggregation pipeline, the resulting documents will have an additional field called "discount", containing the average score calculated for each product.

 
[
  {
    _id: 1,
    name: 'T-Shirt',
    price: 20,
    category: 'Clothing',
    discount: 18
  },
  {
    _id: 2,
    name: 'Laptop',
    price: 800,
    category: 'Electronics',
    discount: 760
  }
]

`$accumulator Operator`

For more complex scenarios requiring stateful processing, the $accumulator operator comes to the rescue. This operator allows you to define custom accumulator functions that maintain state as documents flow through the pipeline. Whether you're aggregating data over time or performing intricate calculations, the $accumulator operator has you covered.

Syntax

 
{
  $accumulator: {
    init: ,
    initArgs: ,        // Optional
    accumulate: ,
    accumulateArgs: ,
    merge: ,
    finalize: ,                    // Optional
    lang: 
  }
}

Field	Type	Description
$accumulator	Object	Defines a custom accumulator operator. Accumulators maintain state as documents progress through the pipeline.
init	String or Code	Function to initialize the state. Receives arguments from initArgs array expression.
initArgs	Array	Optional. Arguments are passed to the init function.
accumulate	String or Code	Function to accumulate documents. Receives arguments from state and accumulateArgs array.
accumulateArgs	Array	Arguments passed to accumulate function.
merge	String or Code	Function to merge two internal states.
finalize	String or Code	Optional. Function to update result of accumulation.
lang	String	The language used in $accumulator code. Currently, only "js" is supported.

Note:

Usage Restriction: $accumulator is available in these stages:

`Examples of “$accumulator” Usage`

Insert test data

Let's say we have a collection named "students" with documents that represent individual students, each containing their user name and an array of their exam marks:

 
db.getSiblingDB('web').stokes.insertMany([
  { "_id": 1, "name": "A", "WeeklyRate": [2, 4, 6, 8, 10, 9, 7] },
  { "_id":2, "name": "B", "WeeklyRate": [100, 40, 60, 20, 10, 20, 50] },
  { "_id": 3, "name": "C", "WeeklyRate": [200, 210, 190, 120, 320, 190, 400] }
])

Scenario: Calculate the statistical data of a Stoke website for analysis of the Stokes.

Aggregation pipeline

 
db.getSiblingDB('web').stokes.aggregate([
  {
    $project: {
      WeeklyRate: 1, 
      name:1 
    }
  },
  {
    $unwind: "$WeeklyRate" // Deconstruct the array
  },
  {
    $group: {
      _id: "$name",
      stats: {
        $accumulator: {
          init: function() {
            return {
              count: 0,
              sum: 0,
              sumSquared: 0,
              min: Number.MAX_VALUE,
              max: -Number.MAX_VALUE
            };
          }, // Initialize accumulator to hold count, sum, sum of squares, min, and max
          accumulate: function(result, num) {
            result.count++; // Increment count
            result.sum += num; // Add number to sum
            result.sumSquared += num * num; // Add square of number to sum of squares
            result.min = Math.min(result.min, num); // Update min
            result.max = Math.max(result.max, num); // Update max
            return result;
          }, // Accumulation function
          accumulateArgs: ["$WeeklyRate"], // Accumulation arguments: Use the "WeeklyRate" field from documents
          merge: function(result1, result2) {
            return {
              count: result1.count + result2.count,
              sum: result1.sum + result2.sum,
              sumSquared: result1.sumSquared + result2.sumSquared,
              min: Math.min(result1.min, result2.min),
              max: Math.max(result1.max, result2.max)
            };
          }, // Merge function
          finalize: function(result) {
            // Finalize function to calculate mean and standard deviation
            result.mean = result.sum / result.count; // Calculate mean
            result.variance =
              (result.sumSquared - result.sum * result.mean) / result.count; // Calculate variance
            result.stdDev = Math.sqrt(result.variance); // Calculate standard deviation
            return result;
          }
        }
      }
    }
  }
])

Explanation

$project: This stage projects only the "WeeklyRate" and "name" fields from each document, discarding other fields.
$unwind: This stage deconstructs the "WeeklyRate" array field, creating a new document for each element in the array. This allows for easier aggregation and analysis of individual weekly rates.
$group: This stage groups the documents by the "name" field, creating groups of documents with the same name. Within each group, it applies the $accumulator operator to calculate various statistics for the weekly rates.$accumulator: This custom accumulator operator is used to calculate statistics such as count, sum, sum of squares, minimum, maximum, mean, variance, and standard deviation for the weekly rates within each group.init: Initializes the accumulator with initial values for count, sum, sum of squares, minimum, and maximum. accumulate: Accumulates values from the "WeeklyRate" field into the accumulator, updating count, sum, sum of squares, minimum, and maximum. accumulateArgs: Specifies the field from which to extract values for accumulation, in this case, the "WeeklyRate" field. merge: Merges two accumulators together during the aggregation process, combining their counts, sums, sum of squares, minimums, and maximums. finalize: Finalizes the accumulator by calculating the mean, variance, and standard deviation based on the accumulated values.

This pipeline outputs statistics such as count, sum, sum of squares, minimum, maximum, mean, variance, and standard deviation for each group of weekly rates, grouped by the "name" field. These statistics provide insights into the distribution and variability of weekly rates within each group.

 
[
  {
    _id: 'A',
    stats: {
      count: 7,
      sum: 46,
      sumSquared: 350,
      min: 2,
      max: 10,
      mean: 6.571428571428571,
      variance: 6.816326530612246,
      stdDev: 2.610809554642438
    }
  },
  {
    _id: 'C',
    stats: {
      count: 7,
      sum: 1630,
      sumSquared: 433100,
      min: 120,
      max: 400,
      mean: 232.85714285714286,
      variance: 7648.979591836737,
      stdDev: 87.45844494293696
    }
  },
  {
    _id: 'B',
    stats: {
      count: 7,
      sum: 300,
      sumSquared: 18600,
      min: 10,
      max: 100,
      mean: 42.857142857142854,
      variance: 820.4081632653061,
      stdDev: 28.64276807966203
    }
  }
]

With MongoDB's custom aggregation expressions, you have the power to unlock new possibilities for data manipulation and analysis. Whether you're a seasoned MongoDB developer or just getting started, these features provide a powerful toolkit to elevate your projects to the next level.

Ready to unlock the full potential of MongoDB for your data manipulation and analysis needs? Partner with Mydbops for expert MongoDB Managed, Consulting, and Remote DBA Services. Our team of experienced DBAs can help you leverage MongoDB's advanced features and optimize your database performance. Contact Mydbops today.

{{cta}}

Contact us today for a Personalised MongoDB Consultation!

MongoDB

database

No SQL

opensource

Mydbops

$function

$accumulator

Javascript

CustomAggregation

MongoDB 4.4

No items found.

Advanced Data Analysis using MongoDB Custom Aggregation Expressions

Custom Aggregation Expressions in MongoDB 4.4

Why Use Custom Aggregation Expressions?

Getting Started

Understanding $function and $accumulator Operators

Considerations

$function Operator

Syntax

`Examples of “$function” Usage`

`$accumulator Operator`

`Examples of “$accumulator” Usage`

`About the Author`

`Topic covered`

`Continue Reading`

PostgreSQL Disaster Recovery: Easy Steps for Effective Backup and Restoration

What is a PostgreSQL Schema? A Beginner’s Guide

Handling Index Fragmentation in MongoDB

MySQL Partitioning in 8.0

`Subscribe Now!`