Aggregation and Grouping#

EQL supports powerful aggregation functions that allow you to summarize data across groups of entities. This includes common operations like counting, summing, and averaging.

Grouping Results#

The .grouped_by() method allows you to group results by one or more symbolic variables. When you use an aggregator in a query, EQL automatically calculates it for each group.

from krrood.entity_query_language.factories import entity
import krrood.entity_query_language.factories as eql

# Group robots by their 'type' and count them by type, returning a count for the occurrences of each type.
query = eql.count(r.type).grouped_by(r.type)

Using Aggregators#

Aggregators are functions that take a symbolic variable and return a summary value.

Available aggregators include:

  • count(): Counts the number of items in the group.

  • sum(): Calculates the sum of a numeric attribute.

  • average(): Calculates the average of a numeric attribute.

  • max() and min(): Find the maximum or minimum value.

  • count_all(): Counts all records in a group, regardless of a specific variable.

  • mode(): Returns the most frequent value in a group.

  • multimode(): Returns all values that have the maximum frequency (in case of ties).

Most aggregators support these optional parameters:

  • key: A function to extract the value for aggregation/comparison from the object.

  • default: The value to return if the group is empty.

  • distinct: Whether to consider only unique values.

from krrood.entity_query_language.factories import sum, average

# Calculate the total and average battery level per robot type
query = set_of(r.type, sum(r.battery), average(r.battery)).grouped_by(r.type)

Hint

You can use .distinct() inside an aggregator to count only unique values: count(r.name, distinct=True).

Finding Extremes with max() and min()#

The max() and min() aggregators find the extreme values in a group. While they often operate on numeric attributes, you can use the key argument to find the object that has an extreme property.

import krrood.entity_query_language.factories as eql

# Find the maximum battery level
query = eql.max(r.battery)

# Find the robot object with the highest battery level
# This returns the ExampleRobot instance itself, not just the battery number
query = eql.max(r, key=lambda robot: robot.battery)

Post-Aggregation Filtering with .having()#

While .where() filters individual entities before they are grouped, .having() filters the results after they have been aggregated.

# Only show types that have more than 5 robots
query = entity(r.type).grouped_by(r.type).having(count(r) > 5)

Warning

Always use .where() for conditions that can be evaluated on individual objects. Use .having() only for conditions that depend on group-level aggregates.

Full Example: ExampleWorld Statistics#

from dataclasses import dataclass
from krrood.entity_query_language.factories import variable, set_of, Symbol, count, sum, average

@dataclass
class ExampleRobot(Symbol):
    name: str
    type: str
    battery: int

robots = [
    ExampleRobot("R1", "Astromech", 100),
    ExampleRobot("R2", "Astromech", 80),
    ExampleRobot("C1", "Protocol", 20),
    ExampleRobot("C2", "Protocol", 40),
    ExampleRobot("K1", "Security", 90)
]

r = variable(ExampleRobot, domain=robots)

# We want to see:
# 1. The type of robot
# 2. How many of each type (count)
# 3. The average battery for that type
# BUT only for types where the total battery sum > 50
query = set_of(r.type, c:=count(r), avg_batt:=average(r.battery)) \
    .grouped_by(r.type) \
    .having(sum(r.battery) > 50)

for result in query.evaluate():
    print(f"Type: {result[r.type]} | Count: {result[c]} | Avg Battery: {result[avg_batt]}%")
Type: Astromech | Count: 2 | Avg Battery: 90.0%
Type: Protocol | Count: 2 | Avg Battery: 30.0%
Type: Security | Count: 1 | Avg Battery: 90.0%

More Aggregations …#

Counting All Records with count_all()#

The count_all() aggregator is used to count all records within a group, regardless of any specific variable. This is particularly useful when grouping by a variable and wanting to know the size of each resulting group.

import krrood.entity_query_language.factories as eql
from krrood.entity_query_language.factories import variable, set_of

# Given a variable with repeated values
domain = [1, 2, 3, 2, 2, 1, 3]
var = variable(int, domain=domain)

# Count how many times each value appears
# This groups by 'var' and counts the records in each group
query = set_of(var, count:=eql.count_all()).grouped_by(var)

# Result: [(1, 2), (2, 3), (3, 2)]
results = query.tolist()
print(results)
[{int: 1, CountAll: 2}, {int: 2, CountAll: 3}, {int: 3, CountAll: 2}]

Advanced: Finding the Mode using Nested Aggregations#

With EQL you can perform multiple nested aggregations to find the mode manually. This example uses a subquery to find the maximum count and then filters the groups by that count.

Note

This example is a bit contrived, but it demonstrates how to use multiple aggregators to find the mode. In practice, you would usually use a single aggregator like mode() or multimode().

import krrood.entity_query_language.factories as eql
from krrood.entity_query_language.factories import variable, entity

# Given a variable with repeated values
domain = [1, 2, 3, 2, 2, 1, 3, 3]
var = variable(int, domain=domain)

# 1. Calculate the maximum frequency across all groups
max_count = entity(eql.max(eql.count(var).grouped_by(var)))

# 2. Find the values whose frequency equals that maximum
mode_query = set_of(var, count:=eql.count_all()).grouped_by(var).having(count == max_count)

# This will return [(2, 3), (3, 3)] as 2, and 3 are the most frequent values, each with a count of 3.
results = [(res[var], res[count]) for res in mode_query.tolist()]
print(results)
[(2, 3), (3, 3)]

API Reference#