Groupby in Java Streams

What is the Groupby Operation in Java Streams?

Grouping elements in a collection by a specific property is a common operation when working with data. In Java Streams, you can use the Collectors.groupingBy method to group elements in a stream by a specific property. This method returns a Map where the keys are the values of the property you are grouping by, and the values are lists of elements that have that property value.

Java Streams `Groupby` and SQL `GROUP BY` compared

The Collectors.groupingBy method in Java Streams is similar to the GROUP BY clause in SQL. In SQL, you can group rows in a table by a specific column, and then perform aggregate functions on the groups. For example, you could group students by their grade level and then calculate the average age of students in each grade.

Here are the key similarities between the two:

Purpose
- Both groupingBy in Java Streams and GROUP BY in SQL are used to aggregate data based on one or more columns (or properties).
Classification
- Both group data into subsets where each subset contains items that share a common attribute.
- In Java Streams, the groupingBy method uses a classifier function to determine the grouping key.
- In SQL, the GROUP BY clause specifies the columns that determine the groups.
Aggregation
- Both can perform aggregation operations on the grouped data.
- In Java Streams, you can use downstream collectors (like Collectors.counting(), Collectors.summingInt(), etc.) to perform aggregation.
- In SQL, you use aggregate functions (like COUNT(), SUM(), AVG(), etc.) to aggregate data within each group.
Result
- The result of both operations is a set of groups, where each group contains the aggregated data.
- In Java Streams, the result is typically a Map where the keys are the group keys and the values are the grouped data or aggregated results.
- In SQL, the result is a table where each row represents a group and includes the aggregated data.

SQL Example

The following SQL query groups names by their first letter and counts the number of names that start with each letter.

For example, given a table called names with the following data:

name
Alice
Bob
Charlie
David
Edward
Ana
Brad

The SQL query would be:

SELECT SUBSTRING(name, 1, 1) AS first_letter, COUNT(*) AS count
FROM names
GROUP BY SUBSTRING(name, 1, 1);

Output:

first_letter	count
A	2
B	2
C	1
D	1
E	1

Java Streams Example

The equivalent Java Streams code for the SQL example is below. It uses the counting collector to count the number of names that start with each letter.

import java.util.*;
import java.util.stream.*;
import static java.util.stream.Collectors.*;

public class GroupingExample {
    public static void main(String[] args) {
        // Same names as the rows of the SQL table above
        List<String> names = Arrays.asList("Alice", "Bob", "Charlie", "David", "Edward", "Ana", "Brad");

        // Group by first letter and count the names in each group
        Map<String, Long> groupedByFirstLetterCount = names.stream()
            .collect(groupingBy(name -> name.substring(0, 1), counting()));

        System.out.println(groupedByFirstLetterCount);
    }
}

Output:

{A=2, B=2, C=1, D=1, E=1}
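
Other SQL aggregates map onto downstream collectors in the same way. The sketch below is only an illustration: it reuses the names from the SQL table and swaps counting() for averagingInt and summingInt, roughly what AVG and SUM over the name length would do. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AggregateByFirstLetter {

    // Roughly AVG(length of name) per first letter (hypothetical helper)
    public static Map<String, Double> averageLengthByFirstLetter(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(
                        name -> name.substring(0, 1),
                        Collectors.averagingInt(String::length)));
    }

    // Roughly SUM(length of name) per first letter (hypothetical helper)
    public static Map<String, Integer> totalLengthByFirstLetter(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(
                        name -> name.substring(0, 1),
                        Collectors.summingInt(String::length)));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Alice", "Bob", "Charlie", "David", "Edward", "Ana", "Brad");

        System.out.println(averageLengthByFirstLetter(names));
        System.out.println(totalLengthByFirstLetter(names));
    }
}
```

For the letter A, the two names Alice (5 characters) and Ana (3 characters) give an average of 4.0 and a total of 8.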

How to Use `Collectors.groupingBy`

The groupingBy method has three overloads. All of them group the elements of a stream by a specific property, and two of them also let you aggregate the grouped data or choose the type of the resulting map.

Basic grouping requires only the classifier function.

groupingBy(classifier)

The second overload lets you specify a downstream collector that performs an additional aggregation on each group, such as counting or summing.

groupingBy(classifier, downstream)

The third overload also lets you specify a map factory that creates the resulting map, for example a TreeMap or a LinkedHashMap.

groupingBy(classifier, mapFactory, downstream)
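
To see the three overloads side by side, here is a minimal sketch that groups a small list of words by length. The class name, method names, and sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingByOverloads {

    // Overload 1: classifier only; each value is a List of the grouped elements
    public static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length));
    }

    // Overload 2: classifier plus a downstream collector (a count per group)
    public static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    // Overload 3: classifier, map factory, and downstream collector
    public static TreeMap<Integer, Long> sortedCountByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kiwi", "fig", "plum", "pear", "banana");

        System.out.println(byLength(words));
        System.out.println(countByLength(words));
        System.out.println(sortedCountByLength(words));
    }
}
```

Note that the map factory is the second argument, so the third overload always needs an explicit downstream collector, even if it is just Collectors.toList().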

Basic Grouping

The basic usage of Collectors.groupingBy is to group elements in a stream by a specific property. You provide a classifier function that extracts the property value to group by.

For example, if you have a list of Student objects and you want to group them by their grade level, you could use the following code:

import java.util.*;
import java.util.stream.Collectors;

class Student {
    private String name;
    private int gradeLevel;

    public Student(String name, int gradeLevel) {
        this.name = name;
        this.gradeLevel = gradeLevel;
    }

    public String getName() {
        return name;
    }

    public int getGradeLevel() {
        return gradeLevel;
    }

    @Override
    public String toString() {
        return "Student{" +
                "name='" + name + '\'' +
                ", gradeLevel=" + gradeLevel +
                '}';
    }
}

public class Main {
    public static void main(String[] args) {
        // Create a list of students
        List<Student> students = Arrays.asList(
                new Student("Alice", 10),
                new Student("Bob", 11),
                new Student("Charlie", 10),
                new Student("David", 11),
                new Student("Eve", 10)
        );

        // Group students by grade level
        Map<Integer, List<Student>> studentsByGrade = students.stream()
                .collect(Collectors.groupingBy(Student::getGradeLevel));

        // Print the result
        studentsByGrade.forEach((grade, studentList) -> {
            System.out.println("Grade " + grade + ": " + studentList);
        });
    }
}

The method Student::getGradeLevel is used as the classifier method to group the students by their grade level. The resulting Map will have the grade level as the key and a list of students as the value. As there are only two grade levels in the example, 10 and 11, the output will be:

Grade 10: [Student{name='Alice', gradeLevel=10}, Student{name='Charlie', gradeLevel=10}, Student{name='Eve', gradeLevel=10}]
Grade 11: [Student{name='Bob', gradeLevel=11}, Student{name='David', gradeLevel=11}]

Downstream Collectors

Optionally, you can provide a downstream collector to perform additional aggregation operations on the grouped data. For example, you could count the number of students in each grade level by using the counting() collector:

Map<Integer, Long> studentCountByGrade = students.stream()
    .collect(Collectors.groupingBy(Student::getGradeLevel, Collectors.counting()));

This will produce a Map where the keys are the grade levels and the values are the number of students in each grade level. For example, with the 3 students in grade 10 and 2 students in grade 11 from the list above, printing the map gives:

{10=3, 11=2}
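
Counting is only one option. As a hedged sketch, the example below uses Collectors.mapping as the downstream collector to keep just the student names in each grade, first as a list and then joined into one string. The Student class here is a cut-down stand-in for the one shown earlier, and the helper method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DownstreamExamples {

    // Minimal stand-in for the Student class used in this article
    public static class Student {
        private final String name;
        private final int gradeLevel;

        public Student(String name, int gradeLevel) {
            this.name = name;
            this.gradeLevel = gradeLevel;
        }

        public String getName() { return name; }
        public int getGradeLevel() { return gradeLevel; }
    }

    // Keep only the names in each grade instead of whole Student objects
    public static Map<Integer, List<String>> namesByGrade(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(
                        Student::getGradeLevel,
                        Collectors.mapping(Student::getName, Collectors.toList())));
    }

    // Join the names in each grade into one comma-separated string
    public static Map<Integer, String> joinedNamesByGrade(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(
                        Student::getGradeLevel,
                        Collectors.mapping(Student::getName, Collectors.joining(", "))));
    }

    // Same sample data as the basic grouping example
    public static List<Student> sampleStudents() {
        return Arrays.asList(
                new Student("Alice", 10),
                new Student("Bob", 11),
                new Student("Charlie", 10),
                new Student("David", 11),
                new Student("Eve", 10));
    }

    public static void main(String[] args) {
        List<Student> students = sampleStudents();

        System.out.println(namesByGrade(students));
        System.out.println(joinedNamesByGrade(students));
    }
}
```

With the sample data, grade 10 maps to the names Alice, Charlie, and Eve, and grade 11 to Bob and David. Within each group the names keep the order they had in the source list, because the stream is sequential.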

Using a Custom Map Factory

The third overload of Collectors.groupingBy allows you to specify a map factory to create the resulting map. This can be useful if you want to use a specific implementation of the Map interface, such as TreeMap or LinkedHashMap. For example, you could create a TreeMap to store the grouped data:

Map<Integer, List<Student>> studentsByGrade = students.stream()
    .collect(Collectors.groupingBy(Student::getGradeLevel, TreeMap::new, Collectors.toList()));

In this example, the TreeMap::new constructor reference is used as the map factory, so the grouped data is stored in a TreeMap. The keys of the resulting Map are kept in ascending order of grade level.
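
The effect of the map factory is easiest to see on input that is not already sorted. The following sketch, with made-up class and method names, counts names by first letter, once into a TreeMap and once into a LinkedHashMap.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MapFactoryExamples {

    // TreeMap keeps the keys in their natural (alphabetical) order
    public static TreeMap<String, Long> inSortedOrder(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(
                        name -> name.substring(0, 1),
                        TreeMap::new,
                        Collectors.counting()));
    }

    // LinkedHashMap keeps the keys in the order they were first encountered
    public static LinkedHashMap<String, Long> inFirstSeenOrder(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(
                        name -> name.substring(0, 1),
                        LinkedHashMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Deliberately unsorted input
        List<String> names = Arrays.asList("Charlie", "Bob", "Alice", "Brad", "Ana");

        System.out.println(inSortedOrder(names));    // {A=2, B=2, C=1}
        System.out.println(inFirstSeenOrder(names)); // {C=1, B=2, A=2}
    }
}
```

A TreeMap is a reasonable choice when the output should be sorted by key, while a LinkedHashMap suits cases where the order of the source data matters. The default groupingBy(classifier) overload makes no promise about the map type or its iteration order.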
