Skip to main content

Class 5: DataLoader & The N+1 Problem

Duration: 30 minutes Difficulty: Intermediate Prerequisites: Completed Classes 1-4

What You'll Learn

By the end of this class, you will:

  • Understand the N+1 problem and why it's critical
  • Implement DataLoader for batch loading
  • Use @BatchMapping in Spring GraphQL
  • Add request-scoped caching
  • Measure and verify performance improvements

The N+1 Problem Explained

Let's say we run this query:

query {
movies {
title
director {
name
}
}
}

With our current implementation, here's what happens:

┌─────────────────────────────────────────────────────────────────────┐
│ THE N+1 PROBLEM │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query: movies { title, director { name } } │
│ │
│ Database Queries Executed: │
│ ───────────────────────────────────────────────────────────────── │
│ 1. SELECT * FROM movies -- Get 10 movies │
│ 2. SELECT * FROM directors WHERE id = 1 -- For movie 1 │
│ 3. SELECT * FROM directors WHERE id = 2 -- For movie 2 │
│ 4. SELECT * FROM directors WHERE id = 3 -- For movie 3 │
│ 5. SELECT * FROM directors WHERE id = 3 -- For movie 4 │
│ 6. SELECT * FROM directors WHERE id = 4 -- For movie 5 │
│ 7. SELECT * FROM directors WHERE id = 5 -- For movie 6 │
│ 8. SELECT * FROM directors WHERE id = 6 -- For movie 7 │
│ 9. SELECT * FROM directors WHERE id = 3 -- For movie 8 │
│ ... │
│ │
│ Total: 1 + N queries (where N = number of movies) │
│ For 100 movies: 101 queries! │
│ For 1000 movies: 1001 queries! │
│ │
└─────────────────────────────────────────────────────────────────────┘

This kills performance. Each query has network overhead, connection pooling costs, and execution time.

The Solution: Batch Loading

Instead of fetching directors one by one, we should batch them:

┌─────────────────────────────────────────────────────────────────────┐
│ BATCH LOADING SOLUTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query: movies { title, director { name } } │
│ │
│ Database Queries Executed: │
│ ───────────────────────────────────────────────────────────────── │
│ 1. SELECT * FROM movies │
│ 2. SELECT * FROM directors WHERE id IN (1, 2, 3, 4, 5, 6) │
│ │
│ Total: 2 queries (regardless of movie count!) │
│ │
└─────────────────────────────────────────────────────────────────────┘

Step 1: Add Logging to See the Problem

First, let's add logging to visualize the problem:

📁 Update src/main/java/com/example/moviedb/repository/DirectorRepository.java:

package com.example.moviedb.repository;

import com.example.moviedb.model.Director;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Repository;

import java.util.*;

@Repository
public class DirectorRepository {

private static final Logger log = LoggerFactory.getLogger(DirectorRepository.class);
private final List<Director> directors = new ArrayList<>();

public DirectorRepository() {
directors.add(new Director("1", "Frank Darabont", 1959, "French-American"));
directors.add(new Director("2", "Francis Ford Coppola", 1939, "American"));
directors.add(new Director("3", "Christopher Nolan", 1970, "British-American"));
directors.add(new Director("4", "Robert Zemeckis", 1951, "American"));
directors.add(new Director("5", "Lana Wachowski", 1965, "American"));
directors.add(new Director("6", "Ridley Scott", 1937, "British"));
}

public List<Director> findAll() {
log.info("📚 findAll() - fetching all directors");
return new ArrayList<>(directors);
}

public Optional<Director> findById(String id) {
log.info("🔍 findById({}) - single director lookup", id);
return directors.stream()
.filter(d -> d.getId().equals(id))
.findFirst();
}

// NEW: Batch loading method
public Map<String, Director> findByIds(List<String> ids) {
log.info("⚡ findByIds({}) - BATCH loading {} directors", ids, ids.size());
return directors.stream()
.filter(d -> ids.contains(d.getId()))
.collect(java.util.stream.Collectors.toMap(
Director::getId,
d -> d
));
}
}

Now run a query and check your logs. You'll see many findById calls!

Step 2: Implement @BatchMapping

Spring GraphQL provides @BatchMapping to solve this elegantly:

📁 Update src/main/java/com/example/moviedb/controller/MovieController.java:

package com.example.moviedb.controller;

import com.example.moviedb.model.Actor;
import com.example.moviedb.model.Director;
import com.example.moviedb.model.Genre;
import com.example.moviedb.model.Movie;
import com.example.moviedb.repository.ActorRepository;
import com.example.moviedb.repository.DirectorRepository;
import com.example.moviedb.repository.MovieRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.graphql.data.method.annotation.Argument;
import org.springframework.graphql.data.method.annotation.BatchMapping;
import org.springframework.graphql.data.method.annotation.QueryMapping;
import org.springframework.graphql.data.method.annotation.SchemaMapping;
import org.springframework.stereotype.Controller;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Controller
public class MovieController {

private static final Logger log = LoggerFactory.getLogger(MovieController.class);

private final MovieRepository movieRepository;
private final DirectorRepository directorRepository;
private final ActorRepository actorRepository;

public MovieController(MovieRepository movieRepository,
DirectorRepository directorRepository,
ActorRepository actorRepository) {
this.movieRepository = movieRepository;
this.directorRepository = directorRepository;
this.actorRepository = actorRepository;
}

// ===== QUERY MAPPINGS =====

@QueryMapping
public Movie movie(@Argument String id) {
return movieRepository.findById(id).orElse(null);
}

@QueryMapping
public List<Movie> movies(@Argument Genre genre) {
if (genre != null) {
return movieRepository.findByGenre(genre);
}
return movieRepository.findAll();
}

@QueryMapping
public List<Movie> searchMovies(@Argument String title) {
return movieRepository.searchByTitle(title);
}

@QueryMapping
public Director director(@Argument String id) {
return directorRepository.findById(id).orElse(null);
}

@QueryMapping
public List<Director> directors() {
return directorRepository.findAll();
}

@QueryMapping
public Actor actor(@Argument String id) {
return actorRepository.findById(id).orElse(null);
}

@QueryMapping
public List<Actor> actors() {
return actorRepository.findAll();
}

// ===== BATCH MAPPING FOR DIRECTORS =====

@BatchMapping(typeName = "Movie", field = "director")
public Map<Movie, Director> directors(List<Movie> movies) {
log.info("🚀 BatchMapping directors for {} movies", movies.size());

// Collect unique director IDs
List<String> directorIds = movies.stream()
.map(Movie::getDirectorId)
.distinct()
.toList();

// Single batch fetch
Map<String, Director> directorsById = directorRepository.findByIds(directorIds);

// Map each movie to its director
return movies.stream()
.collect(Collectors.toMap(
movie -> movie,
movie -> directorsById.get(movie.getDirectorId())
));
}

// ===== BATCH MAPPING FOR ACTORS =====

@BatchMapping(typeName = "Movie", field = "actors")
public Map<Movie, List<Actor>> actors(List<Movie> movies) {
log.info("🚀 BatchMapping actors for {} movies", movies.size());

// Collect all unique actor IDs
List<String> allActorIds = movies.stream()
.flatMap(m -> m.getActorIds().stream())
.distinct()
.toList();

// Single batch fetch
Map<String, Actor> actorsById = actorRepository.findByIds(allActorIds);

// Map each movie to its list of actors
return movies.stream()
.collect(Collectors.toMap(
movie -> movie,
movie -> movie.getActorIds().stream()
.map(actorsById::get)
.filter(actor -> actor != null)
.toList()
));
}

// ===== FIELD RESOLVERS FOR DIRECTOR =====

@SchemaMapping(typeName = "Director", field = "movies")
public List<Movie> directorMovies(Director director) {
return movieRepository.findByDirectorId(director.getId());
}

// ===== FIELD RESOLVERS FOR ACTOR =====

@SchemaMapping(typeName = "Actor", field = "movies")
public List<Movie> actorMovies(Actor actor) {
return movieRepository.findByActorId(actor.getId());
}
}

Add the batch method to ActorRepository:

public Map<String, Actor> findByIds(List<String> ids) {
log.info("⚡ findByIds({}) - BATCH loading {} actors", ids, ids.size());
return actors.stream()
.filter(a -> ids.contains(a.getId()))
.collect(java.util.stream.Collectors.toMap(
Actor::getId,
a -> a
));
}

Step 3: Test the Improvement

Restart your application and run this query:

query {
movies {
title
director {
name
}
actors {
name
}
}
}

Check your logs! You should now see:

🚀 BatchMapping directors for 8 movies
⚡ findByIds([1, 2, 3, 4, 5, 6]) - BATCH loading 6 directors
🚀 BatchMapping actors for 8 movies
⚡ findByIds([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) - BATCH loading 10 actors

Before: 1 + 8 + (8 × ~2) = ~25 queries After: 1 + 1 + 1 = 3 queries!

How @BatchMapping Works

┌─────────────────────────────────────────────────────────────────────┐
│ @BatchMapping Flow │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Query arrives: movies { title, director { name } } │
│ │
│ 2. movies() resolver returns [Movie1, Movie2, Movie3, ...] │
│ │
│ 3. Spring GraphQL COLLECTS all Movie objects that need "director" │
│ │
│ 4. Calls @BatchMapping directors(List<Movie> movies) ONCE │
│ with ALL movies │
│ │
│ 5. Method returns Map<Movie, Director> │
│ │
│ 6. Spring distributes directors to each movie in the response │
│ │
│ Key: Method is called ONCE per request, not once per movie! │
│ │
└─────────────────────────────────────────────────────────────────────┘

Alternative: Using DataLoader Directly

For more control, you can use DataLoader directly:

📁 src/main/java/com/example/moviedb/dataloader/DirectorBatchLoader.java

package com.example.moviedb.dataloader;

import com.example.moviedb.model.Director;
import com.example.moviedb.repository.DirectorRepository;
import org.dataloader.BatchLoaderEnvironment;
import org.dataloader.MappedBatchLoaderWithContext;
import org.springframework.stereotype.Component;

import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

@Component
public class DirectorBatchLoader implements MappedBatchLoaderWithContext<String, Director> {

private final DirectorRepository directorRepository;

public DirectorBatchLoader(DirectorRepository directorRepository) {
this.directorRepository = directorRepository;
}

@Override
public CompletionStage<Map<String, Director>> load(Set<String> keys,
BatchLoaderEnvironment env) {
return CompletableFuture.supplyAsync(() ->
directorRepository.findByIds(keys.stream().toList())
);
}
}

Register it:

📁 src/main/java/com/example/moviedb/config/DataLoaderConfig.java

package com.example.moviedb.config;

import com.example.moviedb.dataloader.DirectorBatchLoader;
import com.example.moviedb.model.Director;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderOptions;
import org.dataloader.DataLoaderRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.graphql.execution.DataLoaderRegistrar;

@Configuration
public class DataLoaderConfig {

@Bean
public DataLoaderRegistrar dataLoaderRegistrar(DirectorBatchLoader directorLoader) {
return (registry, context) -> {
DataLoaderOptions options = DataLoaderOptions.newOptions()
.setCachingEnabled(true); // Enable per-request caching

registry.register("directors",
DataLoader.newMappedDataLoader(directorLoader, options));
};
}
}

Use it in a resolver:

@SchemaMapping(typeName = "Movie", field = "director")
public CompletableFuture<Director> director(Movie movie, DataLoader<String, Director> directors) {
return directors.load(movie.getDirectorId());
}
@BatchMapping vs DataLoader

Use @BatchMapping for most cases - it's simpler and handles the Map creation automatically. Use DataLoader directly when you need:

  • More control over caching options
  • Async loading with CompletableFuture
  • Custom batching logic

Handling Null Values

What if a movie has an invalid directorId? Handle it gracefully:

@BatchMapping(typeName = "Movie", field = "director")
public Map<Movie, Director> directors(List<Movie> movies) {
List<String> directorIds = movies.stream()
.map(Movie::getDirectorId)
.distinct()
.toList();

Map<String, Director> directorsById = directorRepository.findByIds(directorIds);

return movies.stream()
.collect(Collectors.toMap(
movie -> movie,
movie -> directorsById.getOrDefault(movie.getDirectorId(), null),
(existing, replacement) -> existing // Handle duplicate keys
));
}

But wait - if the schema says director: Director! (non-null), returning null will cause an error. Options:

  1. Make it nullable in schema: director: Director
  2. Throw an exception: throw new DataFetchingException("Director not found")
  3. Return a placeholder: Return a "Unknown Director" object

Performance Comparison

Let's visualize the improvement:

┌─────────────────────────────────────────────────────────────────────┐
│ QUERY PERFORMANCE COMPARISON │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query: movies { title, director { name }, actors { name } } │
│ │
│ Movies │ Without Batch │ With Batch │ Reduction │
│ ───────────────────────────────────────────────────────────────── │
│ 10 │ 31 queries │ 3 queries │ 90% fewer │
│ 100 │ 301 queries │ 3 queries │ 99% fewer │
│ 1000 │ 3001 queries │ 3 queries │ 99.9% fewer │
│ │
│ Response Time (simulated): │
│ 10 │ ~50ms │ ~15ms │ 70% faster │
│ 100 │ ~500ms │ ~20ms │ 96% faster │
│ 1000 │ ~5000ms │ ~50ms │ 99% faster │
│ │
└─────────────────────────────────────────────────────────────────────┘

Request-Scoped Caching

DataLoader also provides per-request caching. If the same director is requested multiple times in one query, it's fetched only once:

query {
movies {
title
director { name } # Director 3 (Nolan)
}
movie(id: "4") {
director { name } # Director 3 (Nolan) - CACHED!
}
}

This is automatic with @BatchMapping and DataLoader.

Batch Loading for Director.movies

Let's also optimize the reverse relationship:

@BatchMapping(typeName = "Director", field = "movies")
public Map<Director, List<Movie>> directorMovies(List<Director> directors) {
log.info("🚀 BatchMapping movies for {} directors", directors.size());

List<String> directorIds = directors.stream()
.map(Director::getId)
.toList();

// This method needs to be added to MovieRepository
Map<String, List<Movie>> moviesByDirectorId =
movieRepository.findByDirectorIds(directorIds);

return directors.stream()
.collect(Collectors.toMap(
director -> director,
director -> moviesByDirectorId.getOrDefault(director.getId(), List.of())
));
}

Add to MovieRepository:

public Map<String, List<Movie>> findByDirectorIds(List<String> directorIds) {
log.info("⚡ findByDirectorIds - BATCH loading movies for {} directors", directorIds.size());
return movies.stream()
.filter(m -> directorIds.contains(m.getDirectorId()))
.collect(Collectors.groupingBy(Movie::getDirectorId));
}

Exercises

Exercise 1: Batch Load for Actor.movies

Implement @BatchMapping for the movies field on the Actor type.

Solution
@BatchMapping(typeName = "Actor", field = "movies")
public Map<Actor, List<Movie>> actorMovies(List<Actor> actors) {
List<String> actorIds = actors.stream()
.map(Actor::getId)
.toList();

Map<String, List<Movie>> moviesByActorId =
movieRepository.findByActorIds(actorIds);

return actors.stream()
.collect(Collectors.toMap(
actor -> actor,
actor -> moviesByActorId.getOrDefault(actor.getId(), List.of())
));
}

Add to MovieRepository:

public Map<String, List<Movie>> findByActorIds(List<String> actorIds) {
Map<String, List<Movie>> result = new HashMap<>();
for (Movie movie : movies) {
for (String actorId : movie.getActorIds()) {
if (actorIds.contains(actorId)) {
result.computeIfAbsent(actorId, k -> new ArrayList<>()).add(movie);
}
}
}
return result;
}

Exercise 2: Add Query Logging

Add logging to measure how long each batch load takes:

@BatchMapping(typeName = "Movie", field = "director")
public Map<Movie, Director> directors(List<Movie> movies) {
long start = System.currentTimeMillis();
// ... implementation
log.info("BatchMapping directors took {}ms", System.currentTimeMillis() - start);
return result;
}

Exercise 3: Verify Caching

Test that the same director isn't loaded twice in one request by adding logging and running a query that requests the same director multiple times.

Summary

In this class, you learned:

✅ The N+1 problem and why it devastates performance ✅ Using @BatchMapping to batch database calls ✅ How Spring GraphQL collects and batches field resolutions ✅ Per-request caching with DataLoader ✅ When to use DataLoader directly vs @BatchMapping

What's Next?

In Class 6: Error Handling, we'll learn:

  • Different types of GraphQL errors
  • Custom exception handling
  • Partial errors (some fields fail, others succeed)
  • Validation and meaningful error messages

Because things will go wrong - let's handle it gracefully!