Class 5: DataLoader & The N+1 Problem
Duration: 30 minutes Difficulty: Intermediate Prerequisites: Completed Classes 1-4
What You'll Learn
By the end of this class, you will:
- Understand the N+1 problem and why it's critical
- Implement DataLoader for batch loading
- Use
@BatchMappingin Spring GraphQL - Add request-scoped caching
- Measure and verify performance improvements
The N+1 Problem Explained
Let's say we run this query:
query {
movies {
title
director {
name
}
}
}
With our current implementation, here's what happens:
┌─────────────────────────────────────────────────────────────────────┐
│ THE N+1 PROBLEM │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query: movies { title, director { name } } │
│ │
│ Database Queries Executed: │
│ ───────────────────────────────────────────────────────────────── │
│ 1. SELECT * FROM movies -- Get 10 movies │
│ 2. SELECT * FROM directors WHERE id = 1 -- For movie 1 │
│ 3. SELECT * FROM directors WHERE id = 2 -- For movie 2 │
│ 4. SELECT * FROM directors WHERE id = 3 -- For movie 3 │
│ 5. SELECT * FROM directors WHERE id = 3 -- For movie 4 │
│ 6. SELECT * FROM directors WHERE id = 4 -- For movie 5 │
│ 7. SELECT * FROM directors WHERE id = 5 -- For movie 6 │
│ 8. SELECT * FROM directors WHERE id = 6 -- For movie 7 │
│ 9. SELECT * FROM directors WHERE id = 3 -- For movie 8 │
│ ... │
│ │
│ Total: 1 + N queries (where N = number of movies) │
│ For 100 movies: 101 queries! │
│ For 1000 movies: 1001 queries! │
│ │
└─────────────────────────────────────────────────────────────────────┘
This kills performance. Each query has network overhead, connection pooling costs, and execution time.
The Solution: Batch Loading
Instead of fetching directors one by one, we should batch them:
┌─────────────────────────────────────────────────────────────────────┐
│ BATCH LOADING SOLUTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query: movies { title, director { name } } │
│ │
│ Database Queries Executed: │
│ ───────────────────────────────────────────────────────────────── │
│ 1. SELECT * FROM movies │
│ 2. SELECT * FROM directors WHERE id IN (1, 2, 3, 4, 5, 6) │
│ │
│ Total: 2 queries (regardless of movie count!) │
│ │
└─────────────────────────────────────────────────────────────────────┘
Step 1: Add Logging to See the Problem
First, let's add logging to visualize the problem:
📁 Update src/main/java/com/example/moviedb/repository/DirectorRepository.java:
package com.example.moviedb.repository;
import com.example.moviedb.model.Director;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Repository;
import java.util.*;
@Repository
public class DirectorRepository {
private static final Logger log = LoggerFactory.getLogger(DirectorRepository.class);
private final List<Director> directors = new ArrayList<>();
public DirectorRepository() {
directors.add(new Director("1", "Frank Darabont", 1959, "French-American"));
directors.add(new Director("2", "Francis Ford Coppola", 1939, "American"));
directors.add(new Director("3", "Christopher Nolan", 1970, "British-American"));
directors.add(new Director("4", "Robert Zemeckis", 1951, "American"));
directors.add(new Director("5", "Lana Wachowski", 1965, "American"));
directors.add(new Director("6", "Ridley Scott", 1937, "British"));
}
public List<Director> findAll() {
log.info("📚 findAll() - fetching all directors");
return new ArrayList<>(directors);
}
public Optional<Director> findById(String id) {
log.info("🔍 findById({}) - single director lookup", id);
return directors.stream()
.filter(d -> d.getId().equals(id))
.findFirst();
}
// NEW: Batch loading method
public Map<String, Director> findByIds(List<String> ids) {
log.info("⚡ findByIds({}) - BATCH loading {} directors", ids, ids.size());
return directors.stream()
.filter(d -> ids.contains(d.getId()))
.collect(java.util.stream.Collectors.toMap(
Director::getId,
d -> d
));
}
}
Now run a query and check your logs. You'll see many findById calls!
Step 2: Implement @BatchMapping
Spring GraphQL provides @BatchMapping to solve this elegantly:
📁 Update src/main/java/com/example/moviedb/controller/MovieController.java:
package com.example.moviedb.controller;
import com.example.moviedb.model.Actor;
import com.example.moviedb.model.Director;
import com.example.moviedb.model.Genre;
import com.example.moviedb.model.Movie;
import com.example.moviedb.repository.ActorRepository;
import com.example.moviedb.repository.DirectorRepository;
import com.example.moviedb.repository.MovieRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.graphql.data.method.annotation.Argument;
import org.springframework.graphql.data.method.annotation.BatchMapping;
import org.springframework.graphql.data.method.annotation.QueryMapping;
import org.springframework.graphql.data.method.annotation.SchemaMapping;
import org.springframework.stereotype.Controller;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
@Controller
public class MovieController {
private static final Logger log = LoggerFactory.getLogger(MovieController.class);
private final MovieRepository movieRepository;
private final DirectorRepository directorRepository;
private final ActorRepository actorRepository;
public MovieController(MovieRepository movieRepository,
DirectorRepository directorRepository,
ActorRepository actorRepository) {
this.movieRepository = movieRepository;
this.directorRepository = directorRepository;
this.actorRepository = actorRepository;
}
// ===== QUERY MAPPINGS =====
@QueryMapping
public Movie movie(@Argument String id) {
return movieRepository.findById(id).orElse(null);
}
@QueryMapping
public List<Movie> movies(@Argument Genre genre) {
if (genre != null) {
return movieRepository.findByGenre(genre);
}
return movieRepository.findAll();
}
@QueryMapping
public List<Movie> searchMovies(@Argument String title) {
return movieRepository.searchByTitle(title);
}
@QueryMapping
public Director director(@Argument String id) {
return directorRepository.findById(id).orElse(null);
}
@QueryMapping
public List<Director> directors() {
return directorRepository.findAll();
}
@QueryMapping
public Actor actor(@Argument String id) {
return actorRepository.findById(id).orElse(null);
}
@QueryMapping
public List<Actor> actors() {
return actorRepository.findAll();
}
// ===== BATCH MAPPING FOR DIRECTORS =====
@BatchMapping(typeName = "Movie", field = "director")
public Map<Movie, Director> directors(List<Movie> movies) {
log.info("🚀 BatchMapping directors for {} movies", movies.size());
// Collect unique director IDs
List<String> directorIds = movies.stream()
.map(Movie::getDirectorId)
.distinct()
.toList();
// Single batch fetch
Map<String, Director> directorsById = directorRepository.findByIds(directorIds);
// Map each movie to its director
return movies.stream()
.collect(Collectors.toMap(
movie -> movie,
movie -> directorsById.get(movie.getDirectorId())
));
}
// ===== BATCH MAPPING FOR ACTORS =====
@BatchMapping(typeName = "Movie", field = "actors")
public Map<Movie, List<Actor>> actors(List<Movie> movies) {
log.info("🚀 BatchMapping actors for {} movies", movies.size());
// Collect all unique actor IDs
List<String> allActorIds = movies.stream()
.flatMap(m -> m.getActorIds().stream())
.distinct()
.toList();
// Single batch fetch
Map<String, Actor> actorsById = actorRepository.findByIds(allActorIds);
// Map each movie to its list of actors
return movies.stream()
.collect(Collectors.toMap(
movie -> movie,
movie -> movie.getActorIds().stream()
.map(actorsById::get)
.filter(actor -> actor != null)
.toList()
));
}
// ===== FIELD RESOLVERS FOR DIRECTOR =====
@SchemaMapping(typeName = "Director", field = "movies")
public List<Movie> directorMovies(Director director) {
return movieRepository.findByDirectorId(director.getId());
}
// ===== FIELD RESOLVERS FOR ACTOR =====
@SchemaMapping(typeName = "Actor", field = "movies")
public List<Movie> actorMovies(Actor actor) {
return movieRepository.findByActorId(actor.getId());
}
}
Add the batch method to ActorRepository:
public Map<String, Actor> findByIds(List<String> ids) {
log.info("⚡ findByIds({}) - BATCH loading {} actors", ids, ids.size());
return actors.stream()
.filter(a -> ids.contains(a.getId()))
.collect(java.util.stream.Collectors.toMap(
Actor::getId,
a -> a
));
}
Step 3: Test the Improvement
Restart your application and run this query:
query {
movies {
title
director {
name
}
actors {
name
}
}
}
Check your logs! You should now see:
🚀 BatchMapping directors for 8 movies
⚡ findByIds([1, 2, 3, 4, 5, 6]) - BATCH loading 6 directors
🚀 BatchMapping actors for 8 movies
⚡ findByIds([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) - BATCH loading 10 actors
Before: 1 + 8 + (8 × ~2) = ~25 queries After: 1 + 1 + 1 = 3 queries!
How @BatchMapping Works
┌─────────────────────────────────────────────────────────────────────┐
│ @BatchMapping Flow │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Query arrives: movies { title, director { name } } │
│ │
│ 2. movies() resolver returns [Movie1, Movie2, Movie3, ...] │
│ │
│ 3. Spring GraphQL COLLECTS all Movie objects that need "director" │
│ │
│ 4. Calls @BatchMapping directors(List<Movie> movies) ONCE │
│ with ALL movies │
│ │
│ 5. Method returns Map<Movie, Director> │
│ │
│ 6. Spring distributes directors to each movie in the response │
│ │
│ Key: Method is called ONCE per request, not once per movie! │
│ │
└─────────────────────────────────────────────────────────────────────┘
Alternative: Using DataLoader Directly
For more control, you can use DataLoader directly:
📁 src/main/java/com/example/moviedb/dataloader/DirectorBatchLoader.java
package com.example.moviedb.dataloader;
import com.example.moviedb.model.Director;
import com.example.moviedb.repository.DirectorRepository;
import org.dataloader.BatchLoaderEnvironment;
import org.dataloader.MappedBatchLoaderWithContext;
import org.springframework.stereotype.Component;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
@Component
public class DirectorBatchLoader implements MappedBatchLoaderWithContext<String, Director> {
private final DirectorRepository directorRepository;
public DirectorBatchLoader(DirectorRepository directorRepository) {
this.directorRepository = directorRepository;
}
@Override
public CompletionStage<Map<String, Director>> load(Set<String> keys,
BatchLoaderEnvironment env) {
return CompletableFuture.supplyAsync(() ->
directorRepository.findByIds(keys.stream().toList())
);
}
}
Register it:
📁 src/main/java/com/example/moviedb/config/DataLoaderConfig.java
package com.example.moviedb.config;
import com.example.moviedb.dataloader.DirectorBatchLoader;
import com.example.moviedb.model.Director;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderOptions;
import org.dataloader.DataLoaderRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.graphql.execution.DataLoaderRegistrar;
@Configuration
public class DataLoaderConfig {
@Bean
public DataLoaderRegistrar dataLoaderRegistrar(DirectorBatchLoader directorLoader) {
return (registry, context) -> {
DataLoaderOptions options = DataLoaderOptions.newOptions()
.setCachingEnabled(true); // Enable per-request caching
registry.register("directors",
DataLoader.newMappedDataLoader(directorLoader, options));
};
}
}
Use it in a resolver:
@SchemaMapping(typeName = "Movie", field = "director")
public CompletableFuture<Director> director(Movie movie, DataLoader<String, Director> directors) {
return directors.load(movie.getDirectorId());
}
Use @BatchMapping for most cases - it's simpler and handles the Map creation automatically.
Use DataLoader directly when you need:
- More control over caching options
- Async loading with CompletableFuture
- Custom batching logic
Handling Null Values
What if a movie has an invalid directorId? Handle it gracefully:
@BatchMapping(typeName = "Movie", field = "director")
public Map<Movie, Director> directors(List<Movie> movies) {
List<String> directorIds = movies.stream()
.map(Movie::getDirectorId)
.distinct()
.toList();
Map<String, Director> directorsById = directorRepository.findByIds(directorIds);
return movies.stream()
.collect(Collectors.toMap(
movie -> movie,
movie -> directorsById.getOrDefault(movie.getDirectorId(), null),
(existing, replacement) -> existing // Handle duplicate keys
));
}
But wait - if the schema says director: Director! (non-null), returning null will cause an error. Options:
- Make it nullable in schema:
director: Director - Throw an exception:
throw new DataFetchingException("Director not found") - Return a placeholder: Return a "Unknown Director" object
Performance Comparison
Let's visualize the improvement:
┌─────────────────────────────────────────────────────────────────────┐
│ QUERY PERFORMANCE COMPARISON │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Query: movies { title, director { name }, actors { name } } │
│ │
│ Movies │ Without Batch │ With Batch │ Reduction │
│ ───────────────────────────────────────────────────────────────── │
│ 10 │ 31 queries │ 3 queries │ 90% fewer │
│ 100 │ 301 queries │ 3 queries │ 99% fewer │
│ 1000 │ 3001 queries │ 3 queries │ 99.9% fewer │
│ │
│ Response Time (simulated): │
│ 10 │ ~50ms │ ~15ms │ 70% faster │
│ 100 │ ~500ms │ ~20ms │ 96% faster │
│ 1000 │ ~5000ms │ ~50ms │ 99% faster │
│ │
└─────────────────────────────────────────────────────────────────────┘
Request-Scoped Caching
DataLoader also provides per-request caching. If the same director is requested multiple times in one query, it's fetched only once:
query {
movies {
title
director { name } # Director 3 (Nolan)
}
movie(id: "4") {
director { name } # Director 3 (Nolan) - CACHED!
}
}
This is automatic with @BatchMapping and DataLoader.
Batch Loading for Director.movies
Let's also optimize the reverse relationship:
@BatchMapping(typeName = "Director", field = "movies")
public Map<Director, List<Movie>> directorMovies(List<Director> directors) {
log.info("🚀 BatchMapping movies for {} directors", directors.size());
List<String> directorIds = directors.stream()
.map(Director::getId)
.toList();
// This method needs to be added to MovieRepository
Map<String, List<Movie>> moviesByDirectorId =
movieRepository.findByDirectorIds(directorIds);
return directors.stream()
.collect(Collectors.toMap(
director -> director,
director -> moviesByDirectorId.getOrDefault(director.getId(), List.of())
));
}
Add to MovieRepository:
public Map<String, List<Movie>> findByDirectorIds(List<String> directorIds) {
log.info("⚡ findByDirectorIds - BATCH loading movies for {} directors", directorIds.size());
return movies.stream()
.filter(m -> directorIds.contains(m.getDirectorId()))
.collect(Collectors.groupingBy(Movie::getDirectorId));
}
Exercises
Exercise 1: Batch Load for Actor.movies
Implement @BatchMapping for the movies field on the Actor type.
Solution
@BatchMapping(typeName = "Actor", field = "movies")
public Map<Actor, List<Movie>> actorMovies(List<Actor> actors) {
List<String> actorIds = actors.stream()
.map(Actor::getId)
.toList();
Map<String, List<Movie>> moviesByActorId =
movieRepository.findByActorIds(actorIds);
return actors.stream()
.collect(Collectors.toMap(
actor -> actor,
actor -> moviesByActorId.getOrDefault(actor.getId(), List.of())
));
}
Add to MovieRepository:
public Map<String, List<Movie>> findByActorIds(List<String> actorIds) {
Map<String, List<Movie>> result = new HashMap<>();
for (Movie movie : movies) {
for (String actorId : movie.getActorIds()) {
if (actorIds.contains(actorId)) {
result.computeIfAbsent(actorId, k -> new ArrayList<>()).add(movie);
}
}
}
return result;
}
Exercise 2: Add Query Logging
Add logging to measure how long each batch load takes:
@BatchMapping(typeName = "Movie", field = "director")
public Map<Movie, Director> directors(List<Movie> movies) {
long start = System.currentTimeMillis();
// ... implementation
log.info("BatchMapping directors took {}ms", System.currentTimeMillis() - start);
return result;
}
Exercise 3: Verify Caching
Test that the same director isn't loaded twice in one request by adding logging and running a query that requests the same director multiple times.
Summary
In this class, you learned:
✅ The N+1 problem and why it devastates performance
✅ Using @BatchMapping to batch database calls
✅ How Spring GraphQL collects and batches field resolutions
✅ Per-request caching with DataLoader
✅ When to use DataLoader directly vs @BatchMapping
What's Next?
In Class 6: Error Handling, we'll learn:
- Different types of GraphQL errors
- Custom exception handling
- Partial errors (some fields fail, others succeed)
- Validation and meaningful error messages
Because things will go wrong - let's handle it gracefully!