
The Real Challenges of Scaling Next.js (And When They Actually Matter)

Next.js makes assumptions about running as a single instance. When you need multiple replicas, those assumptions break in specific, predictable ways. Here's what actually goes wrong and how to fix it.

Morley Media Team · 2/5/2026 · 9 min read


A well-built Next.js app on a single server can handle far more traffic than most startups will ever see. The real scaling challenges appear when you need to run multiple instances of your application, and they're specific, predictable, and solvable.

The core issue is that Next.js assumes it's running as a single instance. Its caching, image optimization, and file storage are all filesystem-based by default. That works perfectly on one server. The moment you deploy two replicas behind a load balancer, those assumptions break.

What Actually Breaks With Multiple Instances

The Filesystem Cache Problem

Next.js stores its cache on the local filesystem in .next/cache. This includes ISR (Incremental Static Regeneration) pages, fetch cache results, and optimized images. When you're running a single instance, this is fast and efficient — the cache builds up over time and serves content without hitting your database or APIs.

When you add a second instance, each one has its own independent cache. User A hits replica 1, which has a warm cache and responds in 50ms. User B hits replica 2, which has a cold cache and needs to regenerate the page from scratch, taking 800ms. Both users are on the same page, getting wildly different performance.

It gets worse. When an ISR page revalidates on replica 1, replica 2 doesn't know about it. It continues serving stale content until its own revalidation timer fires or a user triggers it. In the meantime, your users see inconsistent data depending on which replica they hit.

// This ISR configuration works perfectly on one instance
export const revalidate = 3600 // Regenerate every hour
 
export default async function ProductPage({ params }: { params: { id: string } }) {
  const product = await getProduct(params.id)
  return <ProductDetail product={product} />
}
 
// But with 3 replicas, the page may regenerate 3 times independently,
// each hitting your database, and each potentially showing different data
// during the revalidation window.

The solution is to move the cache out of the filesystem and into a shared store like Redis. Next.js supports custom cache handlers for this purpose. The implementation is straightforward but requires understanding what you're replacing.

// next.config.js — point to a custom cache handler
const nextConfig = {
  cacheHandler: require.resolve("./cache-handler.mjs"),
  cacheMaxMemorySize: 0, // Disable the in-memory cache; rely on Redis only
};

module.exports = nextConfig;
// cache-handler.mjs — Redis-based shared cache
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

export default class CacheHandler {
  async get(key) {
    const data = await redis.get(key);
    return data ? JSON.parse(data) : null;
  }

  async set(key, data, ctx) {
    const ttl = ctx.revalidate || 3600;
    await redis.setex(key, ttl, JSON.stringify(data));
    // Index the key under each of its tags so revalidateTag can find it later
    for (const tag of ctx.tags ?? []) {
      await redis.sadd(`tag:${tag}`, key);
    }
  }

  async revalidateTag(tags) {
    // Invalidate every cache entry carrying these tags.
    // Because the store is shared, this propagates to all replicas at once.
    for (const tag of Array.isArray(tags) ? tags : [tags]) {
      const keys = await redis.smembers(`tag:${tag}`);
      if (keys.length > 0) await redis.del(...keys, `tag:${tag}`);
    }
  }
}
⚠️ Shared filesystems (NFS, EFS) might seem like a simpler solution, but they introduce file locking issues, race conditions, and data corruption under concurrent writes. Redis is the correct answer here: it's designed for exactly this kind of concurrent access pattern.

Image Optimization Duplication

Next.js's built-in image optimizer uses Sharp to resize and convert images on demand, then caches the results to the filesystem. With multiple replicas, each one processes and caches images independently. If you have 3 replicas and a page with 10 images, you might process the same 10 images 3 times before all caches are warm.

For small sites this is a non-issue. For image-heavy applications like e-commerce, this wastes significant CPU and memory. The options are:

  • Use an external image service (Cloudinary, ImageKit, Imgix) and bypass Next.js image optimization entirely
  • Self-host an image proxy like IPX with its own persistent cache
  • Accept the duplication cost if your image catalog is small enough
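For the first option, Next.js supports a custom image loader via `images: { loader: "custom", loaderFile: "./image-loader.ts" }` in your config. A minimal loader sketch; the CDN host and query parameter names below are placeholders, not a real account:

```typescript
// image-loader.ts — build a URL for a hypothetical external image CDN.
// Next.js calls this instead of running Sharp locally, so replicas do no
// image processing at all.
export default function imageLoader({
  src,
  width,
  quality,
}: {
  src: string;
  width: number;
  quality?: number;
}): string {
  return `https://images.example-cdn.com${src}?w=${width}&q=${quality ?? 75}`;
}
```

The CDN then handles resizing and format negotiation, and its edge cache is shared by every replica automatically.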

Server Actions Break During Rolling Deployments

This one is subtle and catches people off guard. Next.js encrypts Server Action identifiers at build time. During a rolling deployment — where old and new instances run simultaneously — a Server Action initiated on the old build can land on a new instance that can't decrypt it. The user sees an error.

The fix is setting a consistent encryption key across builds:

# Set this in your environment variables
NEXT_SERVER_ACTIONS_ENCRYPTION_KEY=your-consistent-key-here

This ensures both old and new instances can decrypt each other's Server Action payloads during the transition period.
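Next.js expects a base64-encoded 32-byte value here. One common way to generate it, assuming `openssl` is available:

```shell
# Generate a random 32-byte key, base64-encoded, and keep it stable across builds
openssl rand -base64 32
```

Store the output in your secrets manager and inject it into every build, rather than letting each build generate its own key.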

Streaming and Reverse Proxy Buffering

If you're using React Suspense, streaming SSR, or loading.tsx, your reverse proxy might be silently breaking it. Nginx and similar proxies buffer responses by default, which means they wait for the entire response before sending anything to the client. This defeats the entire purpose of streaming.

# In your Nginx config — disable buffering so streamed chunks reach the client immediately
location / {
    proxy_pass http://nextjs_upstream;
    proxy_http_version 1.1;
    proxy_buffering off;
    proxy_cache off;
}

Note that X-Accel-Buffering is a response header: the application can send X-Accel-Buffering: no to disable buffering for a single response, but it cannot be set from Nginx with proxy_set_header, which only affects request headers.

Without this, users see a blank page that suddenly loads all at once instead of the progressive loading experience Suspense is designed to provide.

The Problems That Don't Require Multiple Instances

Not every scaling issue is about horizontal scaling. These are common bottlenecks that affect single-instance deployments too.

Database Connection Management

This is the most common mistake in Next.js applications, and it has nothing to do with the number of replicas. In development, Next.js's hot module reloading creates a new database client on every file change, eventually exhausting your connection pool.

// The standard fix — reuse the client across hot reloads
import { PrismaClient } from "@prisma/client";
 
const globalForPrisma = globalThis as unknown as {
  prisma: PrismaClient | undefined;
};
 
export const prisma = globalForPrisma.prisma ?? new PrismaClient();
 
if (process.env.NODE_ENV !== "production") globalForPrisma.prisma = prisma;

In production, the concern is different: how many connections your database can handle. A single Next.js instance with 20 concurrent requests might try to open 20 database connections. If your database plan supports 25 connections, you're already close to the limit before you add a second instance.

Connection pooling (through PgBouncer, Prisma Accelerate, or Supabase's built-in pooler) lets you serve hundreds of concurrent requests through a smaller number of actual database connections. This is usually the first infrastructure addition you need, well before you need multiple app instances.
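With Prisma, for example, this typically means splitting the connection string in two: queries go through the pooler, while migrations use a direct connection. Host names, ports, and credentials below are placeholders:

```shell
# .env — route application queries through PgBouncer, run migrations directly
DATABASE_URL="postgresql://user:password@pooler-host:6432/app?pgbouncer=true&connection_limit=5"
DIRECT_URL="postgresql://user:password@db-host:5432/app"
```

In the Prisma schema, the datasource block then points `url` at the pooled URL and `directUrl` at the direct one.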

Bundle Size

This one is straightforward but often ignored. Every dependency you add to a client component increases the JavaScript your users download. A 2MB bundle doesn't break at scale — it's slow for everyone from day one. The difference is that at scale, more users are affected and more users bounce.

Use next/dynamic for heavy components that aren't needed on initial load. Check your bundle with @next/bundle-analyzer. Be ruthless about what runs on the client vs the server.
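Wiring up the analyzer is a one-time config change (sketch; assumes `@next/bundle-analyzer` is installed as a dev dependency):

```javascript
// next.config.js — enable the bundle analyzer only when ANALYZE=true
const withBundleAnalyzer = require("@next/bundle-analyzer")({
  enabled: process.env.ANALYZE === "true",
});

module.exports = withBundleAnalyzer({
  // ...your existing Next.js config
});
```

Then run `ANALYZE=true next build` to generate an interactive report of what each route actually ships to the client.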

When Do You Actually Need to Scale Horizontally?

Here's the honest answer: later than you think. A single well-configured server can handle a surprising amount of traffic. A Node.js process serving Next.js pages can typically handle hundreds of concurrent requests before response times degrade meaningfully.

The signals that you actually need multiple instances:

  • CPU is consistently above 70-80% during normal traffic (not just during builds or spikes)
  • Response times are degrading and you've already optimized your queries, caching, and rendering strategy
  • You need zero-downtime deployments and can't afford the brief interruption of a single-instance restart
  • Geographic distribution — your users are spread across continents and need lower latency

If you're not seeing these signals, focus your effort on the single-instance optimizations first: caching strategy, database query optimization, proper use of static generation vs server rendering, and bundle size management.

💡 If you're deploying to Vercel, most of these multi-instance challenges are handled for you. Vercel abstracts the caching layer, image optimization, and deployment coordination. The tradeoff is cost and control. If you need to self-host for budget, compliance, or architectural reasons, the challenges above are what you'll need to solve yourself.

A Practical Scaling Roadmap

Phase 1: Optimize the Single Instance

Before adding complexity, make sure you're getting the most out of what you have.

  • Audit your rendering strategy. Pages that don't change per-request should be statically generated or use ISR, not server-rendered on every hit.
  • Add a caching layer. Even a simple in-memory cache (like lru-cache) for expensive database queries can dramatically reduce load.
  • Optimize database queries. Use select to fetch only the fields you need. Add indexes for your most common queries. Use connection pooling.
  • Check your bundle size. Remove unused dependencies. Lazy-load heavy components.
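The "simple in-memory cache" idea from the list above can be sketched with nothing but the standard library; a library like lru-cache adds size limits and LRU eviction on top of this same shape:

```typescript
// Minimal TTL memoizer for expensive async lookups (e.g. database queries).
type Entry<T> = { value: T; expires: number };

function memoizeTtl<T>(fn: (key: string) => Promise<T>, ttlMs: number) {
  const cache = new Map<string, Entry<T>>();
  return async (key: string): Promise<T> => {
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit: skip the query
    const value = await fn(key);                           // cache miss: do the work
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}
```

Remember this cache is per-process: once you run multiple replicas, each instance has its own copy, which is exactly why the shared Redis layer comes next.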

Phase 2: Add Infrastructure

When the single instance isn't enough:

  • Redis for shared caching (ISR cache, session data, rate limiting)
  • Connection pooler (PgBouncer or equivalent) to manage database connections across instances
  • CDN in front of your application for static assets and cacheable responses
  • Health checks so your load balancer can detect and route around unhealthy instances
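The health check endpoint itself can be a trivial Route Handler. A minimal sketch at a hypothetical `app/api/health/route.ts`:

```typescript
// app/api/health/route.ts — lightweight liveness endpoint for the load balancer
export async function GET(): Promise<Response> {
  // Keep this cheap: no database calls, just proof the process is responsive.
  return Response.json({ status: "ok" }, { status: 200 });
}
```

Point your load balancer's health check at `/api/health`; if deeper checks (database reachability, Redis connectivity) are needed, add them behind a separate readiness endpoint so a slow dependency doesn't get a healthy instance pulled from rotation.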

Phase 3: Multiple Instances

  • Configure the custom cache handler to use Redis
  • Set NEXT_SERVER_ACTIONS_ENCRYPTION_KEY for consistent deployments
  • Disable reverse proxy buffering if using streaming
  • Consider an external image optimization service
  • Set up proper monitoring per-instance (CPU, memory, response times, error rates)

The Monitoring You Need

You can't optimize what you can't measure. At minimum, track:

  • Server-side: Response times per route, CPU and memory usage, database query duration, cache hit rates
  • Client-side: Core Web Vitals (LCP, CLS, INP), JavaScript bundle load times, hydration duration
  • Infrastructure: Connection pool utilization, Redis memory usage, error rates per instance
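On the server side, even a small wrapper gives you per-route timings to feed whatever backend you choose. A sketch; the `console.log` call stands in for your real metrics client:

```typescript
// Wrap an async handler and report how long it took, even when it throws.
async function timed<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    const ms = performance.now() - start;
    // Placeholder: swap for your metrics client (StatsD, Prometheus, etc.)
    console.log(`${name} took ${ms.toFixed(1)}ms`);
  }
}
```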

Tools like Grafana + Prometheus (self-hosted) or Vercel Analytics + Datadog (managed) cover these. The important thing is having visibility before you need to debug a problem under pressure.

Hitting real scaling challenges? Our team has built and scaled Next.js applications from prototype to production. We can audit your architecture, identify the actual bottlenecks (not the theoretical ones), and implement the right solutions for your stage. Get in touch.


Tags

Next.js, Scaling, Performance, Architecture, Self-Hosting
