Bulkheads and rate limiting

Isolating failures, capping load.

A bulkhead is a wall inside a ship that stops a flood in one compartment from sinking the rest. In software, a bulkhead is the same idea: isolate resources so a failure in one place cannot consume the resources that other places need to keep working. Rate limiting is its first cousin: cap the inputs so no single caller can monopolise the system. Both exist for the same reason. Without them, one bad actor (an angry user, a runaway script, a slow downstream) takes everything down with it.

The problem they solve

A web app talks to three downstreams: payment, search, recommendations. They share one thread pool of 50 threads.

flowchart LR
    U(["Users"]):::client
    API[["API<br/>shared pool: 50 threads"]]:::infra
    P[("Payment<br/>fast")]:::server
    S[("Search<br/>fast")]:::server
    R[("Recommendations<br/>SLOW today")]:::dead

    U ==> API
    API -.->|"few threads"| P
    API -.->|"few threads"| S
    API ==>|"all 50 threads stuck here"| R

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
    classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px

Recommendations is slow, so threads pile up waiting on it until none are left. Payment requests cannot get a thread. One slow downstream has taken down the path to an unrelated, healthy one. The system is “up”, but checkout is broken because of a recommender that nobody is even looking at.

Bulkheads: separate pools per downstream

Give each downstream its own thread budget. A slow downstream consumes its own quota and stops there.

flowchart LR
    U(["Users"]):::client
    API[["API"]]:::infra

    PP[["Payment pool<br/>20 threads"]]:::server
    SP[["Search pool<br/>20 threads"]]:::server
    RP[["Recommend pool<br/>10 threads"]]:::dead

    P[("Payment")]:::server
    S[("Search")]:::server
    R[("Recommendations<br/>SLOW")]:::dead

    U ==> API
    API ==> PP ==> P
    API ==> SP ==> S
    API ==> RP ==> R

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
    classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px

Recommendations is still slow. Its 10 threads still pile up. But payment has its own 20 threads, untouched. Checkout still works. The slow path is the only thing that hurts; everything else is fine.

This is what a bulkhead buys you: failure containment. The blast radius of any one downstream’s bad day is now bounded.
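As a rough illustration, the per-downstream pools in the diagram above could be sketched in Python with the standard library's `ThreadPoolExecutor`. The pool sizes mirror the diagram; the `POOLS` dict, the `submit` helper, and the two stand-in functions are hypothetical names for this sketch, and the hung recommender is simulated with an event rather than a real network call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One executor per downstream. Sizes mirror the diagram; tune them from measurements.
POOLS = {
    "payment": ThreadPoolExecutor(max_workers=20, thread_name_prefix="payment"),
    "search": ThreadPoolExecutor(max_workers=20, thread_name_prefix="search"),
    "recommendations": ThreadPoolExecutor(max_workers=10, thread_name_prefix="recs"),
}


def submit(downstream, fn, *args):
    """Run fn on the pool reserved for one downstream (hypothetical helper)."""
    return POOLS[downstream].submit(fn, *args)


# Stand-ins for real clients: the recommender hangs until `release` is set.
release = threading.Event()


def slow_recommendations():
    release.wait()
    return "recs"


def charge():
    return "charged"


# 30 recommendation calls: 10 occupy the pool's threads, 20 wait in its queue.
stuck = [submit("recommendations", slow_recommendations) for _ in range(30)]

# Payment runs on its own threads, so it completes while recommendations is hung.
payment_result = submit("payment", charge).result(timeout=2)

release.set()  # let the simulated recommender recover so its threads can exit
```

One caveat with this sketch: `ThreadPoolExecutor` queues submitted work without bound, so a real service would probably also cap the backlog or put a timeout on waiting callers.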

Bulkheads in practice

Bulkheads can be coarse or fine:

  • Per downstream. The pattern above. Each external dependency gets its own thread pool, connection pool, or semaphore.
  • Per tenant. In multi-tenant systems, a per-tenant quota prevents one noisy customer from starving the others.
  • Per process. Separate processes (or containers, or pods) per concern. A leak in one cannot eat the memory the others need.

The trade-off is wasted capacity: pools that are sized for the worst case sit underutilised most of the time. The trade is worth it. Underutilised capacity is cheap; cascading outages are not.
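A semaphore is the lightest of these options, because it caps in-flight calls without dedicating threads to them. The sketch below shows one way a per-tenant bulkhead might look in Python. The `Bulkhead` and `BulkheadFull` names, the tenant ids, and the quota of 2 are all assumptions made for illustration; the one deliberate design choice is that a full compartment rejects immediately instead of queueing.

```python
import threading
from contextlib import contextmanager


class BulkheadFull(Exception):
    """Raised when a compartment has no free slots (hypothetical name)."""


class Bulkhead:
    """Cap the number of concurrent calls inside one compartment."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: fail fast rather than queue behind slow calls.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull()
        try:
            yield
        finally:
            self._slots.release()


# Hypothetical per-tenant quotas: each tenant gets its own compartment.
tenant_bulkheads = {
    "tenant-a": Bulkhead(max_concurrent=2),
    "tenant-b": Bulkhead(max_concurrent=2),
}
```

A request handler would wrap its work in `with tenant_bulkheads[tenant_id].slot():` and translate `BulkheadFull` into an error response. If tenant-a already has two calls in flight, its third is rejected, while tenant-b's slots are unaffected.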

Rate limiting: cap the inputs

A bulkhead protects you from a slow downstream. A rate limit protects you from an aggressive caller (a script gone wild, a misbehaving partner, a DDoS attempt, or just one user accidentally generating ten thousand requests per second). The shape is “this caller may make at most N requests per time window T.”

flowchart LR
    U1(["normal user"]):::client
    U2(["misbehaving script<br/>5,000 RPS"]):::dead
    U3(["another user"]):::client

    RL[["Rate limiter<br/>100 RPS per user"]]:::infra

    API[("API")]:::server

    U1 ==> RL
    U2 ==> RL
    U3 ==> RL

    RL ==>|"under limit, forwarded"| API
    RL -.->|"over limit, rejected with 429"| U2

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
    classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px

The misbehaving caller gets 429 Too Many Requests. Everyone else carries on normally. The detailed mechanics (token bucket, leaky bucket, fixed window, sliding window) are covered in their own concept page; see Rate limiting strategies.
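To make the per-caller shape concrete, here is a minimal fixed-window sketch in Python. Fixed window is chosen only because it is the shortest of the strategies named above, not because it is the best fit. The class name, the `handle` function, and the injectable `clock` parameter are assumptions for this example, and the code is not thread-safe as written.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per caller in each window of `window_seconds`."""

    def __init__(self, limit, window_seconds=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._state = {}  # caller id -> (window index, requests seen in that window)

    def allow(self, caller):
        current = int(self.clock() // self.window_seconds)
        window, count = self._state.get(caller, (current, 0))
        if window != current:
            window, count = current, 0  # a new window started: reset the counter
        if count >= self.limit:
            self._state[caller] = (window, count)
            return False
        self._state[caller] = (window, count + 1)
        return True


def handle(limiter, caller):
    """Return the HTTP status a gateway would send (hypothetical handler)."""
    return 200 if limiter.allow(caller) else 429
```

With `limit=100` and a one-second window, a script sending 5,000 requests in one second gets 100 of them through and 429 for the rest, while every other caller keeps its own untouched counter.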

How bulkheads and rate limiting compose

| Risk | Tool |
| --- | --- |
| Slow downstream eats threads | Bulkhead (separate pools) |
| One caller floods you | Rate limiting (per caller) |
| One tenant hurts others | Tenant bulkhead + per-tenant rate limit |
| Sustained downstream failure | Circuit breaker |
| Transient downstream blip | Retry with backoff and jitter |

Production systems use all of them together. A reliable service typically has: rate limiting at the edge, per-downstream bulkheads inside, circuit breakers around each downstream, and retries-with-backoff for transient errors.
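Of the tools listed above, retry with backoff and jitter is the easiest to show in isolation. The following Python sketch assumes a hypothetical `TransientError` marker for retryable failures and uses capped exponential backoff with a fully random delay; the default values are placeholders, not recommendations.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, brief unavailability)."""


def call_with_retry(fn, attempts=4, base=0.1, cap=5.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn, retrying on TransientError with capped exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(cap, base * 2 ** attempt)  # 0.1, 0.2, 0.4, ... up to cap
            sleep(rng() * ceiling)  # random delay below the ceiling spreads retries out
```

Retries multiply load on the thing being retried, so the attempt count should stay small. This is the layer that needs the others around it: the bulkhead bounds how many retrying calls can be in flight, and the breaker stops them once the failure is clearly not transient.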

Two scenarios

Scenario one: a product page that calls three services.

The page calls inventory (must have), reviews (nice to have), and recommendations (nice to have). Without bulkheads, a slow recommender starves inventory and the page errors. With bulkheads, recommendations has 5 threads; even if it hangs, the inventory call has its own 15 threads and the page renders with a blank recommendations slot. Add graceful degradation, and the user does not even notice. See Graceful degradation.
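A sketch of how that page handler might look in Python, assuming the pool sizes from the scenario. The function name, the two fetch callables, and the timeout values are hypothetical; reviews would follow the same pattern as recommendations and is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

# Pool sizes taken from the scenario above.
inventory_pool = ThreadPoolExecutor(max_workers=15)
recommendations_pool = ThreadPoolExecutor(max_workers=5)


def render_product_page(fetch_inventory, fetch_recommendations,
                        inventory_timeout=2.0, recs_timeout=0.2):
    inventory_future = inventory_pool.submit(fetch_inventory)
    recs_future = recommendations_pool.submit(fetch_recommendations)

    # Must-have: if inventory fails or times out, the exception propagates
    # and the page errors.
    page = {"inventory": inventory_future.result(timeout=inventory_timeout)}

    # Nice-to-have: on timeout or error, render an empty slot instead of failing.
    try:
        page["recommendations"] = recs_future.result(timeout=recs_timeout)
    except Exception:
        page["recommendations"] = []
    return page
```

Note that timing out on `result()` does not stop the underlying call. A hung recommendations request still occupies one of its 5 threads until it returns, which is exactly why that pool needs to be separate and bounded.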

Scenario two: a public API used by partners.

One partner deploys a bug that retries every request 50 times. Without rate limiting, this one partner consumes all your capacity and other partners see errors. With per-partner rate limits at the API gateway, the bad partner gets 429s as soon as they exceed their quota; the rest of the world keeps working. The bad partner notices their own problem; you do not have to.
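Per-partner limits imply a quota lookup somewhere in the gateway. The Python sketch below shows one possible shape for it, including a temporary override so that a partner's quota can be raised for a one-off job and then fall back automatically. `QuotaTable`, its method names, and the numbers are assumptions for illustration, not any particular gateway's API.

```python
class QuotaTable:
    """Per-partner request quotas with expiring overrides (hypothetical sketch)."""

    def __init__(self, default_rps):
        self.default_rps = default_rps
        self._overrides = {}  # partner id -> (rps, expires_at)

    def grant(self, partner, rps, expires_at):
        """Temporarily replace one partner's quota until `expires_at`."""
        self._overrides[partner] = (rps, expires_at)

    def limit_for(self, partner, now):
        """Return the quota that applies to `partner` at time `now`."""
        override = self._overrides.get(partner)
        if override is not None:
            rps, expires_at = override
            if now < expires_at:
                return rps
            del self._overrides[partner]  # expired: fall back to the default
        return self.default_rps
```

For example, `quotas.grant("partner-a", rps=1000, expires_at=now + 3600)` would lift one partner's limit for an hour without touching anyone else's.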

Common mistakes

  • No bulkheads at all. A slow non-critical downstream takes the whole service down. The classic incident.
  • Shared connection pool for all downstreams. Same problem. One slow query exhausts the database connection pool, and every request that needs the database fails.
  • Rate limits only at the load balancer. A per-IP limit at the edge cannot tell apart the many users behind a corporate NAT who share one IP, so one heavy user gets all of them throttled. Layer rate limits: edge plus per-user-id inside.
  • No way to lift the limit. Sometimes you want a partner to run a one-off batch import. If your rate limit cannot be bumped for them, you are creating a support ticket every time.
  • Sizing bulkheads by guess. Measure the steady-state and peak concurrency per downstream and size from there.
  • Rejecting silently. Always return 429 with a Retry-After header so clients can back off intelligently.
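The last point can be sketched as a small response builder. This Python example assumes the limiter knows when the caller's current window resets; the function name and the tuple-shaped response are hypothetical. `Retry-After` is sent here as a whole number of seconds, which is one of the two forms HTTP allows (the other is an HTTP date).

```python
import math


def too_many_requests(window_resets_at, now):
    """Build a 429 response that tells the client how long to wait."""
    # Round up so the client never retries before the window has actually reset.
    retry_after = max(0, math.ceil(window_resets_at - now))
    headers = {"Retry-After": str(retry_after)}
    return 429, headers, "rate limit exceeded"
```

A well-behaved client reads the header and sleeps for that long before its next attempt, instead of guessing or retrying immediately.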

Quick recap

  • Bulkheads: separate resource pools so one bad downstream does not starve the others.
  • Rate limits: cap inputs so no one caller can monopolise the system.
  • Combine with circuit breakers (stop calling the broken thing) and graceful degradation (render without it).
  • Underutilised capacity is the price of failure containment; cheap compared to cascading outages.
  • The reliable production stack: edge rate limit + per-downstream bulkhead + per-downstream breaker + retry-with-backoff. Turn it on by default.

This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.