Load balancing algorithms
Round robin, least connections, consistent hash.
A load balancer’s algorithm is the rule it uses to pick a backend for each incoming request. Pick the wrong one and you can have a “load balancer” that sends 80% of traffic to one backend while the others sit idle, or one that scrambles a user’s session every 30 seconds, or one that has to rebalance the entire pool every time a single backend dies. The four algorithms below cover almost every real production choice.
Round robin: take turns
Send request 1 to backend A, request 2 to B, request 3 to C, request 4 back to A. Cycle forever.
sequenceDiagram
autonumber
participant LB as Load balancer
participant A as Backend A
participant B as Backend B
participant C as Backend C
Note over LB: Round-robin cursor starts at A
LB->>A: request 1
LB->>B: request 2
LB->>C: request 3
LB->>A: request 4
LB->>B: request 5
LB->>C: request 6
Note over LB,C: Each backend gets ~1/N of requests, regardless of effort.
Strength. Stateless, trivial, no per-backend bookkeeping needed.
Weakness. Assumes every request costs the same and every backend is equally fast. In reality, requests vary by 1000x in cost (a thumbnail vs a search) and backends drift (one box has a hot cache, one is GC’ing). Round robin keeps sending the slow box its share anyway.
When to use: low-variance workloads, identical backends, simplest possible setup.
Weighted round robin: take turns, with bias
Same idea, but the LB skips lower-weight backends more often. Useful when backends have different capacity (e.g., during a rolling upgrade with bigger machines online).
sequenceDiagram
autonumber
participant LB as Load balancer
participant A as Backend A (weight 3)
participant B as Backend B (weight 1)
participant C as Backend C (weight 1)
LB->>A: request 1
LB->>A: request 2
LB->>A: request 3
LB->>B: request 4
LB->>C: request 5
LB->>A: request 6
LB->>A: request 7
LB->>A: request 8
LB->>B: request 9
LB->>C: request 10
A gets three times as many requests as B or C. Same trade-offs as plain round robin; a small upgrade to take backend size into account.
Least connections: send to the least busy
The LB tracks how many active requests each backend currently has and sends the new one to whichever has the fewest. Reacts to real load instead of assuming uniformity.
flowchart LR
LB[["Load balancer<br/>tracks active count per backend"]]:::infra
A[("Backend A<br/>40 active")]:::server
B[("Backend B<br/>12 active ← winner")]:::server
C[("Backend C<br/>28 active")]:::server
REQ([New request]):::client
REQ ==> LB
LB ==>|"pick the smallest count"| B
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
Strength. Adapts automatically to slow backends, GC pauses, and uneven request costs. The single best default for variable workloads.
Weakness. Needs per-backend state on the LB. New backends look “empty” briefly and get hammered until they catch up. (Power-of-two-choices is a clever fix: pick two backends at random, then pick the less loaded of those two. Same effect, much less bookkeeping.)
When to use: most variable, real-world web workloads. This is the right default for L7 LBs in front of HTTP services.
Consistent hashing: same key, same backend
Hash a key from the request (often the user ID, session ID, or full URL) onto a circular ring. Hash each backend onto the same ring. Each request goes to the next backend clockwise from its hash.
flowchart TB
subgraph RING["Consistent hash ring (12 o'clock = 0, clockwise)"]
direction LR
A[("Backend A<br/>at hash 1500")]:::server
B[("Backend B<br/>at hash 4500")]:::server
C[("Backend C<br/>at hash 8000")]:::server
K1["key 'user:42' → hash 3200<br/>next clockwise = B"]:::client
K2["key 'user:99' → hash 7800<br/>next clockwise = C"]:::client
K3["key 'product:88' → hash 1100<br/>next clockwise = A"]:::client
A --- B --- C --- A
end
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
Strength. Same key always goes to the same backend, with no central state. The killer feature is what happens when a backend is added or removed: only the keys in that arc move, not the whole pool.
flowchart TB
subgraph BEFORE["Before adding backend D"]
direction LR
A1[("A")]:::server
B1[("B")]:::server
C1[("C")]:::server
A1 --- B1 --- C1 --- A1
end
subgraph AFTER["After adding D (only D's arc moves)"]
direction LR
A2[("A")]:::server
D2[("D — new")]:::infra
B2[("B")]:::server
C2[("C")]:::server
A2 --- D2 --- B2 --- C2 --- A2
end
BEFORE --> AFTER
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
With a naive hash mod N, going from 3 backends to 4 reshuffles about 75% of all keys. With consistent hashing, only 1/4 of keys move. This is why every modern distributed system uses consistent hashing for sharding and cache routing.
In practice, each backend hashes to many points on the ring (virtual nodes), so the distribution is more even and the impact of adding/removing a backend is smoothed across the ring.
Weakness. Uneven distribution if the hash points are not well-spread (mitigated by virtual nodes). Hot keys still go to one backend; the algorithm balances uniformly across keys, not across actual work.
When to use: caches where the same key should hit the same backend, sticky routing without server-side state, sharded systems.
Picking an algorithm
flowchart TB
Q1{"Do all requests cost roughly the same<br/>and do backends behave identically?"}:::query
Q2{"Do request costs vary a lot,<br/>or do backends differ?"}:::query
Q3{"Do you need the same key<br/>to land on the same backend?"}:::query
Q4{"Do backends have different capacity<br/>during a rollout or transition?"}:::query
A1["Round robin.<br/>Simplest possible. Fine for uniform pools."]:::strong
A2["Least connections (or power-of-two-choices).<br/>The default for real-world HTTP."]:::strong
A3["Consistent hashing.<br/>Sharded caches, session affinity, partitioned services."]:::strong
A4["Weighted round robin.<br/>Temporary or permanent capacity asymmetry."]:::mid
Q1 -->|"yes"| A1
Q1 -->|"no"| Q2
Q2 -->|"yes"| A2
Q2 -->|"no"| Q3
Q3 -->|"yes"| A3
Q3 -->|"no"| Q4
Q4 -->|"yes"| A4
classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
classDef mid fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
For 80% of HTTP applications, least connections (or its modern variant, power-of-two-choices) is the right default. Reach for consistent hashing when you need affinity. Round robin is fine for simple pools but rarely the best choice in production.
Two scenarios
Scenario one: an API behind an L7 ALB.
Mixed request costs (quick GETs alongside slow searches). Backends sometimes pause for GC. Pick least connections so the LB always sends new requests to the box that is least busy right now. Hit rate of slow backends stays low.
Scenario two: a Memcached cluster.
Five cache nodes. You want user:42 to always hash to the same node so the cache hit rate stays high. Consistent hashing. When you add a sixth node, only ~1/6 of keys move; the rest still hit warm caches.
What this connects to
- Load balancer basics. Algorithm choice is part of LB configuration. See Load balancer: why, how, when.
- L4 vs L7. L4 can do round robin and consistent hashing on the 5-tuple; L7 can do all four, plus header-aware variants. See L4 vs L7 load balancing.
- Sticky sessions. Sometimes implemented via consistent hashing of a session cookie. See Sticky sessions.
- Sharding strategies. Consistent hashing is also the standard for sharding databases and caches. See Sharding strategies.
Common mistakes
- Round robin on variable workloads. A slow box and a fast box get the same number of requests; p99 latency suffers because the slow one piles up.
- Least connections without a warmup grace period. A freshly-added backend has zero active connections, so the LB dumps everything on it. Add a slow-start period that ramps it up.
- Consistent hashing without virtual nodes. Three real backends mean three points on the ring. Uneven arcs cause uneven load. Use 100+ virtual nodes per real backend.
- Hashing on the wrong key. Hashing on the user ID is great for affinity. Hashing on a near-unique key like a request ID gives you random distribution, not consistent hashing.
- Forgetting health-aware variants. All these algorithms should skip unhealthy backends. The LB needs to combine the algorithm with active health checking.
- Picking by reputation. “Consistent hashing is the modern one.” Not for stateless HTTP. Match the algorithm to the workload.
Quick recap
- Round robin: take turns. Simple, fine for uniform pools.
- Weighted round robin: take turns with bias. Useful during transitions.
- Least connections: send to the least busy backend. Default for real HTTP.
- Consistent hashing: same key, same backend, minimal disruption when the pool changes.
- Most production HTTP LBs default to least connections or power-of-two-choices for variable workloads.
This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.