Load balancer: why, how, when
What problem a load balancer really solves.
A load balancer is a single front door for your service. Requests come in to one address. The load balancer picks one of your backend servers and forwards the request to it. That is the whole idea. Everything else is a refinement of that one idea.
The problem it solves
You ship a service. One server, one IP address, one DNS record. It can handle 1,000 requests per second. Things go well. Traffic grows to 3,000 per second. Your one server cannot keep up.
The fix is obvious: more servers. Three of them, say, behind some address. But which address? You cannot put three IPs in DNS and hope the user picks the right one. You cannot make the user know about your topology. You need a thing in front of those three servers that:
- Has the one address users hit.
- Knows about all three backends.
- Picks one for each incoming request.
- Notices when a backend dies and stops sending traffic to it.
That thing is a load balancer.
The basic picture
flowchart LR
U(["Users"]):::client
LB[["Load balancer<br/>one public address"]]:::infra
S1[("Server A")]:::server
S2[("Server B")]:::server
S3[("Server C")]:::server
U ==> LB
LB ==> S1
LB ==> S2
LB ==> S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
The user has no idea there are three servers. They send to one address. The load balancer fans the traffic out.
What happens, request by request
- A request arrives at the load balancer.
- The load balancer looks at its pool of healthy backends.
- It picks one (using some algorithm: round robin, least connections, hashing).
- It forwards the request.
- The backend responds.
- The load balancer sends the response back to the user.
The user only sees latency. They have no idea their first request hit server A and their second hit server C.
When a server dies
This is the part most people forget. A load balancer is not just a fan-out. It is also a watchdog.
Every few seconds, it pings each backend with a health check (often something like GET /healthz). If a backend fails two or three checks in a row, the load balancer takes it out of rotation.
flowchart LR
U(["Users"]):::client
LB[["Load balancer"]]:::infra
S1[("Server A")]:::server
S2[("Server B<br/>failing checks")]:::dead
S3[("Server C")]:::server
U ==> LB
LB ==>|"healthy"| S1
LB -.->|"removed after<br/>3 failed checks"| S2
LB ==>|"healthy"| S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px
The user sees no error. Traffic just routes around B. When B recovers and starts passing health checks again, the load balancer puts it back in the pool.
Rolling deploys: the same trick, on purpose
Once the LB can take a backend out and put it back in safely, you can do zero-downtime deploys. Drain one instance at a time, upgrade it, return it to the pool, repeat. Users see no interruption.
sequenceDiagram
autonumber
participant Op as Deploy script
participant LB as Load balancer
participant A as Server A (v1)
participant B as Server B (v1)
participant C as Server C (v1)
Op->>LB: drain A
LB-->>A: stop sending new requests
Op->>A: upgrade to v2, run readiness check
A->>LB: I am healthy on v2
LB-->>A: return to pool
Op->>LB: drain B
LB-->>B: stop sending new requests
Op->>B: upgrade to v2
B->>LB: I am healthy on v2
LB-->>B: return to pool
Note over Op,C: ... same for C ...<br/>At no point is the whole pool down.
TLS termination: one place, not three
A common second job for the load balancer is terminating TLS. Browsers talk HTTPS to the LB; the LB talks plain HTTP (or HTTPS, if you want defence in depth) to backends inside the trusted network. You manage one certificate, not one per backend.
flowchart LR
U(["Browser"]):::client
LB[["Load balancer<br/>TLS terminates here"]]:::infra
S1[("Server A")]:::server
S2[("Server B")]:::server
S3[("Server C")]:::server
U ==>|"HTTPS (encrypted)"| LB
LB ==>|"HTTP (private network)"| S1
LB ==>|"HTTP (private network)"| S2
LB ==>|"HTTP (private network)"| S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
You rotate the certificate on the LB only. Backends do not even know HTTPS is involved. This alone is often the reason small teams put a load balancer in front of a single backend.
When you actually need one
You probably need a load balancer if any of these are true:
- You have more than one backend instance running the same code.
- You want to deploy new code without dropping connections.
- One backend dying should not take down the service.
- You want to terminate TLS in one place, not on every backend.
- You want to push request buffering, timeouts, and retries out of your app code and into infrastructure.
You probably do not need one yet if you have one server and your traffic fits comfortably on it, or you are a side project with three users.
Even with one backend, a load balancer is often worth it for the operational reasons alone (TLS, deploys, sane health-based routing during recovery).
Two real scenarios
Scenario one: you grew up.
You started with one server behind a Cloudflare DNS record. Traffic grew. You added a second server. Now you put an Application Load Balancer (AWS) or Cloud Load Balancing (GCP) in front, point DNS at the LB, and add both backends to the pool.
Your users notice nothing. You can now deploy without a maintenance window. You can let a backend die without paging anyone at 3 a.m.
Scenario two: a sudden flash crowd.
A post about your product hits the top of Hacker News. You wake up to a thousand-times traffic spike. Your one backend would have crashed in the first minute.
Behind a load balancer with autoscaling attached to the same pool, your fleet grows from 3 backends to 30. The load balancer picks up each new instance the moment it passes its first health check. Latency stays flat. You go back to sleep.
What you still have to decide
A load balancer is one category with several real choices inside it. Each is its own page in this library:
- L4 vs L7: does the load balancer route by IP and port, or does it understand URLs and headers? See L4 vs L7 load balancing.
- Algorithm: round robin, least connections, consistent hashing. The one you pick has more impact than people expect. See Load balancing algorithms.
- Sticky sessions: should the same user always be routed to the same backend? Sometimes yes, often no. See Sticky sessions.
Common mistakes
- Treating the load balancer as just routing. It is also health checks, TLS termination, request timeouts, and sometimes retries. Configure those. Do not just point traffic at it and forget.
- Forgetting the LB is itself a single point of failure. Run more than one. Managed LBs from AWS or GCP are usually multi-instance already, but verify it before you assume.
- Round robin on stateful backends. If user state lives in memory on a specific backend (try not to do that), round robin will scramble it. You want sticky sessions, or to externalise state to Redis.
- No backend connection cap. One slow backend should not be allowed to monopolise the LB’s connection pool. Set per-backend caps so one bad instance does not poison the whole pool.
- A useless health check. A
/healthzthat just returns 200 from the framework tells you almost nothing. Make it actually check the database connection and any critical dependency. The whole point is to detect a broken backend before users do.
Quick recap
- A load balancer is a single front door in front of many backends.
- It picks a backend per request, watches their health, and removes dead ones.
- You need one once you have more than one backend, want zero-downtime deploys, or want TLS in one place.
- The interesting decisions (L4 vs L7, algorithm, stickiness) are each their own topic.
This concept shows up in Stage 4 (Scaling and reliability) of the System Design Roadmap.