L4 vs L7 load balancing
Transport-layer vs application-layer trade-offs.
An L4 load balancer routes based on what it can see at the TCP or UDP layer: source and destination IP addresses and ports. It does not inspect the payload. An L7 load balancer terminates the TCP connection, decodes the HTTP request, and routes based on what the request is asking for: the URL, the headers, the cookies, the method. The deeper the layer, the smarter the routing and the higher the cost.
What each layer can see
flowchart TB
subgraph PACKET["A single HTTPS request, broken down by OSI layer"]
direction TB
L4[("Layer 4 (TCP/UDP)<br/>src IP : src port → dst IP : dst port")]:::infra
TLS[("TLS encrypted payload")]:::store
L7[("Layer 7 (HTTP)<br/>GET /api/v2/users/42<br/>Host: api.example.com<br/>Cookie: sid=abc<br/>Authorization: Bearer ...")]:::store
L4 --> TLS --> L7
end
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
An L4 LB sees only the outer envelope: who is talking to whom on which port. An L7 LB cracks the envelope open and reads the letter. That extra reading costs CPU and adds latency, but it gives the LB the information it needs to make smart routing choices.
L4: route by connection, fast and dumb
The LB receives a TCP packet, reads the connection’s 4-tuple (source IP, source port, destination IP, destination port), picks a backend (often by hashing the tuple), and forwards every packet of that connection to the same backend. In the common proxy or NAT setup, the backend’s replies flow back through the LB as well (some deployments use direct server return, where replies bypass the LB). The LB has no idea what application is running on top of TCP; it could be HTTP, gRPC, MQTT, or your custom binary protocol.
sequenceDiagram
autonumber
participant C as Client
participant L4 as L4 Load balancer
participant B as Backend
C->>L4: TCP SYN (192.0.2.1:51000 → 198.51.100.5:443)
L4->>L4: hash 4-tuple → backend B
L4->>B: forward SYN
B-->>L4: SYN-ACK
L4-->>C: SYN-ACK
Note over C,B: TCP connection established through the LB
Note over C,B: All bytes of this connection go through the same path.<br/>The LB never decodes the payload.
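The routing step in the diagram above (“hash 4-tuple → backend B”) can be sketched in a few lines. This is a minimal illustration, not how any particular product implements it: the backend addresses are hypothetical, and SHA-256 with modulo is an assumed, illustrative choice of hash. The point is that the function sees only the 4-tuple, never the payload.

```python
import hashlib
from typing import NamedTuple


class FourTuple(NamedTuple):
    """The only fields an L4 LB uses to pick a backend."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def pick_backend(conn: FourTuple, backends: list[str]) -> str:
    """Map a connection's 4-tuple onto one backend.

    A stable hash (SHA-256, an illustrative choice) sends every packet of the
    same connection to the same backend without per-packet bookkeeping.
    """
    key = f"{conn.src_ip}:{conn.src_port}->{conn.dst_ip}:{conn.dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


# Hypothetical backend addresses; the tuple matches the diagram above.
backends = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
conn = FourTuple("192.0.2.1", 51000, "198.51.100.5", 443)

# Every packet of this connection carries the same 4-tuple, so it always maps
# to the same backend. The payload (HTTP, gRPC, MQTT, ...) is never read.
chosen = pick_backend(conn, backends)
assert pick_backend(conn, backends) == chosen

# A second connection from the same client uses a new source port, so it is
# hashed independently and may land on a different backend.
second = pick_backend(conn._replace(src_port=51001), backends)
print(chosen, second)
```

Plain modulo hashing remaps most connections whenever the backend list changes, which is why production L4 LBs usually pair hashing with a connection-tracking table or a consistent-hashing scheme.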
Strength. Very fast. Single-digit microseconds per packet. Protocol-agnostic. Can carry any TCP or UDP traffic. Cheap to run.
Weakness. Routing decisions are per connection, not per request. Every request inside one connection (e.g., HTTP/2 streams on one TCP connection) lands on the same backend. The LB cannot route by URL, host, or any application detail.
Examples: AWS NLB, Google Network LB, HAProxy in TCP mode, MetalLB.
L7: route by request, smart and slower
The LB terminates the client’s TCP and TLS connection, decodes each HTTP request, and decides per request which backend to send it to. Requests for different paths on the same connection can go to different backends.
sequenceDiagram
autonumber
participant C as Client
participant L7 as L7 Load balancer
participant API as API backend
participant ST as Static-asset backend
participant ADM as Admin backend
C->>L7: HTTPS handshake (LB terminates TLS)
Note over L7: HTTP/2 connection established
C->>L7: GET /api/users/42
L7->>L7: parse host + path<br/>"/api/*" → API pool
L7->>API: forward
API-->>L7: response
L7-->>C: response
C->>L7: GET /assets/app.js
L7->>L7: "/assets/*" → Static pool
L7->>ST: forward
ST-->>L7: response
L7-->>C: response
C->>L7: GET /admin/...
L7->>L7: "/admin/*" → Admin pool<br/>(after checking the session cookie)
L7->>ADM: forward
ADM-->>L7: response
L7-->>C: response
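The per-request decision in the diagram above can be sketched as a small routing function. This is a hypothetical illustration, not the configuration model of any real proxy: the pool names, the `/admin/settings` path, the `reject-401` result, and the “require a `sid` cookie” rule are all assumptions made for the example. Note that the function takes a single request and never looks at which connection it arrived on.

```python
from dataclasses import dataclass, field
from http.cookies import SimpleCookie

# Hypothetical route table mirroring the diagram: path prefix -> backend pool.
ROUTES = {
    "/api/": "api-pool",
    "/assets/": "static-pool",
    "/admin/": "admin-pool",
}


@dataclass
class Request:
    method: str
    host: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)  # lowercase header names assumed


def route(req: Request) -> str:
    """Choose a pool for ONE request; the connection it arrived on is irrelevant."""
    matches = [prefix for prefix in ROUTES if req.path.startswith(prefix)]
    if not matches:
        return "default-pool"  # hypothetical fallback pool
    pool = ROUTES[max(matches, key=len)]  # longest matching prefix wins
    if pool == "admin-pool":
        # Edge-auth stand-in: only checks that a "sid" cookie is present.
        # A real deployment would validate the session, not just its existence.
        cookies = SimpleCookie(req.headers.get("cookie", ""))
        if "sid" not in cookies:
            return "reject-401"
    return pool


# Three requests that arrive on the same HTTP/2 connection.
same_connection = [
    Request("GET", "api.example.com", "/api/users/42"),
    Request("GET", "api.example.com", "/assets/app.js"),
    Request("GET", "api.example.com", "/admin/settings", {"cookie": "sid=abc"}),
]
for req in same_connection:
    print(req.path, "->", route(req))
```

Each of the three requests lands on a different pool even though they share one connection, which is exactly what an L4 LB cannot do.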
Strength. Path-based routing, header-based routing, host-based routing (multi-tenant SaaS), per-request retries, request-level rate limiting, edge auth, content-based decisions. This is the modern web edge.
Weakness. Slower per request (microseconds to about a millisecond). The LB pays for TLS handshakes and decryption. Application-aware features cost CPU. It cannot make per-request decisions for non-HTTP traffic: a raw TCP service needs L4, or an L7 proxy running in a TCP (stream) mode.
Examples: AWS ALB, Google HTTP(S) LB, NGINX, Envoy, Traefik, Cloudflare in its standard mode.
The decision
flowchart TB
Q1{"Is the traffic HTTP / HTTPS / gRPC?"}:::query
Q2{"Do you need to route by URL,<br/>host, header, or cookie?"}:::query
Q3{"Do you need per-request retries,<br/>rate limiting, or edge auth?"}:::query
A1["L4.<br/>Use for raw TCP, UDP, MQTT,<br/>or pure throughput at lowest latency."]:::weak
A2["L7.<br/>The default for modern web services."]:::strong
A3["Either works.<br/>If you need application logic, L7.<br/>Otherwise L4 is cheaper."]:::mid
Q1 -->|"no, it is raw TCP/UDP"| A1
Q1 -->|"yes"| Q2
Q2 -->|"yes"| A2
Q2 -->|"no"| Q3
Q3 -->|"yes"| A2
Q3 -->|"no"| A3
classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef weak fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
classDef mid fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
For modern web apps, the answer is L7 almost every time. L4 is the right tool for non-HTTP services, ultra-low-latency systems, or when you explicitly do not want the LB looking at your bytes (passthrough TLS, where TLS terminates on the backend).
Two scenarios
Scenario one: a SaaS with multiple subdomains.
api.example.com goes to the API pool, admin.example.com goes to the admin pool, static.example.com goes to a static-asset backend, and the marketing site goes elsewhere. Only an L7 LB can read the Host header and route accordingly.
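Host-based routing for this scenario boils down to a lookup keyed on the normalized Host header. The sketch below is hypothetical: the pool names are invented, and “marketing-pool” stands in for “the marketing site goes elsewhere.” It normalizes the header (optional port, case, trailing dot) before the lookup, and deliberately leaves IPv6 literals out of scope.

```python
# Hypothetical host -> pool table for the multi-subdomain SaaS.
HOST_ROUTES = {
    "api.example.com": "api-pool",
    "admin.example.com": "admin-pool",
    "static.example.com": "static-pool",
}
DEFAULT_POOL = "marketing-pool"  # everything else, e.g. the marketing site


def route_by_host(host_header: str) -> str:
    """Route on the HTTP Host header, which only an L7 LB can read."""
    # Normalize: drop an optional ":port", lowercase, strip a trailing dot.
    # (IPv6 literals such as "[::1]:443" are out of scope for this sketch.)
    host = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    return HOST_ROUTES.get(host, DEFAULT_POOL)


for h in ["api.example.com", "ADMIN.example.com:443", "static.example.com.", "www.example.com"]:
    print(h, "->", route_by_host(h))
```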
Scenario two: a high-throughput multiplayer game server cluster.
Custom UDP protocol, microsecond latency budget. An L7 LB cannot help (the LB has no HTTP to decode), and the extra processing would blow the latency budget. L4 with consistent hashing on source IP is the right pick.
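Consistent hashing on source IP can be sketched as a hash ring with virtual nodes. This is one common way to implement consistent hashing, shown under assumptions: the server names, 100 virtual nodes per server, and SHA-256 as the ring hash are illustrative choices. The demo checks the property that makes it worth using here: when one game server is drained, only the players who were on it get remapped.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes, keyed on client source IP."""

    def __init__(self, servers: list[str], vnodes: int = 100) -> None:
        # Each server owns `vnodes` points on the ring to even out the split.
        self._points = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._keys = [h for h, _ in self._points]

    def lookup(self, src_ip: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._keys, _hash(src_ip)) % len(self._keys)
        return self._points[idx][1]


servers = ["game-1", "game-2", "game-3", "game-4"]  # hypothetical server names
before = HashRing(servers)
after = HashRing(servers[:3])  # game-4 is drained

clients = [f"203.0.113.{i}" for i in range(1, 255)]  # documentation-range IPs
moved = {c for c in clients if before.lookup(c) != after.lookup(c)}
was_on_game4 = {c for c in clients if before.lookup(c) == "game-4"}

# Only players who were on game-4 get remapped; everyone else keeps their server.
print(f"{len(moved)} of {len(clients)} clients moved")
```

One design caveat under the same assumptions: hashing on source IP alone sends every player behind one NAT to the same server. Including the source port spreads them out, but can remap a player if the NAT changes their port.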
HTTP/2 and HTTP/3 change the LB math
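L7 LBs read HTTP, so the HTTP version matters. HTTP/2 multiplexes many requests onto one TCP connection; HTTP/3 does the same over QUIC, which runs on UDP. An L4 LB sees only one connection per client and pins all of that connection’s multiplexed streams to one backend. Multiplexing still saves connection setup, but load distribution becomes per client instead of per request. HTTP/3 adds another wrinkle at L4: QUIC connections can migrate to a new client address and port, so a plain UDP 4-tuple hash can send a migrated connection to the wrong backend unless the LB routes on the QUIC connection ID.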
If your traffic is HTTP/2 (gRPC included) and you want per-request load distribution, you need an L7 LB that terminates HTTP/2 and routes each stream (request) independently. This is why most modern L7 proxies and managed LBs (Envoy, NGINX, AWS ALB) balance HTTP/2 traffic per request rather than per connection.
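The difference between per-connection and per-request distribution shows up clearly in a toy simulation. Everything here is hypothetical: three backends, one gRPC client with 900 requests on a single HTTP/2 connection, CRC32 of a connection id standing in for a real 4-tuple hash, and plain round-robin as the L7 policy.

```python
import itertools
import zlib
from collections import Counter

BACKENDS = ["b1", "b2", "b3"]  # hypothetical backend pool


def l4_spread(connections: dict[str, int]) -> Counter:
    """L4: one backend per connection; every stream on it follows.

    `connections` maps a connection id to the number of HTTP/2 streams on it.
    crc32 of the id is a toy stand-in for hashing the real 4-tuple.
    """
    load: Counter = Counter()
    for conn_id, streams in connections.items():
        backend = BACKENDS[zlib.crc32(conn_id.encode()) % len(BACKENDS)]
        load[backend] += streams
    return load


def l7_spread(connections: dict[str, int]) -> Counter:
    """L7: each stream is a separate request, round-robined across backends."""
    rr = itertools.cycle(BACKENDS)
    load: Counter = Counter()
    for streams in connections.values():
        for _ in range(streams):
            load[next(rr)] += 1
    return load


# One busy gRPC client: 900 requests multiplexed on a single HTTP/2 connection.
busy_client = {"conn-A": 900}
print("L4:", dict(l4_spread(busy_client)))  # all 900 on a single backend
print("L7:", dict(l7_spread(busy_client)))  # 300 on each backend
```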
What this connects to
- Load balancer basics. The shape this concept fits inside. See Load balancer: why, how, when.
- Load balancing algorithms. Algorithm choice plays differently at L4 vs L7. See Load balancing algorithms.
- Sticky sessions. Easier at L7 (cookies) than at L4 (source-IP hash). See Sticky sessions.
- HTTP/2 and HTTP/3. L4 vs L7 choice changes once you adopt them. See HTTP/2 and HTTP/3.
- TCP vs UDP. L7 effectively means “load balance HTTP”; L4 covers everything else. See TCP vs UDP.
Common mistakes
- L4 in front of HTTP/2 backends. The LB pins each TCP connection to one backend, so all of a client’s multiplexed streams are stuck there. Your “scaled-out” backend is one box for that client.
- L7 for pure throughput workloads. Putting an L7 LB in front of a gigabit-per-second video stream is overkill and burns CPU you do not need to spend.
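- Forgetting the TLS termination point. L7 terminates TLS at the LB, so the LB sees plaintext. If your compliance posture requires encryption all the way to the backend, the LB must re-encrypt before forwarding; if even the LB must not see plaintext, use L4 with TLS passthrough.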
- Assuming L7 is always “more secure.” It does enable WAF features, edge auth, and content inspection. It also becomes a hot target with broad access to plaintext. Configure carefully.
- Mixing L4 and L7 without a plan. Some teams chain L4 in front of L7 (NLB → ALB on AWS), for example to expose static IP addresses in front of an ALB. It can be the right call, but every hop adds latency. Justify it.
Quick recap
- L4: routes by TCP/UDP tuple, fast, protocol-agnostic, dumb. Pins each connection to one backend.
- L7: terminates TLS, decodes HTTP, routes per request by URL/host/header. Slower, smarter, modern default for web.
- HTTP/2 and HTTP/3 push you toward L7 once you adopt them.
- Use L4 for raw protocols and lowest-latency throughput; L7 for everything HTTP-shaped.
This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.