Load Balancing

L4 vs L7 load balancing

Transport-layer vs application-layer trade-offs.

An L4 load balancer routes based on what it can see at the TCP or UDP layer: source and destination IP addresses and ports. It reads packet headers but does not look at the payload. An L7 load balancer terminates the TCP connection, decodes the HTTP request, and routes based on what the request is asking for: the URL, the headers, the cookies, the method. The higher up the stack the LB reads, the smarter the routing and the higher the cost.

What each layer can see

flowchart TB
    subgraph PACKET["A single HTTPS request, broken down by OSI layer"]
        direction TB
        L4[("Layer 4 (TCP/UDP)<br/>src IP : src port  →  dst IP : dst port")]:::infra
        TLS[("TLS encrypted payload")]:::store
        L7[("Layer 7 (HTTP)<br/>GET /api/v2/users/42<br/>Host: api.example.com<br/>Cookie: sid=abc<br/>Authorization: Bearer ...")]:::store
        L4 --> TLS --> L7
    end

    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px

An L4 LB sees only the outer envelope: who is talking to whom on which port. An L7 LB opens the envelope and reads the letter; for HTTPS, that means decrypting TLS first. That extra work costs CPU and adds latency, but it gives the LB the information it needs to make smart routing choices.
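To make "reading the letter" concrete, here is a minimal Python sketch of the parsing step an L7 LB performs after TLS decryption. It assumes plain HTTP/1.1 text framing. HTTP/2 and HTTP/3 carry the same fields in binary frames with header compression, so a real proxy relies on a full protocol implementation. `parse_request_head` is a hypothetical helper, not any product's API.

```python
def parse_request_head(raw: bytes) -> tuple[str, str, dict[str, str]]:
    """Extract method, target, and headers from an HTTP/1.1 request head.

    Minimal sketch: no body handling, no validation, and a repeated header
    simply overwrites the earlier value.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")  # the head ends at the blank line
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, target, headers


raw = (b"GET /api/v2/users/42 HTTP/1.1\r\n"
       b"Host: api.example.com\r\n"
       b"Cookie: sid=abc\r\n\r\n")
method, target, headers = parse_request_head(raw)
print(method, target, headers["host"], headers["cookie"])
# GET /api/v2/users/42 api.example.com sid=abc
```

An L4 LB never runs this step. Everything it routes on is already present in the IP and TCP/UDP headers.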

L4: route by connection, fast and dumb

The LB receives a TCP packet, looks at the connection’s tuple (source IP, source port, destination IP, destination port), picks a backend (often by hashing the tuple), and forwards every packet of that connection to the same backend. In the common setup, the backend’s replies also flow back through the LB. Some designs use direct server return instead, where replies bypass the LB. The LB has no idea which application protocol runs on top of TCP: it could be HTTP, gRPC, MQTT, or your custom binary protocol.

sequenceDiagram
    autonumber
    participant C as Client
    participant L4 as L4 Load balancer
    participant B as Backend

    C->>L4: TCP SYN  (192.0.2.1:51000 → 198.51.100.5:443)
    L4->>L4: hash 4-tuple → backend B
    L4->>B: forward SYN
    B-->>L4: SYN-ACK
    L4-->>C: SYN-ACK
    Note over C,B: TCP connection established through the LB

    Note over C,B: All bytes of this connection go through the same path.<br/>The LB never decodes the payload.
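The "hash 4-tuple → backend" step in the diagram above can be sketched in a few lines of Python. This is a minimal sketch that assumes a fixed backend list; the private addresses and the `pick_backend` helper are hypothetical. Production L4 LBs implement this in optimized data-plane code.

```python
import hashlib

# Hypothetical backend pool (private addresses, illustrative only).
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 backends: list[str]) -> str:
    """Map a connection's 4-tuple to a backend without reading any payload."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    # hashlib rather than hash(): Python salts str hashes per process,
    # and the mapping should stay the same across processes and restarts.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


conn = ("192.0.2.1", 51000, "198.51.100.5", 443)  # the tuple from the diagram
first = pick_backend(*conn, BACKENDS)
again = pick_backend(*conn, BACKENDS)
print(first == again)  # True: same tuple, same backend, for every packet
```

With plain modulo hashing, changing the size of `backends` remaps most tuples and breaks connections already in flight. This is why real L4 LBs usually also keep a table of live connections, and why consistent hashing comes up in scenario two.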

Strength. Very fast: typically single-digit microseconds per packet. Protocol-agnostic: it can carry any TCP or UDP traffic. Cheap to run.

Weakness. Routing decisions are per connection, not per request. Every request inside one connection (e.g., HTTP/2 streams on one TCP connection) lands on the same backend. The LB cannot route by URL, host, or any application detail.

Examples: AWS NLB, Google Network LB, HAProxy in TCP mode, MetalLB.

L7: route by request, smart and slower

The LB terminates the client’s TCP and TLS connection, decodes each HTTP request, and decides per request which backend to send it to. Different paths on the same connection can go to different backends.

sequenceDiagram
    autonumber
    participant C as Client
    participant L7 as L7 Load balancer
    participant API as API backend
    participant ST as Static-asset backend
    participant ADM as Admin backend

    C->>L7: HTTPS handshake (LB terminates TLS)
    Note over L7: HTTP/2 connection established

    C->>L7: GET /api/users/42
    L7->>L7: parse host + path<br/>"/api/*" → API pool
    L7->>API: forward
    API-->>L7: response
    L7-->>C: response

    C->>L7: GET /assets/app.js
    L7->>L7: "/assets/*" → Static pool
    L7->>ST: forward
    ST-->>L7: response
    L7-->>C: response

    C->>L7: GET /admin/...
    L7->>L7: "/admin/*" → Admin pool<br/>check session cookie first (edge auth)
    L7->>ADM: forward
    ADM-->>L7: response
    L7-->>C: response
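The per-request routing in the diagram above can be sketched as a longest-prefix match on the request path. The prefixes mirror the diagram. The pool names, the `/admin/settings` example path, and the catch-all `web-pool` are assumptions made for illustration.

```python
# Hypothetical route table: path prefix -> backend pool.
ROUTES = {
    "/api/": "api-pool",
    "/assets/": "static-pool",
    "/admin/": "admin-pool",
}
DEFAULT_POOL = "web-pool"  # assumption: a catch-all pool for everything else

def route_by_path(path: str) -> str:
    """Pick a pool for a single request by longest matching path prefix."""
    # Longest prefix first, so a more specific rule such as "/api/v2/"
    # would win over "/api/" if it were added to the table.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return DEFAULT_POOL


# Three requests on the same HTTP/2 connection, three different pools.
for path in ["/api/users/42", "/assets/app.js", "/admin/settings"]:
    print(path, "->", route_by_path(path))
```

The function runs once per request rather than once per connection. That per-request decision is the key difference from the L4 sketch.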

Strength. Path-based routing, header-based routing, host-based routing (multi-tenant SaaS), per-request retries, request-level rate limiting, edge auth, content-based decisions. This is the modern web edge.

Weakness. Slower per request (microseconds up to about a millisecond of added latency). The LB does the TLS work, and application-aware features cost CPU. It cannot make content-based decisions for non-HTTP traffic: a raw TCP service needs L4, or an L7 proxy running in a TCP passthrough mode.

Examples: AWS ALB, Google HTTP(S) LB, NGINX, Envoy, Traefik, Cloudflare in its standard mode.
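Per-request retries, listed among the L7 strengths above, are possible only because the LB knows where one request ends and the next begins. The following is a minimal sketch. It assumes a caller-supplied `send` function (hypothetical) that raises `ConnectionError` when a backend fails. Only methods that HTTP defines as idempotent are retried, because resending a `POST` could apply it twice.

```python
from typing import Callable, Optional

# Methods HTTP semantics (RFC 9110) define as idempotent: safe to resend.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}

def forward_with_retry(method: str, path: str, backends: list[str],
                       send: Callable[[str, str, str], str],
                       max_attempts: int = 2) -> str:
    """Forward one request, retrying an idempotent one on the next backend."""
    if not backends:
        raise ValueError("no backends available")
    attempts = max_attempts if method in IDEMPOTENT_METHODS else 1
    last_error: Optional[ConnectionError] = None
    for backend in backends[:attempts]:
        try:
            return send(backend, method, path)
        except ConnectionError as err:
            last_error = err  # this backend failed; move on to the next one
    assert last_error is not None  # the loop ran at least once and failed
    raise last_error


def fake_send(backend: str, method: str, path: str) -> str:
    # Hypothetical stand-in for a real upstream call: api-1 is down.
    if backend == "api-1":
        raise ConnectionError("api-1 reset the connection")
    return f"200 OK from {backend} for {method} {path}"


print(forward_with_retry("GET", "/api/users/42", ["api-1", "api-2"], fake_send))
# 200 OK from api-2 for GET /api/users/42
```

A `POST` sent to the same pair of backends raises the `ConnectionError` instead of being retried. An L4 LB cannot make this decision at all, because it sees only a byte stream and has no notion of individual requests.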

The decision

flowchart TB
    Q1{"Is the traffic HTTP / HTTPS / gRPC?"}:::query
    Q2{"Do you need to route by URL,<br/>host, header, or cookie?"}:::query
    Q3{"Do you need per-request retries,<br/>rate limiting, or edge auth?"}:::query

    A1["L4.<br/>Use for raw TCP, UDP, MQTT,<br/>or pure throughput at lowest latency."]:::weak
    A2["L7.<br/>The default for modern web services."]:::strong
    A3["Either works.<br/>If you need application logic, L7.<br/>Otherwise L4 is cheaper."]:::mid

    Q1 -->|"no, it is raw TCP/UDP"| A1
    Q1 -->|"yes"| Q2
    Q2 -->|"yes"| A2
    Q2 -->|"no"| Q3
    Q3 -->|"yes"| A2
    Q3 -->|"no"| A3

    classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef weak fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
    classDef mid fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

For modern web apps, the answer is L7 almost every time. L4 is the right tool for non-HTTP services, ultra-low-latency systems, or when you explicitly do not want the LB looking at your bytes (passthrough TLS, where TLS terminates on the backend).

Two scenarios

Scenario one: a SaaS with multiple subdomains.

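api.example.com goes to the API pool, admin.example.com to the admin pool, static.example.com to a static-asset backend, and the marketing site somewhere else. All of these hostnames can resolve to the same LB address and port, so the connection tuple alone cannot tell them apart. The LB has to read the Host header of each request, which only an L7 LB does. An L4 LB doing TLS passthrough can peek at the SNI hostname in the TLS handshake, but that yields one decision per connection, not per request.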
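Once the Host header has been parsed, host-based routing is a table lookup. Here is a minimal sketch. The pool names and the rule that unknown hosts go to the marketing site are illustrative assumptions, not a real configuration.

```python
# Hypothetical host -> pool table for the multi-subdomain SaaS.
HOST_ROUTES = {
    "api.example.com": "api-pool",
    "admin.example.com": "admin-pool",
    "static.example.com": "static-pool",
}
FALLBACK_POOL = "marketing-pool"  # assumption: any other host is the marketing site

def route_by_host(headers: dict[str, str]) -> str:
    """Choose a pool from one request's Host header."""
    host = next((v for k, v in headers.items() if k.lower() == "host"), "")
    # Drop an explicit port and normalise case ("API.example.com:443").
    # IPv6 literals such as "[::1]:443" would need extra handling.
    hostname = host.split(":", 1)[0].strip().lower()
    return HOST_ROUTES.get(hostname, FALLBACK_POOL)


print(route_by_host({"Host": "API.example.com:443"}))  # api-pool
print(route_by_host({"host": "www.example.com"}))      # marketing-pool
```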

Scenario two: a high-throughput multiplayer game server cluster.

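Custom UDP protocol, tight latency budget. An L7 LB cannot help (there is no HTTP to decode), and the extra processing would eat into the latency budget. L4 with consistent hashing on source IP is the right pick: each player keeps landing on the same game server, and adding or removing a server remaps only a small share of players.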
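A minimal consistent-hash ring shows why churn stays small. This sketch assumes SHA-256 as the hash and 100 virtual nodes per server; both are arbitrary choices. The server names and client IPs are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, backends: list[str], vnodes: int = 100) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        # Each backend appears at `vnodes` pseudo-random points on the ring.
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, src_ip: str) -> str:
        # First ring point clockwise from the key, wrapping past the end.
        idx = bisect.bisect(self._points, _hash(src_ip)) % len(self._points)
        return self._ring[idx][1]


ring = HashRing(["game-1", "game-2", "game-3"])
smaller = HashRing(["game-1", "game-2"])  # game-3 removed
players = [f"203.0.113.{i}" for i in range(1, 255)]
moved = [ip for ip in players
         if ring.lookup(ip) != "game-3" and ring.lookup(ip) != smaller.lookup(ip)]
print(len(moved))  # 0: only game-3's players were remapped
```

With plain `hash % N`, removing one of three servers would remap most players, not just the ones on the removed server.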

HTTP/2 and HTTP/3 change the LB math

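HTTP/2 multiplexes many requests as streams on one TCP connection; HTTP/3 does the same over QUIC, which runs on UDP. An L4 LB sees one connection per client and pins all of its multiplexed streams to one backend. That defeats per-request load distribution: a client sending many requests down one connection loads a single backend while the others sit idle. HTTP/3 adds a second catch. QUIC identifies connections by connection ID rather than by address tuple, so if a client’s address changes mid-connection, a plain tuple hash can send the connection to the wrong backend.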

If your backends speak HTTP/2 and you want per-request load distribution, you need an L7 LB that understands HTTP/2 streams. This is why L7 proxies and managed LBs such as Envoy, NGINX, and AWS ALB terminate the client’s HTTP/2 connection and choose a backend for each request independently.
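A toy simulation makes the difference visible. It assumes one client connection carrying six multiplexed requests and three backends. The L4 side uses a deliberately simple character-sum hash of a hypothetical connection ID, and the L7 side uses per-request round-robin. Neither reflects how any specific product is implemented.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["b1", "b2", "b3"]

def l4_assign(connection_id: str, streams: list[str]) -> Counter:
    """L4: one decision per connection; every stream on it follows."""
    backend = BACKENDS[sum(map(ord, connection_id)) % len(BACKENDS)]  # toy hash
    return Counter(backend for _ in streams)

def l7_assign(streams: list[str]) -> Counter:
    """L7: a fresh decision per request (stream), round-robin here."""
    next_backend = cycle(BACKENDS)
    return Counter(next(next_backend) for _ in streams)


streams = [f"GET /api/items/{i}" for i in range(6)]  # six streams, one connection
print(l4_assign("conn-1", streams))  # Counter({'b3': 6})
print(l7_assign(streams))            # Counter({'b1': 2, 'b2': 2, 'b3': 2})
```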


Common mistakes

  • L4 in front of HTTP/2 backends. The LB pins each TCP connection to one backend, so all of that client’s multiplexed streams are stuck there. Your “scaled-out” backend is effectively one box for that client.
  • L7 for pure throughput workloads. Putting an L7 LB in front of a gigabit-per-second video stream is overkill and burns CPU you do not need to spend.
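  • Forgetting the TLS termination point. L7 terminates TLS at the LB, so traffic from the LB to the backend is plaintext unless you re-encrypt it. If your compliance posture requires encryption all the way to the backend, the LB must re-encrypt before forwarding. If it requires that no intermediary ever sees plaintext, you need L4 with TLS passthrough.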
  • Assuming L7 is always “more secure.” It does enable WAF features, edge auth, and content inspection. It also becomes a hot target with broad access to plaintext. Configure carefully.
  • Mixing L4 and L7 without a plan. Some teams chain L4 in front of L7 (NLB → ALB on AWS). It can be the right call, but every hop adds latency. Justify it.

Quick recap

  • L4: routes by TCP/UDP tuple, fast, protocol-agnostic, dumb. Pins each connection to one backend.
  • L7: terminates TLS, decodes HTTP, routes per request by URL/host/header. Slower, smarter, modern default for web.
  • HTTP/2 and HTTP/3 push you toward L7 once you adopt them.
  • Use L4 for raw protocols and lowest-latency throughput; L7 for everything HTTP-shaped.

This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.