Scaling

Horizontal vs vertical scaling

Trade-offs and where each breaks down.

Vertical scaling makes the box bigger: more CPU, more memory, faster disk. Horizontal scaling adds more boxes. Both work. They have completely different ceilings, different costs, and different failure modes. The trick is knowing which one you are actually hitting the wall on, because the right answer is often “do both, in this order, for this reason.”

The picture

flowchart TB
    subgraph V["Vertical scaling — make one box bigger"]
        direction LR
        V1[("4 vCPU<br/>16 GB RAM")]:::server
        V2[("16 vCPU<br/>64 GB RAM")]:::server
        V3[("96 vCPU<br/>768 GB RAM")]:::server
        V1 ==> V2 ==> V3
    end

    subgraph H["Horizontal scaling — add more boxes"]
        direction LR
        HA[("1 box")]:::server
        HB[("2 boxes")]:::server
        HC[("4 boxes")]:::server
        HD[("8 boxes")]:::server
        HA ==> HB ==> HC ==> HD
    end

    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

Same goal (more capacity), two completely different shapes. Vertical scales one node; horizontal multiplies nodes.

Vertical scaling: easy until it isn’t

Buy a bigger machine. Move the workload to it. Done.

No code changes. The application thinks it is still on one box.
No coordination problems. The database is the database.
No new failure modes. Same single point of failure as before.

That last point is the catch. A bigger machine is still one machine. If it dies, everything is down. And there is a ceiling: every cloud provider has a largest instance, and beyond it no budget buys you more.

flowchart TB
    Q["Need more capacity"]:::query
    A1["Buy bigger instance"]:::strong
    A2["Hit instance-family ceiling"]:::mid
    A3["Hit price-per-marginal-unit cliff<br/>(largest instances cost much more per vCPU)"]:::weak
    A4["Hit the hard ceiling — no bigger instance exists"]:::dead

    Q --> A1 --> A2 --> A3 --> A4

    classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
    classDef mid fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef weak fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
    classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px

Vertical scaling is great while it works. But the marginal cost climbs sharply at the high end: the biggest instances can cost as much as 3x what their size alone suggests. And there is no answer to “the biggest box is not big enough.”
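One way to find where the cliff starts for a given instance family is to compare price per vCPU across sizes. The sketch below is illustrative only: the prices in `HYPOTHETICAL_PRICES` are made up, and `first_cliff` and its 10% tolerance are assumptions for this example, not any provider's pricing or API.

```python
# Hypothetical hourly prices for one instance family. These numbers are
# invented for illustration; substitute your provider's real price list.
HYPOTHETICAL_PRICES = [
    # (vCPUs, dollars per hour)
    (4, 0.20),
    (16, 0.80),
    (64, 3.60),
    (96, 7.20),
]


def price_per_vcpu(prices):
    """Return (vcpus, dollars per vCPU-hour) for each size."""
    return [(vcpus, dollars / vcpus) for vcpus, dollars in prices]


def first_cliff(prices, tolerance=1.10):
    """Return the smallest size whose per-vCPU price is more than
    `tolerance` times the smallest instance's per-vCPU price,
    or None if pricing stays roughly linear."""
    per_vcpu = price_per_vcpu(prices)
    baseline = per_vcpu[0][1]
    for vcpus, unit_price in per_vcpu:
        if unit_price > baseline * tolerance:
            return vcpus
    return None


if __name__ == "__main__":
    for vcpus, unit_price in price_per_vcpu(HYPOTHETICAL_PRICES):
        print(f"{vcpus:>3} vCPU: ${unit_price:.4f} per vCPU-hour")
    print("cliff starts at:", first_cliff(HYPOTHETICAL_PRICES), "vCPU")
```

Running the same comparison on real prices shows whether the next size up is still a fair trade or whether that money is better spent on a second box.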

Horizontal scaling: cheap per box, complicated overall

Add more boxes, route traffic across them. Capacity is now roughly N times one box.

flowchart LR
    U(["Traffic"]):::client
    LB[["Load balancer"]]:::infra
    S1[("Box 1")]:::server
    S2[("Box 2")]:::server
    S3[("Box 3")]:::server
    SN[("Box N")]:::server

    U ==> LB
    LB ==> S1
    LB ==> S2
    LB ==> S3
    LB ==> SN

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

Capacity scales near-linearly with box count, as long as no shared dependency saturates first.
A single box dying is a non-event, provided the others have the headroom to absorb its load.
You can grow and shrink with traffic.
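The routing idea can be sketched in a few lines. This is a toy round-robin balancer, not how any particular load balancer is implemented; `RoundRobinBalancer`, `mark_down`, and `mark_up` are names invented for the example, and a real balancer would discover unhealthy boxes through health checks rather than manual calls.

```python
import itertools


class RoundRobinBalancer:
    """Toy load balancer: cycles through boxes, skipping unhealthy ones."""

    def __init__(self, boxes):
        self.boxes = list(boxes)
        self.healthy = set(self.boxes)
        self._cycle = itertools.cycle(self.boxes)

    def mark_down(self, box):
        """Stop sending traffic to a box (stands in for a failed health check)."""
        self.healthy.discard(box)

    def mark_up(self, box):
        """Resume sending traffic to a known box."""
        if box in self.boxes:
            self.healthy.add(box)

    def pick(self):
        """Return the next healthy box, or raise if none are left."""
        for _ in range(len(self.boxes)):
            box = next(self._cycle)
            if box in self.healthy:
                return box
        raise RuntimeError("no healthy boxes available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["box-1", "box-2", "box-3"])
    print([lb.pick() for _ in range(3)])
    lb.mark_down("box-2")  # simulate a box dying
    print([lb.pick() for _ in range(4)])
```

After `box-2` is marked down, its share of requests lands on the two survivors, so each of them carries 50% more traffic than before.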

But there is a tax:

You need a load balancer in front. See Load balancer: why, how, when.
The service has to be stateless, or you need shared state. See Stateless vs stateful services.
Coordination problems appear (cache invalidation across nodes, distributed locks, leader election).
Observability gets harder. “Which box served this request?” becomes a real question.
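The statelessness requirement is easiest to see by breaking it. In the sketch below, a plain `dict` stands in for a shared database or cache, and `LocalStateBox`, `StatelessBox`, and `run` are hypothetical names for this example. Requests for one user alternate between two boxes, as they would behind a round-robin load balancer.

```python
class LocalStateBox:
    """Anti-pattern: each box keeps the cart in its own memory."""

    def __init__(self):
        self._carts = {}

    def add_item(self, user, item):
        self._carts.setdefault(user, []).append(item)
        return list(self._carts[user])


class StatelessBox:
    """Each box reads and writes a store that lives outside the box.

    `shared_store` is a plain dict here, standing in for a database or
    cache that every box can reach over the network.
    """

    def __init__(self, shared_store):
        self._store = shared_store

    def add_item(self, user, item):
        self._store.setdefault(user, []).append(item)
        return list(self._store[user])


def run(boxes, items, user="alice"):
    """Send each item to the next box in turn and return the cart
    as seen by whichever box handled the last request."""
    cart = []
    for i, item in enumerate(items):
        cart = boxes[i % len(boxes)].add_item(user, item)
    return cart


if __name__ == "__main__":
    items = ["book", "pen", "lamp"]
    # Local state: the box that served the last request never saw "pen".
    print(run([LocalStateBox(), LocalStateBox()], items))
    # Shared state: every box sees the full cart.
    store = {}
    print(run([StatelessBox(store), StatelessBox(store)], items))
```

With local state the user's cart depends on which box answers, which is the problem sticky sessions paper over and externalised state removes.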

Where each one falls over

flowchart TB
    subgraph VC["Vertical scaling ceiling"]
        direction LR
        VC1["Hard limit on instance size"]:::weak
        VC2["Marginal cost of vCPU spikes<br/>at the top of the family"]:::weak
        VC3["Single failure domain"]:::dead
    end

    subgraph HC["Horizontal scaling ceiling"]
        direction LR
        HC1["Coordination cost rises<br/>with node count"]:::weak
        HC2["Shared dependencies become bottlenecks<br/>(database, cache, queue)"]:::weak
        HC3["Stateful workloads resist scaling out"]:::dead
    end

    classDef weak fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
    classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px

Vertical hits a hardware ceiling. Horizontal hits a coordination ceiling. Most production systems use both: vertical for the database (until you have to shard), horizontal for the stateless layers (application servers, workers).
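One common way to put numbers on the coordination ceiling is the Universal Scalability Law, which models relative capacity as N / (1 + α(N − 1) + βN(N − 1)). The article does not name this model; it is brought in here as one possible lens, and the coefficients used below are invented for illustration. In practice they are fitted from load-test measurements.

```python
def relative_capacity(n, contention=0.0, coherency=0.0):
    """Universal Scalability Law: throughput of n nodes relative to one node.

    contention (alpha): share of work serialised on a shared resource.
    coherency (beta):   cost of nodes keeping each other consistent.
    With both at 0 this reduces to perfectly linear scaling.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def best_node_count(max_nodes, contention, coherency):
    """Return the node count in 1..max_nodes with the highest capacity."""
    return max(
        range(1, max_nodes + 1),
        key=lambda n: relative_capacity(n, contention, coherency),
    )


if __name__ == "__main__":
    alpha, beta = 0.05, 0.001  # made-up coefficients, for illustration only
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(n, round(relative_capacity(n, alpha, beta), 2))
    print("capacity peaks at", best_node_count(100, alpha, beta), "nodes")
```

With these made-up coefficients capacity peaks at 31 nodes and falls after that: past the peak, each extra box costs more in coordination than it adds in throughput.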

The right order

For most systems:

1. Build stateless services. Even at one instance, design for many. Externalise state to a database or cache. See Stateless vs stateful services.
2. Vertical-scale the database until it hurts. Modern Postgres on a serious machine handles workloads people would have called “web-scale” ten years ago.
3. Horizontal-scale the application layer as soon as traffic justifies more than one instance.
4. Add read replicas when reads dominate. See Read replicas.
5. Shard the database only when the primary truly can’t keep up. See Sharding strategies.

Skipping step 1 makes everything else harder. Jumping to step 5 too early adds operational complexity for a problem that did not need solving yet.
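The order above can be read as a decision procedure. The sketch below is one possible encoding of it, not a rule to apply mechanically: `SystemState`, its fields, and `next_scaling_step` are hypothetical names, and deciding when a tier counts as "saturated" is left to your own metrics.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    app_is_stateless: bool = True
    app_saturated: bool = False          # application boxes are out of headroom
    db_saturated: bool = False           # primary database is out of headroom
    db_vertical_exhausted: bool = False  # no bigger instance, or not worth the price
    reads_dominate: bool = False         # read traffic far exceeds writes
    has_read_replicas: bool = False


def next_scaling_step(s: SystemState) -> str:
    """Return the next move, cheapest and least invasive first."""
    if not s.app_is_stateless:
        return "externalise state"
    if s.app_saturated:
        return "add application boxes"
    if not s.db_saturated:
        return "nothing to do"
    if not s.db_vertical_exhausted:
        return "move database to a bigger instance"
    if s.reads_dominate and not s.has_read_replicas:
        return "add read replicas"
    return "shard the database"


if __name__ == "__main__":
    print(next_scaling_step(SystemState(app_saturated=True)))
    print(next_scaling_step(SystemState(db_saturated=True)))
```

Sharding is the last return on purpose: it is reached only when every cheaper option has been ruled out.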

Two scenarios

Scenario one: a SaaS app at 100k users.

One Postgres instance on an m5.2xlarge handles everything. The application runs on three m5.large boxes behind an ALB. Traffic doubles? Add three more application boxes (horizontal). Database hot? Move to m5.4xlarge (vertical). No sharding. No microservices. The team focuses on product.
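How many application boxes a given traffic level needs is a short calculation. The sketch below assumes you have measured the throughput one box sustains; the figures in the usage lines (1,000 QPS at peak, 400 QPS per box, 50% target utilisation) are hypothetical and do not come from the scenario.

```python
import math


def boxes_needed(peak_qps, qps_per_box, target_utilisation=0.5, spare_boxes=1):
    """Boxes required to serve peak_qps while keeping each box at or
    below target_utilisation, plus spare_boxes so that losing one box
    does not push the rest over the target."""
    if not 0 < target_utilisation <= 1:
        raise ValueError("target_utilisation must be in (0, 1]")
    usable_qps_per_box = qps_per_box * target_utilisation
    return math.ceil(peak_qps / usable_qps_per_box) + spare_boxes


if __name__ == "__main__":
    # Hypothetical numbers: measure qps_per_box with a load test.
    print(boxes_needed(peak_qps=1_000, qps_per_box=400))
    print(boxes_needed(peak_qps=2_000, qps_per_box=400))  # traffic doubles
```

The spare box and the utilisation target are the design choice: they are what lets the fleet absorb the loss of one box.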

Scenario two: a real-time bidding platform at 50,000 QPS.

Vertical scaling at the application layer is a non-starter; one box cannot do that throughput. Horizontal everything: hundreds of stateless boxes behind a high-capacity LB. The database is sharded by partner ID. Caching is distributed (Redis cluster). State lives in shared stores, never on the boxes. This is what horizontal scaling unlocks; you cannot get here vertically.
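Sharding by partner ID means every box must agree on which shard owns which partner. A minimal sketch of hash-based routing follows; `shard_for` and the shard count of 16 are assumptions for the example, and the scenario does not say which routing scheme such a platform would actually use.

```python
import hashlib


def shard_for(partner_id: str, shard_count: int) -> int:
    """Map a partner ID to a shard index in [0, shard_count).

    Uses SHA-256 rather than Python's built-in hash(), which is salted
    per process for strings and so could route the same partner to
    different shards on different boxes.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(partner_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for partner in ("partner-17", "partner-42", "partner-99"):
        print(partner, "-> shard", shard_for(partner, shard_count=16))
```

Plain modulo routing is the simplest option, but changing `shard_count` remaps most partners to a different shard. Schemes such as consistent hashing or a lookup table trade simplicity for cheaper resharding.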

What this connects to

Stateless vs stateful services. The precondition for horizontal scaling. See Stateless vs stateful services.
Load balancers. Required for horizontal anything-with-traffic. See Load balancer: why, how, when.
Read replicas. Sharding’s cheap cousin: more read capacity without splitting the data. See Read replicas.
Sharding strategies. The final horizontal move for a database. See Sharding strategies.
CAP theorem. Once state is spread across nodes, network partitions between them become possible, which forces consistency trade-offs. See CAP theorem.

Common mistakes

Jumping to horizontal scaling too early. Three nodes when one would do means three times the operational surface for the same throughput.
Vertical-scaling forever without a plan. The cliff at the top of the instance family is real. Have a horizontal story before you need it.
Horizontal-scaling a stateful service. Without externalised state, more nodes means more pinned users, more sticky sessions, more failure modes. See Sticky sessions.
Forgetting the database’s shared-dependency problem. Ten application servers all talking to one Postgres just moves the bottleneck. Pool connections; read from replicas; shard when necessary.
Measuring throughput, ignoring tail latency. Adding nodes raises throughput but does not lower p99 unless the per-request slowness is also fixed.
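The shared-dependency mistake often shows up first as connection exhaustion. The sketch below budgets database connections across the application tier; `connection_budget`, `max_pool_size`, and the `reserved_for_admin` value of 5 are hypothetical. The limit of 100 in the usage lines matches PostgreSQL's usual default for `max_connections`, but check what your own server is configured with.

```python
def connection_budget(app_boxes, pool_size_per_box, db_max_connections,
                      reserved_for_admin=5):
    """Compare the connections the app tier can open with what the
    database will accept. Returns (demand, available, fits)."""
    demand = app_boxes * pool_size_per_box
    available = db_max_connections - reserved_for_admin
    return demand, available, demand <= available


def max_pool_size(app_boxes, db_max_connections, reserved_for_admin=5):
    """Largest per-box pool size that keeps total demand within the limit."""
    return max((db_max_connections - reserved_for_admin) // app_boxes, 0)


if __name__ == "__main__":
    # Ten application boxes, each with a pool of 20, against a limit of 100.
    demand, available, fits = connection_budget(10, 20, 100)
    print(f"demand={demand} available={available} fits={fits}")
    print("largest safe pool per box:", max_pool_size(10, 100))
```

Every box added to the application tier shrinks the safe pool size per box, which is why scaling out the application layer eventually forces a connection pooler, replicas, or shards.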

Quick recap

Vertical: bigger box, no code change, single failure domain, hard ceiling.
Horizontal: more boxes, near-linear capacity, coordination tax, requires statelessness.
Real systems use both. Default order: stateless code → vertical DB → horizontal app → replicas → shards.
The right answer is rarely “all horizontal” or “all vertical.” It is “which bottleneck am I actually hitting right now?”

This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.

Last updated May 30, 2026