Databases

Sharding strategies

Range, hash, directory, and the pain that follows.

Sharding splits one logical database across many physical machines. Each machine (shard) holds a slice of the data. Sharding scales writes; read replicas alone do not. The choice of how to split (range, hash, or directory) shapes everything: write hot-spots, the cost of resharding, the kinds of queries you can answer cheaply.

The problem sharding solves

One database server has a ceiling. Memory, CPU, disk I/O, network. You can vertical-scale only so far. Replicas help reads but not writes. Eventually you need to send different rows to different machines and let each handle its own slice.

flowchart TB
    APP(["Application"]):::client
    R[["Shard router<br/>knows where each key lives"]]:::infra
    S1[("Shard 1<br/>keys 1...1M")]:::store
    S2[("Shard 2<br/>keys 1M...2M")]:::store
    S3[("Shard 3<br/>keys 2M...3M")]:::store

    APP ==> R
    R ==> S1
    R ==> S2
    R ==> S3

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px

The router (a library, a proxy, or the database itself) takes a key, decides which shard owns it, and forwards the operation. The three big questions: how does the router decide, what happens when shards fill, and what queries become impossible.

Range sharding

Pick a partition column (often a timestamp or sequential ID). Each shard owns a contiguous range.

flowchart LR
    K(["key = order_id"]):::client
    R[["range router"]]:::infra
    S1[("Shard 1<br/>order_id  1 ... 1,000,000")]:::store
    S2[("Shard 2<br/>order_id  1,000,001 ... 2,000,000")]:::store
    S3[("Shard 3<br/>order_id  2,000,001 ... 3,000,000")]:::store

    K --> R
    R -->|"k = 7"| S1
    R -.->|"k = 1.5M"| S2
    R -.->|"k = 2.5M"| S3

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px

Strength. Range scans are trivial: keys 1 to 100,000 are all on one shard.

Weakness. Hot shards. If your partition key is a timestamp and you keep getting fresh writes, the newest shard takes all the load while older shards sit idle.

Used by: HBase, some NoSQL stores, and most time-series databases.

Hash sharding

Compute a hash of the key and use it (or hash mod N) to pick a shard. Keys get spread evenly.

flowchart LR
    K(["key = user_id"]):::client
    H[["hash(user_id) mod 4"]]:::infra
    S0[("Shard 0")]:::store
    S1[("Shard 1")]:::store
    S2[("Shard 2")]:::store
    S3[("Shard 3")]:::store

    K --> H
    H -->|"hash = 0"| S0
    H -->|"hash = 1"| S1
    H -->|"hash = 2"| S2
    H -->|"hash = 3"| S3

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px

Strength. Even distribution. No hot shards from sequential keys.

Weakness. Range scans are impossible without contacting every shard. And resharding hurts: going from 4 shards to 5 with naïve hash mod N moves roughly 80% of all data. Consistent hashing fixes most of this, but it is a thing you have to think about.

Used by: Cassandra, DynamoDB, most Redis cluster setups.

Directory sharding

Keep a lookup table that maps every key (or every range of keys) to a shard. The router asks the directory for each key.

flowchart LR
    K(["key = tenant_id"]):::client
    D[("Shard directory<br/>tenant_id  →  shard_id")]:::infra
    S1[("Shard 1")]:::store
    S2[("Shard 2")]:::store
    S3[("Shard 3")]:::store

    K --> D
    D -->|"tenant 42 → shard 1"| S1
    D -.->|"tenant 99 → shard 3"| S3

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
    classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px

Strength. Maximum flexibility. Move a single tenant to its own shard. Rebalance hotspots manually. Common in multi-tenant SaaS where one big customer needs isolation.

Weakness. The directory is a critical service. If it goes down, every read fails. It usually needs heavy caching and high availability of its own.

Used by: many homegrown sharding layers, especially for multi-tenant systems.

The pain that follows: resharding

Going from N to N+1 shards is the part nobody likes to talk about.

sequenceDiagram
    autonumber
    participant App as Application
    participant Old as N shards
    participant New as N+1 shards

    Note over App,Old: System is running at capacity
    App->>Old: writes + reads

    Note over Old,New: Begin migration
    Old->>New: copy data to new shard layout
    Note over Old,New: This can take days for big datasets

    App->>Old: still writing here
    App->>Old: new write
    Old-)New: replicate change
    Note over App: Dual-write window<br/>(or change-data-capture stream)

    Note over App,New: Cutover
    App->>New: writes + reads switch
    Note over Old: Old layout retired after verification

This is days or weeks of work, with backups, validation, dual-writes, and rollback plans. The senior question on the day you pick a sharding key is “what does resharding look like when this stops working?” If the answer is “we never thought about it”, that is your first technical-debt item.

Consistent hashing makes hash-shard rebalancing much less painful: only 1/N of the data moves when adding a new shard, not (N-1)/N. This is why every modern hash-sharded system uses it.

When to pick which

Range if you need fast range scans and your write rate is not naturally concentrated at one end of the range. Time-series often picks range with a salt to avoid hot-spotting.
Hash if writes are uniformly distributed and you do not need range scans. Most user-keyed systems.
Directory if you have multi-tenant isolation needs, or if you want manual control over which key lives where.

Three scenarios

Scenario one: a SaaS with thousands of tenants, one of them huge.

That one tenant uses 60% of all activity. Hash or range sharding will give them a noisy-neighbour problem. Directory sharding lets you put the big customer alone on their own shards and keep everyone else on the shared pool.

Scenario two: a clickstream events table.

A trillion rows. Reads are mostly “give me this user’s events in the last 30 days.” Hash on user_id. Cassandra handles this with one configuration line, and writes spread evenly.

Scenario three: time-series sensor data.

Range partition by time (one shard per week or month), with a hash prefix to avoid the “newest shard is always hot” problem. This is what almost every time-series database does internally.

What this connects to

Read replicas. Shards solve write scaling, replicas solve read scaling. You usually want both. See Read replicas.
Consistent hashing. The reason hash-sharded systems are tolerable to resharding. See Load balancing algorithms.
CAP theorem. Sharding gives you partition tolerance by construction. Consistency across shards is the hard part. See CAP theorem.
Distributed transactions. Multi-shard updates are no longer one-database transactions. See Two-phase commit vs sagas.

Common mistakes

Sharding by something that does not match the read pattern. If you shard by user but most queries are by product, every query has to fan out to every shard.
Sharding too early. Sharding is operationally painful. Most teams can postpone it for years with replicas, caching, and vertical scaling. Postpone it as long as the math allows.
Sharding without thinking about cross-shard queries. Aggregates, joins, distinct counts. Each one becomes a scatter-gather. Some of these become so slow that you need an analytics store on the side.
Forgetting about hot shards. Even a hash shard can be hot if one key gets 90% of the traffic (a viral video, a celebrity account). Watch per-shard load, not just totals.
Picking a sharding key that you cannot change. Once chosen, changing it is essentially “migrate to a new database.” Choose like it is forever.

Quick recap

Sharding splits writes; replicas spread reads.
Range: range scans easy, hot shards possible.
Hash: even distribution, no range scans, resharding hurts (consistent hashing helps a lot).
Directory: flexible, multi-tenant friendly, directory becomes critical infra.
Plan the resharding story on day one, before you ship the first shard.

This concept sits in Stage 2 (Storage and data) and resurfaces in Stage 4 (Scaling and reliability) of the System Design Roadmap.

Last updated May 29, 2026