Sharding strategies
Range, hash, directory, and the pain that follows.
Sharding splits one logical database across many physical machines. Each machine (shard) holds a slice of the data. Sharding scales writes; read replicas alone do not. The choice of how to split (range, hash, or directory) shapes everything: write hot-spots, the cost of resharding, the kinds of queries you can answer cheaply.
The problem sharding solves
One database server has a ceiling. Memory, CPU, disk I/O, network. You can vertical-scale only so far. Replicas help reads but not writes. Eventually you need to send different rows to different machines and let each handle its own slice.
flowchart TB
APP(["Application"]):::client
R[["Shard router<br/>knows where each key lives"]]:::infra
S1[("Shard 1<br/>keys 1...1M")]:::store
S2[("Shard 2<br/>keys 1M...2M")]:::store
S3[("Shard 3<br/>keys 2M...3M")]:::store
APP ==> R
R ==> S1
R ==> S2
R ==> S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
The router (a library, a proxy, or the database itself) takes a key, decides which shard owns it, and forwards the operation. The three big questions: how does the router decide, what happens when shards fill, and what queries become impossible.
Range sharding
Pick a partition column (often a timestamp or sequential ID). Each shard owns a contiguous range.
flowchart LR
K(["key = order_id"]):::client
R[["range router"]]:::infra
S1[("Shard 1<br/>order_id 1 ... 1,000,000")]:::store
S2[("Shard 2<br/>order_id 1,000,001 ... 2,000,000")]:::store
S3[("Shard 3<br/>order_id 2,000,001 ... 3,000,000")]:::store
K --> R
R -->|"k = 7"| S1
R -.->|"k = 1.5M"| S2
R -.->|"k = 2.5M"| S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
Strength. Range scans are trivial: keys 1 to 100,000 are all on one shard.
Weakness. Hot shards. If your partition key is a timestamp and you keep getting fresh writes, the newest shard takes all the load while older shards sit idle.
Used by: HBase, some NoSQL stores, and most time-series databases.
Hash sharding
Compute a hash of the key and use it (or hash mod N) to pick a shard. Keys get spread evenly.
flowchart LR
K(["key = user_id"]):::client
H[["hash(user_id) mod 4"]]:::infra
S0[("Shard 0")]:::store
S1[("Shard 1")]:::store
S2[("Shard 2")]:::store
S3[("Shard 3")]:::store
K --> H
H -->|"hash = 0"| S0
H -->|"hash = 1"| S1
H -->|"hash = 2"| S2
H -->|"hash = 3"| S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
Strength. Even distribution. No hot shards from sequential keys.
Weakness. Range scans are impossible without contacting every shard. And resharding hurts: going from 4 shards to 5 with naïve hash mod N moves roughly 80% of all data. Consistent hashing fixes most of this, but it is a thing you have to think about.
Used by: Cassandra, DynamoDB, most Redis cluster setups.
Directory sharding
Keep a lookup table that maps every key (or every range of keys) to a shard. The router asks the directory for each key.
flowchart LR
K(["key = tenant_id"]):::client
D[("Shard directory<br/>tenant_id → shard_id")]:::infra
S1[("Shard 1")]:::store
S2[("Shard 2")]:::store
S3[("Shard 3")]:::store
K --> D
D -->|"tenant 42 → shard 1"| S1
D -.->|"tenant 99 → shard 3"| S3
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
Strength. Maximum flexibility. Move a single tenant to its own shard. Rebalance hotspots manually. Common in multi-tenant SaaS where one big customer needs isolation.
Weakness. The directory is a critical service. If it goes down, every read fails. It usually needs heavy caching and high availability of its own.
Used by: many homegrown sharding layers, especially for multi-tenant systems.
The pain that follows: resharding
Going from N to N+1 shards is the part nobody likes to talk about.
sequenceDiagram
autonumber
participant App as Application
participant Old as N shards
participant New as N+1 shards
Note over App,Old: System is running at capacity
App->>Old: writes + reads
Note over Old,New: Begin migration
Old->>New: copy data to new shard layout
Note over Old,New: This can take days for big datasets
App->>Old: still writing here
App->>Old: new write
Old-)New: replicate change
Note over App: Dual-write window<br/>(or change-data-capture stream)
Note over App,New: Cutover
App->>New: writes + reads switch
Note over Old: Old layout retired after verification
This is days or weeks of work, with backups, validation, dual-writes, and rollback plans. The senior question on the day you pick a sharding key is “what does resharding look like when this stops working?” If the answer is “we never thought about it”, that is your first technical-debt item.
Consistent hashing makes hash-shard rebalancing much less painful: only 1/N of the data moves when adding a new shard, not (N-1)/N. This is why every modern hash-sharded system uses it.
When to pick which
- Range if you need fast range scans and your write rate is not naturally concentrated at one end of the range. Time-series often picks range with a salt to avoid hot-spotting.
- Hash if writes are uniformly distributed and you do not need range scans. Most user-keyed systems.
- Directory if you have multi-tenant isolation needs, or if you want manual control over which key lives where.
Three scenarios
Scenario one: a SaaS with thousands of tenants, one of them huge.
That one tenant uses 60% of all activity. Hash or range sharding will give them a noisy-neighbour problem. Directory sharding lets you put the big customer alone on their own shards and keep everyone else on the shared pool.
Scenario two: a clickstream events table.
A trillion rows. Reads are mostly “give me this user’s events in the last 30 days.” Hash on user_id. Cassandra handles this with one configuration line, and writes spread evenly.
Scenario three: time-series sensor data.
Range partition by time (one shard per week or month), with a hash prefix to avoid the “newest shard is always hot” problem. This is what almost every time-series database does internally.
What this connects to
- Read replicas. Shards solve write scaling, replicas solve read scaling. You usually want both. See Read replicas.
- Consistent hashing. The reason hash-sharded systems are tolerable to resharding. See Load balancing algorithms.
- CAP theorem. Sharding gives you partition tolerance by construction. Consistency across shards is the hard part. See CAP theorem.
- Distributed transactions. Multi-shard updates are no longer one-database transactions. See Two-phase commit vs sagas.
Common mistakes
- Sharding by something that does not match the read pattern. If you shard by user but most queries are by product, every query has to fan out to every shard.
- Sharding too early. Sharding is operationally painful. Most teams can postpone it for years with replicas, caching, and vertical scaling. Postpone it as long as the math allows.
- Sharding without thinking about cross-shard queries. Aggregates, joins, distinct counts. Each one becomes a scatter-gather. Some of these become so slow that you need an analytics store on the side.
- Forgetting about hot shards. Even a hash shard can be hot if one key gets 90% of the traffic (a viral video, a celebrity account). Watch per-shard load, not just totals.
- Picking a sharding key that you cannot change. Once chosen, changing it is essentially “migrate to a new database.” Choose like it is forever.
Quick recap
- Sharding splits writes; replicas spread reads.
- Range: range scans easy, hot shards possible.
- Hash: even distribution, no range scans, resharding hurts (consistent hashing helps a lot).
- Directory: flexible, multi-tenant friendly, directory becomes critical infra.
- Plan the resharding story on day one, before you ship the first shard.
This concept sits in Stage 2 (Storage and data) and resurfaces in Stage 4 (Scaling and reliability) of the System Design Roadmap.