Design Uber / Lyft (Ride Sharing)
What we are building
Omar opens Uber in Midtown Manhattan. He taps a destination: JFK airport. The app quotes $42 and shows a map. He taps confirm. Four seconds later, a driver named Priya gets a ping on her phone. She accepts. Omar’s map shows Priya’s car icon moving toward him. Four minutes later she arrives, he gets in, and twenty-two minutes after that she drops him at the terminal. His card is charged $44.80 (a little waiting time added). The ride is done.
That product touches four genuinely hard problems.
- Real-time location at scale. A million drivers are each pinging their location every 4 seconds. That is 250,000 writes per second, all of them overwrite-in-place, all of them feeding a live map index that matching reads from.
- Fast matching. From Omar’s tap to “Priya is on her way,” the P99 target is under 2 seconds. Miss it and riders give up.
- A state machine that cannot be wrong. The ride passes through seven states over 25 minutes, across dropped connections and retries. Every transition must be idempotent. One bad transition means a double charge or a ghost ride.
- Surge pricing and billing accuracy. Surge is computed per neighborhood, changes every 10 seconds, and must be locked in at quote time. If the connection drops before trip end, the fare still has to be correct.
We will start with the smallest version that works, then add one piece at a time as each problem appears.
The lifecycle of one ride
Before drawing any boxes, picture what a ride actually is.
stateDiagram-v2
direction LR
[*] --> requested: Omar taps Request
requested --> matched: Priya accepts
matched --> driver_arriving: en route to Omar
driver_arriving --> in_progress: Omar boards, Priya taps Start
in_progress --> completed: Priya taps End Trip
driver_arriving --> cancelled: either side cancels
in_progress --> cancelled: rare; fraud / system
requested --> cancelled: no driver found
driver_arriving --> no_show: Priya waited, Omar did not show
completed --> [*]
cancelled --> [*]
no_show --> [*]
Everything we add later, the location index, the matching algorithm, surge pricing, path history, is machinery that drives this state machine forward safely.
Take this with you. A ride-sharing system is a state machine with a fast map lookup bolted to the front. The map gets you the right driver. The state machine makes sure you bill correctly.
How big this gets
Two scales. Same product.
| Metric | Startup city (Austin) | Uber global |
|---|---|---|
| Online drivers | 500 | ~1,000,000 |
| Active riders | 2,000 | ~5,000,000 |
| Trips per day | 5,000 | ~100,000,000 |
| Location pings per second | ~125 | ~475,000 |
| Active trips at any moment | ~52 | ~1,000,000 |
Show: how the numbers come out
Trips per second. 100M / 86,400 is about 1,160 per second sustained. Friday-night peak is 5-10x that: roughly 10,000 per second at peak.
Location pings per second. Idle drivers ping every 4 seconds. On-trip drivers ping every 1 second. About 30% of online drivers are on a trip at any moment.
- Idle: 700,000 × (1/4) = 175,000/sec
- On-trip: 300,000 × (1/1) = 300,000/sec
- Total: ~475,000/sec sustained, ~1,000,000/sec at peak
Location ingest is the dominant write workload. It beats matching by a factor of 50.
Bandwidth. Each ping is ~100 bytes (driver_id, lat, lng, heading, speed, accuracy, timestamp, signature). 475,000 × 100B = 47 MB/sec. Not a bottleneck. Storage is.
Storage. We store on-trip pings only, not idle pings.
- 100M trips/day × ~225 pings/trip × 100B per ping = ~2 TB/day. Compress and age off after 90 days.
- Current locations: 1M drivers × 200B = ~200 MB total. Tiny. Overwrite in place every 4 seconds.
What the math tells you. There are two very different workloads:
- Location ingest: 500k+ writes/sec. Overwrite-in-place in a fast in-memory store.
- Matching: ~10k/sec. Cheap to compute, but every millisecond counts.
Most of the architecture exists to keep these two paths from interfering with each other.
Take this with you. Location ingest dwarfs every other workload. Keep it on its own infrastructure so it does not leak into matching latency.
The smallest version that works
We are launching in Austin with 500 drivers. Draw the minimum system first.
flowchart LR
O([Omar]):::user --> API[/"Ride Service<br/>(request ride)"/]:::app
API --> DB[("Postgres<br/>rides table")]:::db
API --> LocDB[("PostGIS<br/>driver locations")]:::db
LocDB -.nearby drivers.-> API
API --> GW["Driver Gateway<br/>(WebSocket)"]:::app
GW --> P([Priya]):::user
classDef user fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
classDef app fill:#dcfce7,stroke:#15803d,color:#14532d
classDef db fill:#fed7aa,stroke:#c2410c,color:#7c2d12
Three endpoints carry the whole product.
| Endpoint | What it does |
|---|---|
POST /rides | Rider submits pickup and dropoff, gets back a fare range and a ride ID |
POST /rides/{id}/accept | Driver accepts a dispatch offer |
GET /rides/{id} | Rider or driver polls the current ride state |
Show: the core tables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
CREATE TABLE rides (
ride_id BIGINT PRIMARY KEY,
rider_id BIGINT NOT NULL,
driver_id BIGINT, -- NULL until matched
city_id INT NOT NULL,
state SMALLINT NOT NULL, -- 1=requested ... 6=cancelled
pickup_lat DOUBLE PRECISION NOT NULL,
pickup_lng DOUBLE PRECISION NOT NULL,
dropoff_lat DOUBLE PRECISION NOT NULL,
dropoff_lng DOUBLE PRECISION NOT NULL,
vehicle_class SMALLINT NOT NULL,
requested_at TIMESTAMPTZ NOT NULL,
matched_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
fare_cents INT,
surge_mult NUMERIC(3,2) NOT NULL DEFAULT 1.00,
idempotency_key UUID
);
CREATE UNIQUE INDEX idx_idempotency ON rides (rider_id, idempotency_key);
CREATE INDEX idx_rider_active ON rides (rider_id) WHERE state IN (1,2,3,4);
CREATE INDEX idx_driver_active ON rides (driver_id) WHERE state IN (3,4);
The partial index on driver_id for active states is the guard against double-assignment at the database layer.
This handles Austin at 500 drivers. PostGIS nearby queries are fast enough. The problem shows up when we expand to NYC with 25,000 drivers and 6,000 location writes per second.
Take this with you. Start from the smallest thing that works. The interesting part is what breaks next.
Decision 1: how do we index a million moving cars?
The v1 PostGIS table breaks at scale. At 25,000 drivers pinging every 4 seconds, that is 6,250 writes per second to a relational table with a spatial index. The index cannot keep up with continuous updates while simultaneously serving fast range queries.
The core need is a location store that is fast to write (overwrite-in-place, not INSERT) and fast to query (give me all drivers within 500m of a point, right now).
The solution is a geospatial cell index in Redis. The question is which cell scheme to use.
flowchart TB
subgraph A["Option A: Geohash (rectangles, base-32)"]
A1["dr5ru7 = a specific rectangle in Midtown"]
A2["Problem: two points right next to each other<br/>can have different prefixes if they sit on a cell boundary"]:::bad
end
subgraph B["Option B: Quadtree (recursive rectangles)"]
B1["A rectangle split into 4, recursively"]
B2["Problem: tree updates on every driver move are expensive<br/>Hard to shard because the tree shape changes live"]:::bad
end
subgraph C["Option C: H3 hexagons"]
C1["Every neighbor of a hexagon is equidistant<br/>kRing(cell, 1) returns all 7 cells in one ring"]
C2["Resolution 9 = ~174m across, ~10-50 drivers per cell in dense city"]:::ok
end
classDef bad fill:#fecaca,stroke:#b91c1c,color:#7f1d1d
classDef ok fill:#dcfce7,stroke:#15803d,color:#14532d
Why hexagons beat squares. With a square grid, diagonal neighbors are 1.41x farther than edge neighbors. When you search for “drivers near a pickup,” hexagons give clean symmetric results. The code does not need to compensate for diagonal vs. edge distance.
The pick: H3 resolution 9. Cells are ~174m across, ~0.1 sq km. A dense city has 10-50 available drivers per cell at peak. A kRing(cell, 1) call covers the cell plus its 6 neighbors, a radius of roughly 500m. That is the right scale for “drivers close enough to arrive in a few minutes.”
The Redis data layout per driver:
1
2
3
4
5
6
7
8
9
10
11
driver:{driver_id} HASH lat, lng, h3_r9, status, vehicle_class, last_update_ts
TTL: 30s (refreshed on every ping)
cell:{h3_r9} SET driver IDs currently in this cell
(no TTL; swept by driver key expiry)
driver_inflight:{driver_id} STRING ride_id of active dispatch claim
TTL: 30s (released on accept or timeout)
surge:{h3_r8} STRING surge multiplier, e.g. "2.1"
TTL: 30s (refreshed by Surge Service)
Shard by H3 resolution 6 (cells ~6km across). All drivers in one metro neighborhood land on the same Redis shard. When a rider is near a shard boundary, the Matching Service reads from adjacent shards in parallel.
Take this with you. Geohash is fine for v1. H3 is the right answer at scale. Quadtree on a live dataset is wrong because tree updates on every driver move are too expensive.
Decision 2: how do we match a rider to a driver in under 2 seconds?
Omar taps Request. The Matching Service has 2 seconds to claim a driver and push an offer. Here is the algorithm.
flowchart TD
Start([Omar taps Request at pickup P]) --> Cell["Compute P's H3 r9 cell"]
Cell --> Ring1["SUNION: cell + kRing 1, 7 cells, ~500m radius"]
Ring1 --> Check{Any available candidates?}
Check -- No --> Expand["Expand to kRing 2, then 3, then 5<br/>(~1km, ~1.5km, ~2.5km)"]
Expand --> Check
Check -- Yes --> Filter["Filter: AVAILABLE status, right vehicle class, not blocked"]
Filter --> Prerank["Prerank top 50 by haversine<br/>(cheap, in-process)"]
Prerank --> ETA["Batch real road-network ETA<br/>from Routing Service, top 20 candidates<br/>timeout 300ms"]
ETA --> Score["Score: ETA + rating bonus + idle bonus"]
Score --> Claim{"Claim driver via<br/>Redis SET NX EX 30"}
Claim -- Won --> Push["Push offer to driver's pod<br/>via Driver Gateway"]
Claim -- Lost --> NextBest["Try next-best candidate"]
NextBest --> Claim
Push --> Done([Done, ~200ms total])
classDef ok fill:#dcfce7,stroke:#15803d,color:#14532d
classDef bad fill:#fecaca,stroke:#b91c1c,color:#7f1d1d
The SET NX EX 30 claim is the mutual-exclusion mechanism. If two matching attempts race for Priya, one returns 1 (won) and the other returns 0 (try the next-best driver). The 30-second TTL is a safety net: if the Matching Service crashes before dispatching, the lock auto-releases.
Never use haversine as the final score. A driver 200m away on the other side of a river is much farther by road-network ETA. Haversine is acceptable for the coarse pre-filter (narrowing 200 candidates to 50), but the final score always uses road ETA.
Show: three levels of matching, simplest first
Level 1: greedy nearest. Search the 1-ring. Filter. Score top 20 by real ETA. Claim the best available. Fast (under 200ms). Works well when supply is plentiful. Ship this first.
The downside: locally optimal, globally wasteful. If two riders request at almost the same time and the same driver is best for both, the second rider gets a much worse match than they would under paired matching.
Level 2: greedy with a dispatch window. Wait ~500ms to collect other pending requests in the same H3 area. Run a small batch match (Hungarian algorithm) to minimize total ETA across all pairs. The 500ms hold is invisible to the rider (still under 2 seconds) and improves average ETA by 5-15% in dense areas.
Level 3: predictive matching. The candidate pool includes drivers about to finish a trip in the next 60 seconds. Their ETA includes remaining trip time plus drive to pickup.
Ship Level 1. Add Level 2 when load and data justify it.
Take this with you. The Redis
SET NXis the lock that prevents double-assignment. The Routing Service is the heaviest dependency on the match path. Bound its concurrency, timeout, and fallback before your design is complete.
Decision 3: how do we keep the ride state machine correct?
The state machine has seven states and ten transitions. The machine runs for 20-25 minutes across dropped connections, retries, and backgrounded apps. A wrong transition means a double charge or an unrefunded cancellation.
Four invariants that cannot be broken:
- Every transition is idempotent.
POST /acceptcalled twice leaves the state unchanged.matched_atis set only on the first transition. - No backward transitions. Once
in_progress, the only exits arecompletedorcancelled. - One driver per ride. One active ride per driver. The Redis
driver_inflightkey and the partial unique index onrides.driver_id WHERE state IN (3,4)both enforce this. Two locks, two layers. - Cancel reason is required.
rider_cancelled,driver_cancelled,system_no_drivers,no_show,fraud. This drives the billing and refund decision.
Transitions are conditional SQL updates:
1
2
3
4
UPDATE rides
SET state = 2, driver_id = $driver, matched_at = NOW()
WHERE ride_id = $ride AND state = 1
RETURNING state;
If RETURNING yields 0 rows, the transition failed: the ride already moved on. The Driver Gateway tells Priya “ride no longer available” and clears her inflight key.
The Idempotency-Key header on POST /rides handles mobile retries. Without it, a rider who loses connectivity at the wrong moment creates two rides and gets charged twice.
Take this with you. The partial unique index on
driver_idfor active states is the database-layer guard against double-assignment. The RedisSET NXis the application-layer guard. Never rely on just one.
Decision 4: how do we handle surge pricing without coupling it to matching?
Surge is not part of the matching path. It is a read-only input to the quote.
The Surge Service reads supply and demand per H3 cell every 10 seconds and writes a multiplier:
1
multiplier = clip(demand_rate / supply_count, min=1.0, max=5.0)
demand_rate is ride requests in the cell in the last 60 seconds. supply_count is available drivers in the cell. The multiplier is written to surge:{h3_r8} in Redis with a 30-second TTL.
Surge uses H3 resolution 8 (cells ~3km across), one level coarser than the resolution 9 used for matching. A 5x multiplier on one block but 1x on the next would feel arbitrary. Neighborhood-level surge feels natural.
When Omar’s request comes in, the Ride Service reads the current multiplier from Redis and locks it into rides.surge_mult. If Priya drives slowly or waits, the fare uses the locked surge, not whatever the live multiplier is when the trip ends.
If the surge:{h3_r8} key is missing or stale, the Ride Service defaults to 1.0. The Surge Service being slow or down does not break matching.
Take this with you. Surge and matching do not talk to each other. Surge is a periodic write to a Redis key. Matching reads that key when building the quote. If surge goes down, rides keep working at 1x.
The full architecture
flowchart TB
subgraph Edge["Client edge"]
O([Omar's App]):::user
P([Priya's App]):::user
GW2["API Gateway<br/>(auth · rate-limit · idempotency)"]:::edge
end
subgraph WritePath["Synchronous write path"]
RS["Ride Service<br/>(state machine, stateless pods)"]:::app
MS["Matching Service<br/>(stateless)"]:::app
DGW["Driver Gateway<br/>(stateful · sticky routing)"]:::app
Route["Routing / ETA Service<br/>(OSRM or in-house)"]:::ext
Surge["Surge Service"]:::app
end
LI[("Redis<br/>Location Index<br/>H3 r9, sharded by r6")]:::cache
DB[("Postgres<br/>rides · drivers<br/>sharded by city_id")]:::db
K{{"Kafka<br/>ride.* events<br/>trip.location.pings"}}:::queue
subgraph Consumers["Async consumers"]
Notif["Notification Service"]:::app
TH[("S3 + ClickHouse<br/>Trip Path History · 90d")]:::db
end
O --> GW2
P --> DGW
GW2 -->|create/accept| RS
RS --> DB
RS --> MS
RS -.surge quote.-> Surge
MS --> LI
MS --> Route
DGW --> LI
DGW -.on-trip pings.-> K
DB -->|CDC outbox| K
K --> Notif
K --> TH
Surge --> LI
classDef user fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
classDef edge fill:#e2e8f0,stroke:#475569,color:#1e293b
classDef app fill:#dcfce7,stroke:#15803d,color:#14532d
classDef db fill:#fed7aa,stroke:#c2410c,color:#7c2d12
classDef cache fill:#fecaca,stroke:#b91c1c,color:#7f1d1d
classDef queue fill:#ddd6fe,stroke:#6d28d9,color:#4c1d95
classDef ext fill:#e9d5ff,stroke:#7e22ce,color:#581c87
Each component, in one line:
| Component | Purpose |
|---|---|
| API Gateway | Authenticates callers, rate-limits bots, dedupes mobile retries via idempotency keys |
| Ride Service | Owns the ride state machine. Stateless pods. Calls Matching when a driver is needed. |
| Matching Service | Stateless. Reads Redis, scores candidates with real ETA, claims driver via SET NX. |
| Driver Gateway | Stateful pods. One WebSocket per online driver. Receives pings, pushes dispatch offers. |
| Routing / ETA Service | Road-network ETA from A to B. CPU-heavy. Isolated so slowdowns do not cascade to matching. |
| Surge Service | Reads supply and demand per H3 r8 cell every 10 seconds. Writes multipliers to Redis. |
| Redis Location Index | Overwrite-in-place. H3 cell sets. Sharded by r6. The hot index for matching. |
| Postgres (sharded by city) | Source of truth for ride records. One shard per city or metro. |
| Kafka + consumers | Trip path history for fraud and disputes. Notification fan-out. All off the synchronous path. |
Walk: a ride request, end to end
Omar taps Request in Midtown. Here is what happens.
sequenceDiagram
autonumber
participant Omar
participant GW2 as API Gateway
participant RS as Ride Service
participant MS as Matching Service
participant Redis as Location Index
participant Route as Routing Service
participant DGW as Driver Gateway
participant Priya
Omar->>GW2: POST /rides (pickup: Times Sq, dropoff: JFK)
GW2->>RS: forward (auth ok, idempotency checked)
RS->>Redis: active ride for Omar? (no)
rect rgb(241, 245, 249)
Note over RS,Redis: synchronous write path (~200ms total)
RS->>RS: INSERT ride, state=requested
RS->>MS: find driver near Times Sq
MS->>Redis: SUNION cell + kRing(1), 7 H3 r9 cells (~500m radius)
Redis-->>MS: ~30 candidate driver IDs
MS->>Redis: bulk lookup attrs, filter on AVAILABLE + vehicle class = ~15
MS->>Route: batch ETA for top 20 candidates (~300ms timeout)
Route-->>MS: ETAs
MS->>MS: score by ETA + rating bonus, pick Priya
MS->>Redis: SET driver_inflight:priya ride_id NX EX 30
Redis-->>MS: 1 (claim won)
end
MS->>DGW: push offer to Priya's WebSocket pod
DGW->>Priya: ride offer (pickup, fare range, 15s timer)
RS-->>GW2: 202 Accepted (ride_id, fare_min, fare_max)
GW2-->>Omar: 202 Accepted, finding your driver
Priya->>GW2: POST /rides/{id}/accept
GW2->>RS: driver accepting
RS->>RS: UPDATE ride: state 1 to 2, driver=Priya WHERE state=1
RS-->>Omar: WebSocket push: "Priya is on her way, ~4 min"
RS->>Redis: DEL driver_inflight:priya
Three things worth pointing at:
- The 202 goes back to Omar before Priya accepts. The ride is created. Matching is in progress. Omar sees a spinner. The WebSocket delivers the update when Priya taps Accept.
- The
SET NX EX 30in step 10 is the mutual-exclusion lock. If two matching instances race for Priya, one gets1and the other gets0and moves to the next candidate. - The match path (steps 3-10) targets under 200ms. Driver acceptance adds another 3-15 seconds on top. These are different SLOs.
Walk: location pings at scale
Priya is driving toward Omar. She pings every second. With 1M online drivers, the system receives 475,000+ pings per second globally. Each ping causes exactly two writes.
Write 1: overwrite the location index (every driver, always).
The Driver Gateway computes the H3 cell from (lat, lng). It updates two Redis keys:
driver:{driver_id}: overwrite with new lat, lng, h3_r9, last_update_ts. TTL refreshed to 30 seconds.- If the H3 cell changed since last ping: remove from
cell:{old_h3}set, add tocell:{new_h3}set. If same cell: no-op.
Latency: ~0.5ms per ping. Both operations are in memory.
Write 2: trip path history (on-trip drivers only).
If Priya is on a trip, the Gateway also pushes the ping to Kafka topic trip.location.pings keyed by trip_id. A consumer batches per trip and writes to S3 and ClickHouse.
Idle-online drivers are not persisted. Only on-trip pings go to durable storage. This cuts storage cost by roughly 90%.
sequenceDiagram
autonumber
participant Priya
participant DGW as Driver Gateway
participant Redis as Location Index
participant K as Kafka
participant TH as ClickHouse + S3
Priya->>DGW: WebSocket ping (lat, lng, heading, ts) ~100 bytes
DGW->>DGW: compute H3 r9 cell from lat/lng
DGW->>Redis: HSET driver:priya lat lng h3 ts, EXPIRE 30s
Note over DGW,Redis: also update cell sets if cell changed
Note over DGW: if on-trip:
DGW->>K: trip.location.pings keyed by trip_id (fire and forget)
K->>TH: batch write per trip (~5s lag)
Take this with you. Two writes per ping: one cheap overwrite to Redis (every driver), one durable append to Kafka (on-trip only). Separating these cuts storage cost by 10x and keeps the hot write path fast.
The hot cell problem
JFK at 5pm has 200 available drivers in two H3 cells. Every rider request in that area triggers a SUNION returning 200 driver IDs. Then a Routing call for ETA across all 200. The Redis shard that owns the JFK cells is at 100% CPU, and every other key on that shard is suffering.
The defenses, cheapest first:
flowchart TD
Start([Rider requests near JFK]) --> InProc{"In-process cache on<br/>Matching Service<br/>1-second TTL?"}
InProc -- Hit --> Reuse["Use cached cell membership<br/>skip Redis SUNION"]:::ok
InProc -- Miss --> SUNION["SUNION from Redis<br/>returns up to 200 driver IDs"]
SUNION --> PreFilter["Prerank top 50 by haversine<br/>(in-process, cheap)"]
PreFilter --> Replicas{"Read from which<br/>Redis replica?"}
Replicas --> RouteTop["Routing call<br/>top 20 only, 300ms cap"]:::ok
RouteTop --> Done([Score and claim]):::ok
classDef ok fill:#dcfce7,stroke:#15803d,color:#14532d
For predictable surges like a concert ending, pre-populate the cell cache at 9:55 before the 10pm rush. The storm hits a warm cache.
Take this with you. The hot cell problem is solved by an in-process cache on the Matching Service, not by scaling Redis. Cap the Routing call at 20 candidates. Most deployments need nothing more than those two changes.
Follow-up questions
Try answering each in 3 or 4 sentences before opening the solution.
Driver ignores the offer. Priya does not tap Accept within 15 seconds. What does the system do? What if she keeps ignoring requests?
Driver loses connectivity mid-trip. Priya’s phone drops off the network for 90 seconds. How does the system know the trip is still going? What does Omar see? What if she never reconnects?
Two riders, one best driver. Omar and another rider both request at the same moment and Priya is the best match for both. Walk through the race. How do you prevent Priya from being double-assigned?
Hot cell: airport at rush hour. JFK has 200 available drivers in two H3 cells. How do you bound the work on every request so matching stays under 200ms?
Hot Redis key. The
cell:{jfk_h3}set is on one Redis shard at 100% CPU. Diagnose the problem and describe the fix in order of cheapest to most expensive.Region failure.
us-east-1goes down. NYC rides live there. What happens to in-progress trips? Can riders in other cities still book?Driver heading away from pickup. Priya just dropped off a rider and is driving toward home. The matcher picks her because she is 300m from the pickup. Is this the right call? How do you handle direction in scoring?
Fraud: fake GPS. A driver submits fake coordinates to appear in a high-surge cell. How do you detect this without adding latency to the ingest path?
Routing Service is slow or down. Matching depends on it for ETAs. How do you degrade gracefully so riders can still get matched?
Bulk cancel. A major storm hits NYC. Ops wants to cancel all in-progress rides in Manhattan and refund riders. How does the backend handle this, and what can go wrong?
Related problems
- News Feed (002). The hot-cell problem at an airport is the same fan-out problem as a celebrity post in news feed. Same fixes: in-process cache, replicas, jittered TTLs.
- Chat System (003). The in-ride chat between Omar and Priya is the same WebSocket and presence problem. The Driver Gateway here is shaped exactly like the chat gateway there.
- Notification System (010). Dispatch push to drivers, “your driver is arriving” to riders, and SMS fallback when the app is backgrounded all flow through the notification service.
Try the problem on your own first. Solutions are most valuable after you've struggled with it.
Solution: Design Uber / Lyft (Ride Sharing)
What this system is
Ride sharing is three problems in one. A real-time location index that tracks ~1M moving drivers at 475,000+ pings per second. A matching pipeline that turns a rider’s tap into an assigned driver in under 2 seconds. A ride state machine that survives 20-25 minutes of dropped connections, retries, and backgrounded apps without charging anyone for a ride that never happened.
The architecture splits cleanly along those three concerns. Location pings flow through a stateful Driver Gateway into a Redis cluster keyed by H3 hexagonal cells, overwrite-in-place, never queried for history. The Matching Service is stateless: it reads from the Redis index, applies filters, scores candidates by real road-network ETA from a separate Routing Service. Ride records live in Postgres sharded by city, with on-trip path history streamed to Kafka and S3 for fraud and dispute resolution.
The interesting work is in the seams. The Driver Gateway must be stateful (it holds a WebSocket per driver), which makes deploys harder. The state machine has cancellation, no-show, and reconnect transitions that all must be idempotent because a flaky network delivers every event at least twice.
1. The two questions that matter most
What is in scope? Matching only, or also payments, surge, ETA prediction, in-ride chat, driver onboarding? A reasonable slice: rider requests, system finds a driver, both sides see live updates until pickup, mention surge briefly, skip payments. Without scoping this first, you can spend the whole interview designing a payments system.
How big is the hottest single city? Uber global has ~1M online drivers at peak, but the single-city number shapes one shard. NYC can have 50,000 online drivers on a busy Friday night. That number is the shard size target and it determines whether Postgres with PostGIS (fine for 500 drivers) or Redis with H3 cells (required for NYC at peak) is the right location store.
Everything else follows from those two answers.
2. The math
| Metric | Value |
|---|---|
| Trip requests/sec, sustained | ~1,160 |
| Trip requests/sec, peak | ~10,000 |
| Location pings/sec, sustained | ~475,000 |
| Location pings/sec, peak | ~1,000,000 |
| Active trips globally at any moment | ~1,000,000 |
| Active trips, NYC at peak | 50,000 to 100,000 |
| Durable trip-path storage per day | ~2 TB (on-trip only, compressed) |
| Current-location storage (all online drivers) | ~200 MB |
The most important observation: location ingest dwarfs every other workload by a factor of 50. Keep it on its own infrastructure so it does not bleed into matching latency or trip-state writes.
3. The API
Request a ride (rider side).
1
2
3
4
5
6
7
8
9
10
POST /api/v1/rides
Authorization: Bearer <rider_token>
Idempotency-Key: <uuid>
{
"pickup": { "lat": 40.7580, "lng": -73.9855, "address": "Times Sq" },
"dropoff": { "lat": 40.6413, "lng": -73.7781, "address": "JFK Airport" },
"vehicle_class": "uberx",
"payment_method_id": "pm_abc"
}
| Status | Meaning |
|---|---|
| 202 Accepted | Ride created in requested state. Matching in progress. |
| 200 OK | Idempotency replay. Same ride returned. |
| 400 Bad Request | Bad input or pickup outside service area. |
| 402 Payment Required | Payment method invalid or expired. |
| 409 Conflict | Rider already has an active ride. |
| 503 Service Unavailable | No drivers found after timeout. |
Small but load-bearing choices:
| Choice | Reason |
|---|---|
| 202, not 201 | The ride exists but is not yet matched. The resource is not complete. |
| Idempotency-Key required | Mobile retries on timeout. Without it, retries create duplicate rides and double charges. |
| Fare is a range | fare_min and fare_max. Final fare depends on actual trip time and any waiting time. |
Driver accepts.
1
2
3
4
POST /api/v1/rides/{ride_id}/accept
Authorization: Bearer <driver_token>
{ "driver_eta_seconds": 240 }
| Status | Meaning |
|---|---|
| 200 OK | Driver assigned. |
| 409 Conflict | Already accepted by another driver, or wrong state. |
| 410 Gone | Rider cancelled or offer timed out. |
Location pings (driver side). Not REST. Sent over the persistent WebSocket as a compact binary frame, about 46 bytes vs ~200 bytes for JSON. At 475,000 pings/sec, the bandwidth saving is material.
4. The data model
erDiagram
drivers ||--o{ rides : "assigned to"
riders ||--o{ rides : "requests"
rides ||--o{ ride_events : "produces"
drivers {
bigint driver_id
text vehicle_class
numeric rating_avg
int home_city_id
smallint status
}
riders {
bigint rider_id
text name
text default_vehicle_class
}
rides {
bigint ride_id
bigint rider_id
bigint driver_id
int city_id
smallint state
double pickup_lat
double pickup_lng
timestamptz requested_at
timestamptz matched_at
uuid idempotency_key
numeric surge_mult
}
ride_events {
bigint event_id
bigint ride_id
text event_type
jsonb payload
timestamptz occurred_at
}
Show: the full SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
CREATE TABLE rides (
ride_id BIGINT PRIMARY KEY,
rider_id BIGINT NOT NULL,
driver_id BIGINT,
city_id INT NOT NULL,
state SMALLINT NOT NULL, -- 1=requested 2=matched 3=driver_arriving
-- 4=in_progress 5=completed 6=cancelled 7=no_show
pickup_lat DOUBLE PRECISION NOT NULL,
pickup_lng DOUBLE PRECISION NOT NULL,
dropoff_lat DOUBLE PRECISION NOT NULL,
dropoff_lng DOUBLE PRECISION NOT NULL,
vehicle_class SMALLINT NOT NULL,
requested_at TIMESTAMPTZ NOT NULL,
matched_at TIMESTAMPTZ,
picked_up_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
cancelled_at TIMESTAMPTZ,
cancel_reason SMALLINT,
fare_cents INT,
surge_mult NUMERIC(3,2) NOT NULL DEFAULT 1.00,
idempotency_key UUID
);
CREATE UNIQUE INDEX idx_idempotency ON rides (rider_id, idempotency_key);
CREATE INDEX idx_rider_active ON rides (rider_id) WHERE state IN (1,2,3,4);
CREATE INDEX idx_driver_active ON rides (driver_id) WHERE state IN (3,4);
CREATE TABLE ride_events (
event_id BIGINT PRIMARY KEY,
ride_id BIGINT NOT NULL REFERENCES rides(ride_id),
event_type TEXT NOT NULL,
payload JSONB,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_events_ride ON ride_events (ride_id, occurred_at);
The partial index on driver_id for active states is the database-layer guard against double-assignment. If Matching tries to write driver_id=Priya to two different rides in state (3,4), the second write violates this index. Belt and suspenders alongside the Redis SET NX claim.
The Driver Location Index (Redis, not SQL).
1
2
3
4
5
6
7
8
9
10
11
12
driver:{driver_id} HASH lat, lng, h3_r9, h3_r6, status,
vehicle_class, last_update_ts
TTL: 30s (refreshed on every ping)
cell:{h3_r9} SET driver IDs in this cell
(no TTL; entries swept by driver key expiry)
driver_inflight:{driver_id} STRING ride_id of active dispatch claim
TTL: 30s (cleared on accept or timeout)
surge:{h3_r8} STRING multiplier, e.g. "2.1"
TTL: 30s (refreshed by Surge Service every 10s)
Sharded by h3_r6 (coarse cells, ~6km across). All drivers in one neighborhood land on the same Redis shard. ~20 shards at peak. ~5 GB per shard.
5. H3 indexing and the matching loop
Why H3 resolution 9. Cells are ~174m across, ~0.1 sq km. A dense city has 10-50 available drivers per cell at peak. A kRing(cell, 1) call covers 7 cells, a ~500m radius. Most matches happen within ring 1. If not (suburbs), expand to ring 2, 3, 5. Cap at ring 5.
Why hexagons beat squares. Every neighbor of a hexagon is equidistant. With a square grid, diagonal neighbors are 1.41x farther than edge neighbors. Searching “drivers near a pickup” with hexagons gives symmetric results and no distance-compensation code.
Show: the matching loop
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
def match(ride):
pickup_cell = h3.geo_to_h3(ride.pickup_lat, ride.pickup_lng, resolution=9)
for ring_size in [1, 2, 3, 5]:
cells = h3.k_ring(pickup_cell, ring_size)
candidate_ids = redis.sunion(*[f"cell:{c}" for c in cells])
candidates = location_index.bulk_lookup(candidate_ids)
candidates = [
d for d in candidates
if d.status == AVAILABLE
and d.vehicle_class >= ride.vehicle_class
and d.driver_id not in ride.rider.blocked_drivers
]
if not candidates:
continue
candidates = sorted(candidates, key=lambda d: haversine(d, ride.pickup))[:50]
etas = routing_service.batch_eta(
origins=[(c.lat, c.lng) for c in candidates[:20]],
destination=(ride.pickup_lat, ride.pickup_lng),
timeout_ms=300,
)
scored = sorted(
zip(etas, candidates[:20]),
key=lambda t: t[0] + rating_penalty(t[1].rating) + idle_bonus(t[1])
)
for eta, candidate in scored:
if try_claim(candidate.driver_id, ride.ride_id):
return candidate.driver_id
return None
try_claim is a Redis Lua script:
1
2
3
4
5
if redis.call('SET', KEYS[1], ARGV[1], 'NX', 'EX', 30) == 1 then
return 1
else
return 0
end
SET NX EX 30: set only if not exists, expire in 30 seconds. Two racing requests for the same driver: one returns 1 (won), the other returns 0 (try next). The 30-second expiry is a safety net: if Matching crashes before dispatching, the lock releases and the driver becomes claimable again.
6. The ride state machine
stateDiagram-v2
direction LR
[*] --> requested: rider taps Request
requested --> matched: driver accepts
requested --> cancelled: no driver found / rider cancels
matched --> driver_arriving: en route shown
driver_arriving --> in_progress: rider boards, driver taps Start
driver_arriving --> cancelled: either side cancels
driver_arriving --> no_show: driver waited, rider did not show
in_progress --> completed: driver taps End Trip
in_progress --> cancelled: rare; fraud / system cancel
completed --> [*]
cancelled --> [*]
no_show --> [*]
Four invariants:
- Every transition is idempotent.
POST /acceptcalled twice leaves the state unchanged. - No backward transitions. Once
in_progress, the only exits arecompletedorcancelled. - One driver per ride. One active ride per driver. Redis
driver_inflightkey plus the partial unique index onrides.driver_id. Two locks, two layers. - Cancel reason is required.
rider_cancelled,driver_cancelled,system_no_drivers,no_show,fraud. This drives the billing and refund decision.
Transitions are conditional SQL updates:
1
2
3
4
UPDATE rides
SET state = 2, driver_id = $driver, matched_at = NOW()
WHERE ride_id = $ride AND state = 1
RETURNING state;
If RETURNING yields 0 rows, the transition failed. The Driver Gateway tells Priya “ride no longer available” and clears her inflight key.
7. The architecture
flowchart TB
subgraph Edge["Client edge"]
O([Omar's App]):::user
P([Priya's App]):::user
GW2["API Gateway<br/>(auth · idempotency · rate-limit)"]:::edge
end
subgraph WritePath["Synchronous write path"]
RS["Ride Service<br/>(state machine, stateless pods)"]:::app
MS["Matching Service<br/>(stateless)"]:::app
DGW["Driver Gateway<br/>(stateful · sticky routing)"]:::app
Route["Routing / ETA<br/>(OSRM or in-house)"]:::ext
Surge["Surge Service"]:::app
end
LI[("Redis<br/>Location Index<br/>H3 r9, sharded by r6")]:::cache
DB[("Postgres<br/>rides · drivers<br/>sharded by city_id")]:::db
K{{"Kafka<br/>ride.* events<br/>trip.location.pings"}}:::queue
subgraph Consumers["Async consumers"]
Notif["Notification Service"]:::app
TH[("S3 + ClickHouse<br/>Trip Path History · 90d")]:::db
end
O --> GW2
P --> DGW
GW2 -->|create/accept| RS
RS --> DB
RS --> MS
RS -.surge quote.-> Surge
MS --> LI
MS --> Route
DGW --> LI
DGW -.on-trip pings.-> K
DB -->|CDC outbox| K
K --> Notif
K --> TH
Surge --> LI
classDef user fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
classDef edge fill:#e2e8f0,stroke:#475569,color:#1e293b
classDef app fill:#dcfce7,stroke:#15803d,color:#14532d
classDef db fill:#fed7aa,stroke:#c2410c,color:#7c2d12
classDef cache fill:#fecaca,stroke:#b91c1c,color:#7f1d1d
classDef queue fill:#ddd6fe,stroke:#6d28d9,color:#4c1d95
classDef ext fill:#e9d5ff,stroke:#7e22ce,color:#581c87
Five things to notice:
- The Driver Gateway is stateful. A WebSocket is a connection. Sticky routing by
driver_idplus a Redis session table (gateway_session:{driver_id} -> pod_id) lets Matching push dispatch offers by looking up which pod holds each driver. - The Matching Service is stateless. It reads Redis, calls Routing, claims via
SET NX, tells the gateway to push. Easy to scale horizontally. - The Routing Service is isolated. Road-network ETA is CPU-heavy. A routing slowdown does not crash matching. Matching times out at 300ms and falls back to haversine.
- The Trips DB is sharded by city. NYC going down does not affect London.
- Notifications and path history live after Kafka. If the Notification Service crashes at 3am, rides still get matched and completed. Push messages queue up and deliver when it recovers.
8. A request, traced
sequenceDiagram
autonumber
participant Omar
participant GW2 as API Gateway
participant RS as Ride Service
participant MS as Matching
participant Redis as Location Index
participant Route as Routing
participant DGW as Driver Gateway
participant Priya
Omar->>GW2: POST /rides (pickup: Times Sq)
GW2->>RS: forward (auth ok, idempotency checked)
RS->>Redis: active ride for Omar? (no)
rect rgb(241, 245, 249)
Note over RS,Redis: synchronous write path (~200ms total)
RS->>RS: INSERT ride, state=requested
RS->>MS: find driver near Times Sq
MS->>Redis: SUNION cell + kRing(1), 7 H3 r9 cells
Redis-->>MS: ~30 candidate IDs
MS->>Redis: bulk lookup attrs, filter to ~15
MS->>Route: batch ETA, top 20 (~300ms timeout)
Route-->>MS: ETAs
MS->>MS: score, pick Priya
MS->>Redis: SET driver_inflight:priya ride_id NX EX 30
Redis-->>MS: 1 (won)
end
MS->>DGW: push offer to Priya's pod
DGW->>Priya: WebSocket: ride offer (15s timer)
RS-->>GW2: 202 Accepted
GW2-->>Omar: 202 Accepted + fare range
Priya->>GW2: POST /rides/{id}/accept
GW2->>RS: driver accepting
RS->>RS: UPDATE ride: state 1 to 2, driver=Priya WHERE state=1
RS-->>Omar: WebSocket: "Priya is on her way"
RS->>Redis: DEL driver_inflight:priya
Target latencies:
| Operation | P99 target |
|---|---|
| Create ride (202 back to Omar) | ~200ms (bottleneck: SUNION + Routing call) |
| Driver accept (state transitions to matched) | ~150ms |
| Ride status read | ~50ms |
9. The scaling journey: one city to global
flowchart LR
S1["Stage 1<br/>500 drivers, 1 city<br/>Postgres + PostGIS<br/>~$300/mo"]:::s1
S2["Stage 2<br/>5,000 drivers, 3 cities<br/>Redis + H3 cells<br/>~$2k/mo"]:::s2
S3["Stage 3<br/>50,000+ drivers, global<br/>+ Kafka · ClickHouse<br/>+ city sharding<br/>~$50k/mo"]:::s3
S4["Stage 4<br/>1M+ drivers, multi-region<br/>+ edge matching<br/>+ GDPR isolation"]:::s4
S1 --> S2 --> S3 --> S4
classDef s1 fill:#e0f2fe,stroke:#0369a1,color:#0c4a6e
classDef s2 fill:#dcfce7,stroke:#15803d,color:#14532d
classDef s3 fill:#fef3c7,stroke:#a16207,color:#713f12
classDef s4 fill:#fce7f3,stroke:#be185d,color:#831843
Stage 1: 500 drivers, one city
One Postgres with PostGIS. Driver locations in a table with a GiST spatial index. One app instance. Notifications inline. About $300/month. Ships in a weekend.
This works because you see 50 ride requests per hour. PostGIS is loafing.
Stage 2: 5,000 drivers, 3 cities
PostGIS cannot keep up with ~1,250 pings/sec. Move location to Redis with H3 cell sets. Add a dedicated Driver Gateway for WebSocket connections. Add a standalone Matching Service. Notifications consume a small Kafka topic instead of inline HTTP calls. About $2k/month.
Still one Postgres for ride records. Still one Routing instance.
Stage 3: 50,000+ drivers, global
Several things break at once: airport H3 cells become Redis hot keys; the Routing Service cannot keep up at peak; one global Matching Service has cross-region latency; the single Postgres cannot handle writes from dozens of cities.
Fixes in order:
- In-process cache on Matching for cell reads (1-second TTL). Cuts Redis load on hot cells by 10-100x.
- Matching Service replicated per city-region.
- Postgres sharded by
city_id. Each region runs its own primary plus read replicas. - Routing Service scaled independently with a circuit-breaker and haversine fallback.
- Trip path history streamed to Kafka then S3 and ClickHouse.
Cost ~$50k/month.
Stage 4: 1M+ drivers, multi-region
New problems: a region failure takes down all rides in that region; EU operations require EU rider data to stay in EU; GDPR isolation becomes a hard requirement.
Each region runs its full stack: Gateway, Redis, Matching, Trips DB, Kafka. The rider’s home region decides where the ride record lives. A full region failure triggers DNS failover for new traffic and auto-cancel with refund for in-progress rides.
The architecture has not fundamentally changed since Stage 3. You added regions and GDPR isolation. The core data model is the same one written in Stage 1.
10. Four hard sub-problems
| Situation | What stresses the system | The fix |
|---|---|---|
| Airport pickup at 5pm | One H3 cell has 200 available drivers. SUNION returns 200 IDs. Routing for all 200 would be slow. | Pre-filter by haversine to top 50 before fetching attributes. Cap Routing at top 20. In-process cache for cell membership. |
| Concert lets out, 5,000 requests in 10 minutes | Demand spike in a small area. Matching races. Surge computation lags. | Batch matching with 500ms dispatch window. Surge Service updates every 10 seconds with smoothing. |
| Driver connectivity drops mid-trip | Ride is in_progress. No pings arriving. Omar sees no movement. | State machine does not care about pings. Omar’s app shows “GPS lost” after 30 seconds. Driver app queues pings locally and replays on reconnect with original timestamps. |
| Driver just dropped off a rider, heading home | Available status but moving away from request area. Haversine makes them look close. | Use Routing Service ETA (road-network, direction-aware) not haversine for final score. A driver heading away ranks lower by ETA even if close by straight line. |
11. Reliability
Driver Gateway pod crash. All drivers on that pod disconnect. Apps retry with exponential backoff plus jitter (start 1s, max 30s). Within ~10 seconds a healthy pod absorbs them. During the reconnection window, those drivers are not matchable. In-progress rides on those drivers are unaffected: ride state lives in Postgres, not the Gateway.
Driver loses connectivity mid-trip. The trip stays in_progress. No event has transitioned it. After 15 minutes with no pings during an in_progress ride, ops gets an alert. The fare is computed from the End Trip event, not from pings, so lost connectivity does not by itself invalidate the trip.
Redis cluster loss. Catastrophic for matching. AZ replication handles most failures in ~30 seconds. On full cluster loss, location pings buffer in Gateway memory for up to 60 seconds. After that, drivers appear offline and riders see “no drivers available.”
Routing Service slow or down. Match path falls back to haversine. Quality drops but the system functions. A circuit breaker trips after sustained failures and switches all matches to haversine for a 30-second cooldown before probing again.
Trips DB primary failure. Failover to replica (~30 seconds). During the window, new ride requests in that city return 503. In-progress rides cannot transition state. Other cities are fully functional.
12. Observability
| Metric | Why it matters |
|---|---|
match.latency.p99 by city | Headline SLO. >2s for 5 min is a page. |
match.success_rate by city | Percent of requests that find a driver before timeout. |
match.no_drivers_rate | >5% in a city means a supply problem. |
dispatch.accept_rate | Percent of offers accepted. <40% means drivers gaming the system or bad match quality. |
location.ingest_rate by region | Should track drivers_online × 0.475. |
location.stale_driver_count | Drivers with last update >30s. Spikes mean a Gateway problem. |
redis.hot_cell.ops for airports | Should stay below shard CPU capacity. |
routing.latency.p99 | Feeds directly into match latency. |
state_machine.illegal_transitions | Should be near zero. Any nonzero value means a client bug or replay attack. |
trip.in_progress_no_pings.count | Trips with no ping in 60s. Possible fraud or network failure. |
Page on: match P99 >2s for 5 min in any top-20 city. Match success rate <90% for 5 min. Gateway disconnect rate >5%/min. Trips DB unavailable.
Ticket on: dispatch accept rate <40%. Hot cell ops >80% of shard capacity. Illegal transitions >0.
13. Follow-up answers
1. Driver ignores the offer.
The Matching Service holds a soft claim via driver_inflight:{driver_id} with a 30-second TTL. When dispatch times out at 15 seconds:
- Gateway tells Matching “no answer.”
- Matching releases the claim and increments a per-driver
ignored_count. - Matching re-runs, excluding this driver for the next 5 minutes.
- Omar’s UI keeps showing “finding your driver.” State stays
requested. No fee.
If a driver ignores 3 offers in a row, their status flips to offline automatically. Their app gets: “You seem inactive. Are you still online?” This stops abandoned phones from holding queue slots.
2. Driver loses connectivity mid-trip.
The ride is in in_progress. The state machine does not care about the WebSocket connection, only about state-change events.
Gateway marks the driver connection_lost in the session table but does not change ride state. After 30 seconds, Omar’s app shows a “GPS lost” overlay. Priya’s app queues location samples locally (up to ~10 minutes). When she reconnects, the Gateway accepts the queued samples, timestamped and replayed in order. If she never reconnects: after 15 minutes during in_progress, ops gets an alert and manually resolves. The fare is computed from the End Trip event, not GPS pings.
3. Two riders, one best driver.
Omar and Yui both request at almost the same time. Priya is the best match for both. Both Matching Service instances attempt SET driver_inflight:priya NX EX 30. One wins at t=200ms. The other gets 0 at t=210ms and moves to the next-best candidate.
The Redis SET NX is the first layer. The partial unique index on rides.driver_id for active states is the second layer. Belt and suspenders.
4. Hot cell: airport at rush hour.
Two costs to bound: the SUNION returns up to 200 IDs, and Routing needs real ETA for each one.
Fix: pre-filter by haversine (cheap, in-process) to the 50 nearest before fetching full attributes. After filtering on status and vehicle class, pass only the top 20 to Routing. The hot cell adds a few milliseconds of in-process work, not a Routing round-trip per candidate.
Also: in-process cache on Matching with a 1-second TTL for cell membership. A hot cell serving dozens of requests per second will be served from memory for most of them.
5. Hot Redis key.
Symptoms: one Redis shard at 100% CPU, cell:{jfk_h3} reads dominate, every other key on that shard has degraded P99.
Mitigations in order:
- In-process cache on Matching (1-second TTL). Cuts hot-cell reads by 10-100x. This alone often fixes it.
- Read replicas of the hot shard. Matching round-robins reads across them.
- Sub-shard the hot cell: replace
cell:{h3}withcell:{h3}:bucket{0..15}keyed byhash(driver_id) mod 16. Reads become a 16-way SUNION (still fast). Writes distribute across 16 keys. - Pin known hot cells (airports, stadiums) to a dedicated high-resource shard at provisioning.
Most deployments need only steps 1 and 2.
6. Region failure.
In-progress trips in the failed region cannot transition state. Riders on those trips see “service interrupted” and are auto-cancelled with a full refund. Riders in unaffected cities are fully functional. City sharding gives independent blast radii by design.
For within-region reliability: the Trips DB is replicated across AZs. A single AZ failure is invisible. A full region failure triggers DNS-based failover for new traffic, plus the auto-cancel path for in-progress rides.
7. Driver heading away from pickup.
The driver has status=AVAILABLE but is driving home. Haversine says she is 300m away. Her road-network ETA is 8 minutes because she needs to make a U-turn.
The fix: always use the Routing Service for final scoring, not haversine. Road-network ETA is direction-aware and factors in the U-turn, one-way streets, and current heading. Haversine for the coarse pre-filter (narrow 200 to 50) is fine. The final score always uses road ETA.
8. Fraud: fake GPS locations.
Detection without adding latency to the ingest path:
- Plausibility check at the Gateway. If a driver moves more than 1km between two pings 4 seconds apart (that is 900 km/h), flag the ping. Sub-millisecond check. Real cars do not teleport.
- Road-path check. For on-trip drivers, the path should follow roads. A path cutting through buildings is suspicious. Run this as a nightly batch against Trip History in ClickHouse.
- Device attestation. Google Play Integrity and Apple DeviceCheck verify the location came from a real device, not a mock-location app. Bake into the WebSocket handshake. Flagged devices receive lower matching priority.
- Surge-zone teleportation. A driver appearing in a high-surge cell with no path leading there gets flagged.
None of these sit in the synchronous match path. Fraud is mostly an async detection problem.
9. Routing Service slow or down.
Matching falls back to haversine distance for scoring:
1
2
3
4
5
try:
etas = routing_service.batch_eta(origins, dest, timeout_ms=300)
except (TimeoutError, ServiceUnavailable):
etas = [haversine(o, dest) / AVG_SPEED for o in origins]
metrics.inc("matching.routing_fallback")
Quality drops because haversine ignores the road network. A driver across a river may rank too high. But the system keeps functioning and most urban grid matches stay acceptable.
A circuit breaker trips after sustained failures and switches all Matching to haversine for a 30-second cooldown. For rider-facing ETA display: fall back to “a few minutes” instead of a specific number. A wrong number is worse than a vague one.
10. Bulk cancel: storm in Manhattan.
Ops calls POST /admin/bulk_cancel with a city and bounding-box filter. The service iterates over rides in that box with state in (2, 3, 4), transitions each to cancelled with cancel_reason=system_event, and enqueues refund events to Kafka.
Risks:
- Serial iteration at 50,000 active trips is slow. Run it as a background job with a progress key in Redis. Do not block the API thread.
- Some rides will have just completed between the query and the update. The conditional
WHERE state IN (2,3,4)handles that safely. - The refund consumer must be idempotent. If the bulk cancel job crashes halfway and replays, the consumer must not double-refund.
- Notify drivers too. Drivers with active dispatch offers should receive “ride cancelled” over their WebSocket immediately, not through a Kafka event that might lag.
14. Trade-offs worth saying out loud
Greedy match vs. batch match. Greedy is simpler, faster, and locally optimal. Batch with a 500ms dispatch window gives 5-15% better global ETA in dense areas. Ship greedy first. Add batched matching as a per-city experiment with a tunable window.
Why H3 and not geohash. Hexagonal neighbor uniformity matters when most matches are within 1-2 rings. Geohash neighbors have non-uniform distance, requiring more rings to cover the same physical radius. H3 also has compact/uncompact for multi-resolution queries.
Why a stateful Driver Gateway. The alternative (long-polling) costs 3-5x more bandwidth and battery. Stateful with sticky routing and a session table is correct here. “Stateless” is right for most services, but a WebSocket is a connection.
Why not Postgres for driver locations. 475,000 overwrite-in-place operations per second on a relational table is very painful. Redis handles this at under 1ms per operation. Location is “current truth, not historical truth.” Treating it as a cache, not a record, is the right model.
Why separate the Routing Service. Road-network ETA is CPU-intensive with a different scaling profile than matching. If Routing slows down during a sporting event, it degrades in isolation, not taking matching down with it.
15. Common mistakes
“Store driver locations in Postgres with a GiST index.” Fine for v1 at 500 drivers. Falls over at 25,000+ drivers with sustained writes. The strongest answers say “v1 uses PostGIS; here is the signal that tells us to switch.”
“Use haversine for scoring.” Acceptable as a pre-filter. As the primary score, it ignores the road network and ranks drivers on the wrong side of rivers or one-way streets. The Routing Service is a separate layer and the interview is checking whether you know that.
No state machine. A design that describes matching but never names ride states is incomplete. The state machine is where most production bugs and billing errors live.
Ignoring idempotency. Network retries on POST /rides cause duplicate rides and double charges. Idempotency keys are required, not optional. This is the single most common production incident in ride-sharing systems.
No mention of the hot-cell problem. Every city has an airport. Every stadium empties after a game. If you cannot describe how to keep one Redis key alive under load, you are missing a real operational concern that Uber has published about publicly.
Treating Routing as a free function call. It is the heaviest dependency in the match path. Bounding its concurrency, timeout, and fallback is essential. Weak answers call Routing and never discuss what happens when it is slow.
Forgetting cancellation. Half the state machine is cancel paths. Happy-path only leaves the interviewer to infer the rest. Call out driver no-response, rider cancel, driver cancel, and system cancel explicitly.
Confusing match latency with acceptance latency. Matching takes under 200ms. Driver acceptance takes 3-15 seconds. The 2-second P99 SLO is for “from rider tap until the offer reaches the driver’s phone,” not “until the driver taps Accept.” Be explicit about which one you are measuring.
Over-engineering multi-region. Cities are independent. A globally distributed multi-master location index is not needed. Per-region, per-city sharding is correct. A NYC driver cannot pick up a Tokyo rider anyway.