Concept
Messaging & Events

Why use a message queue

When async beats sync.

A message queue is a piece of middleware that sits between a producer and a consumer. The producer drops a message in; the queue holds it; a consumer picks it up later. That is the whole idea. The reason you reach for one is almost never “we need a queue.” It is “we need the producer and the consumer to be decoupled in time, in failure, or in load.” Once you start needing any of those three, no amount of clever HTTP retries replaces what a queue gives you for free.

The three things a queue actually buys you

flowchart TB
    subgraph T["1. Decouple in time"]
        direction LR
        P1(["Producer"]):::client --> Q1[("Queue")]:::queue
        Q1 --> C1[("Consumer")]:::server
        N1["Consumer may be down for minutes.<br/>Producer keeps shipping into the queue."]:::infra
        Q1 -.-> N1
    end

    subgraph F["2. Decouple in failure"]
        direction LR
        P2(["Producer"]):::client --> Q2[("Queue")]:::queue
        Q2 --> C2[("Consumer<br/>X failed")]:::dead
        N2["Failed consumer = redelivered job.<br/>Producer never knew anything went wrong."]:::infra
        C2 -.-> N2
    end

    subgraph L["3. Decouple in load"]
        direction LR
        P3(["Producer<br/>10,000 / sec spike"]):::client --> Q3[("Queue<br/>absorbs the burst")]:::queue
        Q3 --> C3[("Consumer<br/>steady 1,000 / sec")]:::server
        N3["Queue is a buffer.<br/>Spikes do not crash slow downstream."]:::infra
        Q3 -.-> N3
    end

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
    classDef queue fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
    classDef dead fill:#fecaca,stroke:#b91c1c,color:#7f1d1d,stroke-width:1.5px
    classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px

Most “we should use a queue” arguments collapse to one of these three. If none of them is true for your workload, a queue is overhead.

Without a queue: tight coupling, easy to break

A synchronous call between two services looks simple. It is also fragile. When the downstream is slow, the upstream is slow. When the downstream is down, the upstream errors out to its own caller. The two services have to be up together, scaled together, and deployed without breaking each other.

sequenceDiagram
    autonumber
    participant U as User
    participant API as API
    participant DOWN as Downstream (e.g. email service)

    U->>API: POST /signup
    API->>DOWN: send welcome email  (synchronous)
    Note over DOWN: email service is having a bad day...
    DOWN-->>API: 503 Service Unavailable
    API-->>U: 500 Internal Server Error
    Note over U: signup failed.<br/>but the email was never the point.

The user’s signup just failed because of an unrelated email service. The two should never have been linked tightly enough for one to take down the other.

With a queue: decoupled, resilient

The same flow, with the email task enqueued instead of called.

sequenceDiagram
    autonumber
    participant U as User
    participant API as API
    participant Q as Queue
    participant W as Worker
    participant DOWN as Email service

    U->>API: POST /signup
    API->>Q: enqueue {send welcome to U}
    Q-->>API: ok
    API-->>U: 201 Created

    Note over Q: message holds in the queue

    W->>Q: pull next job
    Q-->>W: job: send welcome to U
    W->>DOWN: send email
    DOWN-->>W: ok
    W->>Q: ack

The user gets a fast successful response. The email service can be down for an hour; the queue holds the work; the worker retries when the service is healthy again. The signup path is no longer coupled to the email service’s uptime.

What a queue is, structurally

flowchart LR
    P1(["Producer 1"]):::client
    P2(["Producer 2"]):::client
    PN(["Producer N"]):::client

    Q[("Durable queue<br/>messages, in order")]:::queue

    C1[("Consumer 1")]:::server
    C2[("Consumer 2")]:::server
    CM[("Consumer M")]:::server

    P1 ==> Q
    P2 ==> Q
    PN ==> Q

    Q ==> C1
    Q ==> C2
    Q ==> CM

    classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
    classDef queue fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
    classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px

Many producers, many consumers, one queue in the middle. The producers do not know the consumers exist. The consumers can be scaled independently. The queue absorbs the impedance mismatch between fast and slow sides, between bursty and steady traffic, between healthy and recovering.

When you actually need one

  • The work is slow and you cannot block the caller for it (encoding a video, generating a report).
  • The work is best-effort and the caller does not need a confirmation right now (logs, metrics, side effects).
  • The downstream is unreliable and you do not want its outages to take the upstream with it.
  • You have bursty traffic and a slower downstream that would otherwise be crushed.
  • You need fan-out: one event triggers many independent reactions.

When a queue is overhead

  • The caller needs the result to continue (“did the payment succeed?”). That is sync, no queue.
  • Latency end-to-end matters more than throughput, and a queue adds delay.
  • Volume is small enough that a thread pool inside the application is enough.
  • The “queue” you would build is just retry-with-backoff inside the caller. Often the right answer for occasional flaky downstreams.

Two scenarios

Scenario one: an order checkout that triggers many side effects.

The user buys something. You need to: charge the card, record the order, update inventory, send a confirmation email, update a search index, send a webhook to the warehouse, ship analytics, kick off a recommendation refresh. Charging the card is sync (user is waiting). Everything else goes onto a queue. The checkout completes in 200 ms instead of 4 seconds, and a flaky email service stops being a launch blocker.

Scenario two: an IoT telemetry firehose.

50,000 sensor readings per second, with the database able to write maybe 8,000 per second sustained. Without a queue, the ingest layer drops messages or stalls. With a queue (Kafka, in this kind of workload), the queue absorbs the spike, the database consumes at its own pace, and a brief database hiccup just deepens the queue rather than losing data.

What this connects to

Common mistakes

  • Adding a queue without idempotency on the consumer. Queues redeliver. Your worker will see the same job twice. If “send email” runs twice, the user gets two emails.
  • Using a queue for sync work. “User submits → enqueue → poll for the result” is just a slower sync call. If the caller needs the answer now, keep it sync.
  • Unbounded queues. Producers outrun consumers, the queue grows, eventually disks fill. Always cap the queue or apply backpressure. See Bulkheads and rate limiting.
  • No dead-letter queue (DLQ). Some messages will fail every retry. They need to land somewhere visible, not loop forever or vanish.
  • Treating the queue as a database. Queues are for transient work. If you need to “query the queue”, you wanted a database.
  • Forgetting ordering matters or does not. Some workloads (per-user activity, per-order events) need per-key ordering. Most queues offer this only with extra effort (Kafka partitions, FIFO SQS).

Quick recap

  • A queue decouples producer and consumer in time, failure, and load.
  • Use one when work is slow, best-effort, bursty, or downstream is unreliable.
  • Skip it when the caller needs the result right now.
  • Idempotency on the consumer is mandatory.
  • The right queue is not “a queue”; pick the one that fits the workload.

This concept sits in Stage 3 (Caching, queues, and async work) of the System Design Roadmap.