Kafka vs RabbitMQ vs SQS
What fits when.
The three names cover most of the real-world messaging choices. They look interchangeable from the outside (“put a message in, get a message out”) and are deeply different inside. Kafka is a log. RabbitMQ is a broker with sophisticated routing. SQS is a managed queue. Knowing which one you actually want matters more than knowing the operational details of any one of them, because changing later is expensive.
The three models, side by side
flowchart TB
subgraph KAFKA["Kafka — append-only log, partitioned by key, consumers track their own offset"]
direction LR
KP(["Producer"]):::client
KT[("Topic / partition<br/>append-only log<br/>messages kept for days")]:::store
KC1[("Consumer A<br/>offset 1483")]:::server
KC2[("Consumer B<br/>offset 1320")]:::server
KP ==> KT
KT --> KC1
KT --> KC2
end
subgraph RABBIT["RabbitMQ — broker routes via exchanges to many queues"]
direction LR
RP(["Producer"]):::client
RX[["Exchange<br/>routing rules"]]:::infra
RQ1[("Queue A")]:::queue
RQ2[("Queue B")]:::queue
RC1[("Consumer A")]:::server
RC2[("Consumer B")]:::server
RP ==> RX
RX ==>|"matches binding"| RQ1
RX ==>|"matches binding"| RQ2
RQ1 ==> RC1
RQ2 ==> RC2
end
subgraph SQS["SQS — managed queue, one queue per workload, AWS runs it"]
direction LR
SP(["Producer"]):::client
SQ[("SQS queue<br/>(managed)")]:::queue
SC1[("Consumer 1")]:::server
SCN[("Consumer N")]:::server
SP ==> SQ
SQ ==> SC1
SQ ==> SCN
end
classDef client fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
classDef queue fill:#fed7aa,stroke:#c2410c,color:#7c2d12,stroke-width:1.5px
classDef store fill:#e9d5ff,stroke:#7e22ce,color:#581c87,stroke-width:1.5px
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
Same goal, three deeply different designs. The shape of the broker shapes everything you can do with it.
Kafka: log first, queue second
Kafka stores messages in an append-only log per partition. Producers add to the end. Consumers track a position (offset) and read forward. The log itself is durable: messages stay around for days or weeks, not just until a consumer reads them.
That single change enables most of Kafka’s superpowers:
- Replay. A new consumer can start from the beginning of the log and re-process everything.
- Multiple independent consumers. Each one keeps its own offset; they do not compete for messages.
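- Ordering within a partition. Messages with the same key (e.g., user_id) go to the same partition, as long as the partition count does not change, and are read in the order they were written. There is no ordering guarantee across partitions.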
- Huge throughput. Sequential disk writes are the fastest thing a disk can do; Kafka shapes the workload to favour them.
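The log, offset, and replay ideas above can be sketched with a toy in-memory model. This is not a Kafka client: `ToyLog`, its methods, and the group names are hypothetical, and `zlib.crc32` stands in for whatever hash the real partitioner uses. It only shows that reading advances a per-group offset instead of deleting anything.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a partitioned, append-only topic (not a Kafka client)."""

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        # One offset per (group, partition); groups never share offsets.
        self.offsets = defaultdict(int)

    def partition_for(self, key: str) -> int:
        # Stable hash, so the same key always maps to the same partition
        # (assumption: crc32 is only a placeholder hash for this sketch).
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> int:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group: str, partition: int) -> list:
        # Reading deletes nothing; it only moves this group's offset forward.
        start = self.offsets[(group, partition)]
        records = self.partitions[partition][start:]
        self.offsets[(group, partition)] = len(self.partitions[partition])
        return records

    def seek_to_beginning(self, group: str, partition: int) -> None:
        # Replay is just moving the offset back to zero.
        self.offsets[(group, partition)] = 0


log = ToyLog()
for i in range(3):
    log.append("user-42", f"event-{i}")

p = log.partition_for("user-42")
first_read = log.poll("warehouse", p)    # all three events, in write order
fraud_read = log.poll("fraud", p)        # an independent group sees the same events
nothing_new = log.poll("warehouse", p)   # warehouse is caught up
log.seek_to_beginning("warehouse", p)
replayed = log.poll("warehouse", p)      # same events again after the rewind
```

Because every group keeps its own offset, adding a new reader never takes messages away from an existing one.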
The cost:
- Operationally heavier (running brokers, ZooKeeper on older clusters or KRaft on newer ones, tuning replication and retention).
- No rich message-level routing and no per-message ACKs. Consumers commit an offset, not individual messages.
- If all you need is “just a queue”, you are paying for features you will not use.
Used for: event streams, change-data-capture, analytics pipelines, audit logs, telemetry.
RabbitMQ: routing first, broker second
RabbitMQ is a classic broker. Producers send to exchanges, which route messages into queues based on rules (routing keys, topic patterns, headers, fanout). Consumers read from queues, normally by subscribing and having the broker push messages to them, and acknowledge each message individually.
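A minimal sketch of topic-exchange routing, assuming AMQP-style patterns where `*` matches exactly one dot-separated word and `#` matches zero or more. `ToyTopicExchange` and the queue names are hypothetical; this is an in-memory illustration, not the RabbitMQ client API.

```python
from collections import defaultdict


def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more words."""

    def match(p: list, k: list) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero words, or one word and then try again.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


class ToyTopicExchange:
    """Routes each published message into every queue with a matching binding."""

    def __init__(self) -> None:
        self.bindings = []                  # (pattern, queue_name)
        self.queues = defaultdict(list)

    def bind(self, queue_name: str, pattern: str) -> None:
        self.bindings.append((pattern, queue_name))

    def publish(self, routing_key: str, body: str) -> list:
        # A queue gets one copy even if several of its bindings match.
        targets = {q for pattern, q in self.bindings
                   if topic_matches(pattern, routing_key)}
        for q in targets:
            self.queues[q].append(body)
        return sorted(targets)


exchange = ToyTopicExchange()
exchange.bind("billing", "order.*.paid")
exchange.bind("audit", "order.#")
exchange.bind("emails", "user.signup")

r1 = exchange.publish("order.eu.paid", "o-1")
r2 = exchange.publish("order.eu.refund.requested", "o-2")
r3 = exchange.publish("user.signup", "u-1")
```

The producer only names a routing key. Which queues receive the message is decided entirely by the bindings, so consumers can be added or rewired without touching producers.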
Strengths:
- Routing. Topic exchanges, header exchanges, fanout exchanges, direct exchanges. Mix and match to express almost any routing intent declaratively.
- Per-message ACKs. The consumer says “I handled this” and the broker removes the message. If the consumer fails before acknowledging, the broker redelivers it.
- Priority queues, dead-letter queues, TTLs, delayed delivery (the last usually via a plugin or a TTL plus dead-letter combination). Ready-made patterns for things you would otherwise have to implement.
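Per-message acknowledgement can be sketched as follows. `ToyAckQueue`, `process_all`, and the job names are hypothetical, and a real broker tracks unacknowledged messages per channel; the point is only that a delivered message stays tracked until the consumer acks it, and goes back on the queue if it does not.

```python
from collections import deque


class ToyAckQueue:
    """Toy queue with per-message acknowledgements (not an AMQP client)."""

    def __init__(self) -> None:
        self.ready = deque()
        self.unacked = {}          # delivery_tag -> body
        self._next_tag = 1

    def publish(self, body: str) -> None:
        self.ready.append(body)

    def deliver(self):
        """Hand the next message to a consumer; it stays tracked until acked."""
        if not self.ready:
            return None
        body = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = body
        return tag, body

    def ack(self, tag: int) -> None:
        # Only now is the message really gone.
        del self.unacked[tag]

    def nack(self, tag: int) -> None:
        # Put the message back at the head of the queue for redelivery.
        self.ready.appendleft(self.unacked.pop(tag))


def process_all(queue: ToyAckQueue, handler) -> list:
    done = []
    while (delivery := queue.deliver()) is not None:
        tag, body = delivery
        try:
            handler(body)
        except Exception:
            queue.nack(tag)
        else:
            queue.ack(tag)
            done.append(body)
    return done


attempts = {}


def flaky(body: str) -> None:
    # Fails the first time it sees "resize-image", succeeds afterwards.
    attempts[body] = attempts.get(body, 0) + 1
    if body == "resize-image" and attempts[body] == 1:
        raise RuntimeError("worker crashed")


q = ToyAckQueue()
q.publish("send-email")
q.publish("resize-image")
done = process_all(q, flaky)
```

The handler ran twice for the failed job, which is why consumers in an at-least-once system need to tolerate repeats.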
Cost:
- Messages are usually consumed once and gone (no log to replay).
- Throughput per broker is lower than Kafka’s.
- Clustering and queue replication (quorum queues today, mirrored classic queues in older versions) are workable but more delicate than Kafka’s replication model.
Used for: task queues, complex routing topologies, work distribution with rich semantics.
SQS: managed queue, no servers
SQS is AWS’s “just give me a queue” product. You create a queue and start using it. No brokers to operate, no ZooKeeper, no clustering decisions. Two flavours:
- Standard SQS. At-least-once delivery, best-effort ordering, very high throughput.
- FIFO SQS. Exactly-once-style processing (duplicate sends are dropped within a deduplication window), strict ordering within a message group, lower throughput.
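The FIFO behaviour can be sketched with a toy model. `ToyFifoQueue` is hypothetical and heavily simplified: it remembers deduplication IDs forever, whereas the real service only does so for a limited window, and it tracks in-flight state per group rather than per message. It is meant to show two things: a retried send with the same deduplication ID is not enqueued twice, and a group hands out its next message only after the previous one is deleted.

```python
class ToyFifoQueue:
    """Toy FIFO queue with message groups and deduplication (not the SQS API)."""

    def __init__(self) -> None:
        self.messages = []               # (group_id, body) in arrival order
        self.seen_dedup_ids = set()
        self.in_flight_groups = set()

    def send(self, body: str, group_id: str, dedup_id: str) -> bool:
        # Returns False when the send was a duplicate and nothing was enqueued.
        if dedup_id in self.seen_dedup_ids:
            return False
        self.seen_dedup_ids.add(dedup_id)
        self.messages.append((group_id, body))
        return True

    def receive(self):
        # Oldest message whose group has nothing in flight; other groups may proceed.
        for i, (group_id, body) in enumerate(self.messages):
            if group_id not in self.in_flight_groups:
                self.messages.pop(i)
                self.in_flight_groups.add(group_id)
                return group_id, body
        return None

    def delete(self, group_id: str) -> None:
        # The consumer finished the in-flight message; unblock the group.
        self.in_flight_groups.discard(group_id)


q = ToyFifoQueue()
sent = [
    q.send("a1", "user-A", "id-1"),
    q.send("b1", "user-B", "id-2"),
    q.send("a2", "user-A", "id-3"),
    q.send("a1", "user-A", "id-1"),   # producer retry with the same dedup ID
]

first = q.receive()     # oldest message, from group user-A
second = q.receive()    # user-A is busy, so user-B goes next
blocked = q.receive()   # only a2 is left and its group is still in flight
q.delete("user-A")
third = q.receive()     # a2 is released once a1 is done
```

Ordering is per group, so the number of distinct group IDs is what limits how much work can run in parallel.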
Strengths:
- Zero ops. AWS handles durability, scaling, and HA. You pay per request.
- Tight AWS integration. Lambda triggers, dead-letter queues, IAM, EventBridge.
- Excellent for “boring” queues. Background jobs, notifications, retry-with-DLQ patterns.
Cost:
- AWS-only.
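- No replay and no independent consumer groups. Every consumer on a queue competes for the same messages, so giving a second application its own copy means fanning out through SNS into a separate queue.
- Higher latency than a RabbitMQ broker running next to your services, since every call is an HTTPS request to AWS.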
Used for: any AWS-native workload that needs a queue and does not need a log.
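The difference between competing consumers on one queue and fan-out through a topic can be sketched in a few lines. `ToyTopic` is a hypothetical stand-in for an SNS topic with queue subscriptions, and plain `deque` objects stand in for SQS queues.

```python
from collections import deque


class ToyTopic:
    """Stand-in for a topic that copies each message to every subscribed queue."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, queue: deque) -> None:
        self.subscribers.append(queue)

    def publish(self, message: str) -> None:
        for queue in self.subscribers:
            queue.append(message)


# One queue, two workers: each message is handled by exactly one of them.
shared = deque(["m1", "m2", "m3", "m4"])
worker_a, worker_b = [], []
while shared:
    worker_a.append(shared.popleft())
    if shared:
        worker_b.append(shared.popleft())

# One topic, one queue per application: every application sees every message.
billing_queue, search_queue = deque(), deque()
topic = ToyTopic()
topic.subscribe(billing_queue)
topic.subscribe(search_queue)
for message in ["m1", "m2", "m3", "m4"]:
    topic.publish(message)
```

With a single queue the two workers split the stream. With a queue per application behind a topic, each application gets the full stream and can still scale its own workers on its own queue.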
The picker
flowchart TB
Q1{"Do you need to replay messages,<br/>or keep them around for days?"}:::query
Q2{"Do you need rich routing<br/>(topic patterns, headers, fan-out trees)?"}:::query
Q3{"Are you on AWS and want zero ops?"}:::query
Q4{"High throughput per partition,<br/>per-key ordering, multiple<br/>independent consumer groups?"}:::query
K["Kafka.<br/>Event streams, audit logs,<br/>CDC, analytics pipelines."]:::strong
R["RabbitMQ.<br/>Work distribution with rich routing.<br/>Classic task queues."]:::strong
S["SQS.<br/>Background jobs, retries,<br/>AWS-native pipelines."]:::strong
Q1 -->|"yes"| K
Q1 -->|"no"| Q4
Q4 -->|"yes"| K
Q4 -->|"no"| Q2
Q2 -->|"yes, complex topology"| R
Q2 -->|"no, simple work distribution"| Q3
Q3 -->|"yes"| S
Q3 -->|"no, on-prem or other cloud"| R
classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
A useful one-liner per system:
- Kafka: “I want a log of events that many independent things will read.”
- RabbitMQ: “I want work to flow through queues with smart routing and per-message acks.”
- SQS: “I want a queue and I never want to think about queue infrastructure again.”
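The picker can also be written as a small function, which makes the order of the questions explicit. The `Needs` fields and `pick_broker` are hypothetical names for this sketch; the rules are the ones from the diagram, with the log-shaped requirements checked first.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    replay_or_long_retention: bool = False   # replay, or keep messages for days
    log_style_consumption: bool = False      # per-key ordering, independent consumer groups
    rich_routing: bool = False               # topic patterns, headers, fan-out trees
    on_aws_zero_ops: bool = False            # on AWS and unwilling to run brokers


def pick_broker(needs: Needs) -> str:
    # Log-shaped requirements win first: nothing else here offers replay.
    if needs.replay_or_long_retention or needs.log_style_consumption:
        return "Kafka"
    if needs.rich_routing:
        return "RabbitMQ"
    if needs.on_aws_zero_ops:
        return "SQS"
    # Simple work distribution off AWS.
    return "RabbitMQ"


analytics_pipeline = pick_broker(
    Needs(replay_or_long_retention=True, log_style_consumption=True))
saas_background_jobs = pick_broker(Needs(on_aws_zero_ops=True))
on_prem_task_queue = pick_broker(Needs())
routed_workflows = pick_broker(Needs(rich_routing=True, on_aws_zero_ops=True))
```

Treat the result as a starting point for the conversation, not a verdict. Team experience and existing infrastructure are not modelled here.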
Two scenarios
Scenario one: an analytics pipeline ingesting events from many services.
Hundreds of thousands of events per second. Multiple downstream teams want to consume the same events independently (the analytics warehouse, the recommendation system, the fraud team). Old events sometimes need to be re-processed when a downstream’s logic changes. This is Kafka territory. Each consumer tracks its own offset; retention is days; re-processing is just rewinding.
Scenario two: a background job system for a SaaS app.
User actions enqueue jobs (“send email”, “regenerate PDF”, “sync to CRM”). Workers pull and process. Per-job retries with DLQ. Lives in AWS. Throughput is moderate. Use SQS. You will not regret it. The day you need richer routing, switch to RabbitMQ; the day you need a log, switch to Kafka. Most jobs never need either.
What this connects to
- Why use a message queue. Why you reach for any of these in the first place. See Why use a message queue.
- Delivery semantics. Each system offers different guarantees. See At-most-once vs at-least-once vs exactly-once.
- Pub/sub vs queue. Kafka is closer to pub/sub; SQS is a queue; RabbitMQ does both. See Pub/sub vs point-to-point queue.
- Event sourcing. Kafka is the canonical event-sourcing log. See Event sourcing vs state-based persistence.
- Idempotency. Required by all three when retries happen. See Idempotency.
Common mistakes
- Kafka because Kafka. It is operationally heavy. If you are not using the log or partitioned ordering, you are paying for nothing.
- RabbitMQ for huge throughput streaming. Not its shape; use Kafka.
- SQS for “I want pub/sub.” SQS is a queue; for fanout you also need SNS. Or pick something that does both natively.
- One topic / queue per microservice without thought. Topology matters. A topic per business event is usually better than a topic per service.
- Forgetting retention and disk costs. Kafka with infinite retention is expensive. Decide a real number.
- No dead-letter handling. “We will fix it later” lasts until a poison message loops forever.
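The dead-letter point is easy to see in a sketch. `run_worker` and `max_receives` are hypothetical names, loosely modelled on the receive-count limit that managed queues expose. Without the cap, the loop below would never finish for the poison message.

```python
from collections import deque


def run_worker(queue: deque, handler, max_receives: int = 3):
    """Process jobs, retrying failures until max_receives, then dead-letter them."""
    processed, dead_letters = [], []
    receive_counts = {}
    while queue:
        job = queue.popleft()
        receive_counts[job] = receive_counts.get(job, 0) + 1
        try:
            handler(job)
            processed.append(job)
        except Exception:
            if receive_counts[job] >= max_receives:
                # Give up and park the job where a human can inspect it.
                dead_letters.append(job)
            else:
                # Retry later, behind the jobs already waiting.
                queue.append(job)
    return processed, dead_letters, receive_counts


def handler(job: str) -> None:
    if job == "poison":
        raise ValueError("cannot parse payload")


jobs = deque(["job-1", "poison", "job-2"])
processed, dead_letters, receive_counts = run_worker(jobs, handler)
```

Healthy jobs are not stuck behind the bad one, and the bad one stops consuming worker time after three attempts.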
Quick recap
- Kafka: a durable log, replayable, partitioned, many independent consumers.
- RabbitMQ: a broker with rich routing and per-message ACKs.
- SQS: a managed queue on AWS with zero ops.
- Pick by the shape you need (log, routing, managed), not by what you have used before.
This concept sits in Stage 3 (Caching, queues, and async work) of the System Design Roadmap.