Managed messaging: SQS/SNS vs Pub/Sub vs Service Bus
Throughput, semantics, ordering.
Every cloud has a managed messaging service. AWS splits it into SQS (queue) and SNS (topic / fan-out). GCP's Pub/Sub covers both shapes in one service. Azure has Service Bus (a broker with rich routing), Event Grid (lightweight event fan-out), and Event Hubs (high-throughput streaming). The choice mostly tracks the cloud you are on, but the services differ in delivery guarantees, ordering, and how they scale.
The three at a glance
flowchart TB
subgraph AWS["AWS — split: SQS + SNS"]
direction LR
S1[("SQS: point-to-point queue,<br/>at-least-once, FIFO option")]:::server
S2[("SNS: pub/sub topic,<br/>fan-out to SQS / Lambda / HTTP")]:::server
S3[("standard pattern:<br/>SNS → many SQS queues")]:::server
end
subgraph GCP["GCP Pub/Sub"]
direction LR
G1[("one service: pub/sub + queue semantics")]:::server
G2[("global by default,<br/>strong scaling story")]:::server
G3[("at-least-once with deduplication option;<br/>ordering keys for per-key order")]:::server
end
subgraph AZ["Azure"]
direction LR
A1[("Service Bus: broker with rich routing,<br/>queues + topics + subscriptions")]:::server
A2[("Event Grid: lightweight event fan-out")]:::server
A3[("Event Hubs: high-throughput stream<br/>(Kafka-like)")]:::server
end
classDef server fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
How each handles the two shapes
The two recurring shapes (see Pub/sub vs point-to-point queue) map to the cloud offerings differently:
flowchart TB
subgraph WORK["Work distribution (queue — one consumer per message)"]
direction LR
W1["AWS: SQS"]:::strong
W2["GCP: Pub/Sub with a single subscription, shared consumers"]:::strong
W3["Azure: Service Bus queue"]:::strong
end
subgraph FANOUT["Fan-out (pub/sub: every subscriber gets its own copy of each message)"]
direction LR
F1["AWS: SNS to many SQS queues<br/>(the classic SNS+SQS pattern)"]:::mid
F2["GCP: Pub/Sub topic with many subscriptions"]:::strong
F3["Azure: Service Bus topic + subscriptions,<br/>or Event Grid"]:::strong
end
classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
classDef mid fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
GCP Pub/Sub does both natively in one product, which is the cleanest mental model. AWS makes you compose SNS and SQS, which works but requires understanding both. Azure has three different services depending on shape, which is more flexible but more decisions upfront.
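To make the two shapes concrete, here is a minimal in-memory sketch. It is not any provider's API: the `Topic` class and its `subscribe`, `publish`, and `pull` methods are hypothetical names chosen for illustration. It models a topic where each subscription gets its own copy of every message (fan-out), while consumers that share one subscription compete for messages (work distribution).

```python
from collections import defaultdict, deque


class Topic:
    """Toy in-memory topic; each subscription keeps its own queue of messages."""

    def __init__(self):
        self.subscriptions = defaultdict(deque)

    def subscribe(self, name):
        # Touching the key creates an empty queue for this subscription.
        self.subscriptions[name]

    def publish(self, message):
        # Fan-out: every subscription receives its own copy.
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull(self, name):
        # Work distribution: consumers sharing one subscription compete,
        # so each message is handed to only one of them.
        queue = self.subscriptions[name]
        return queue.popleft() if queue else None


topic = Topic()
topic.subscribe("billing")
topic.subscribe("email")
for order_id in ("order-1", "order-2"):
    topic.publish(order_id)

# Two workers share the "billing" subscription and split the messages.
billing_worker_a = topic.pull("billing")
billing_worker_b = topic.pull("billing")
# The "email" subscription still holds its own copy of both messages.
email_backlog = list(topic.subscriptions["email"])
```

In this model, SNS plus several SQS queues corresponds to one `Topic` with several subscriptions, and a plain SQS or Service Bus queue corresponds to a single subscription with several pulling workers.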
What actually differs
flowchart TB
F1["Throughput<br/>SQS Standard: nearly unlimited<br/>SNS: very high, subject to regional quotas<br/>GCP Pub/Sub: very high, global<br/>Service Bus: ~2,000 messages/sec/queue (Premium higher)<br/>Event Hubs: Kafka-class"]:::infra
F2["Ordering<br/>SQS FIFO: per message-group<br/>GCP Pub/Sub: per ordering-key<br/>Service Bus: per session<br/>Standard / non-FIFO: best-effort only"]:::infra
F3["Delivery semantics<br/>All offer at-least-once<br/>SQS FIFO and Pub/Sub Exactly-Once (regional) approximate exactly-once<br/>Build idempotent consumers regardless"]:::infra
F4["Retention<br/>SQS: 4 days by default, up to 14 days<br/>Pub/Sub: 7 days by default per subscription; topic retention up to 31 days<br/>Service Bus: unlimited up to size cap"]:::infra
F5["Dead-letter queues<br/>All three support DLQs natively<br/>Configure max-receive-count before DLQ"]:::infra
classDef infra fill:#fef3c7,stroke:#a16207,color:#713f12,stroke-width:1.5px
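The delivery-semantics and dead-letter rows above can be sketched together as a consumer that tolerates redelivery. This is a simplified model, not a provider SDK: `IdempotentConsumer`, `receive`, and the in-memory `processed_ids` set are hypothetical, and on a real service the broker, not the consumer, counts receives and moves the message to the DLQ.

```python
class IdempotentConsumer:
    """Processes each message ID at most once, even when the broker redelivers."""

    def __init__(self, handler, max_receive_count=3):
        self.handler = handler
        self.max_receive_count = max_receive_count
        self.processed_ids = set()  # assumption: a durable store in production
        self.receive_counts = {}
        self.dead_letters = []

    def receive(self, message_id, body):
        """Return True if the message can be acked, False to request redelivery."""
        if message_id in self.processed_ids:
            return True  # duplicate delivery: ack without repeating the side effect
        count = self.receive_counts.get(message_id, 0) + 1
        self.receive_counts[message_id] = count
        try:
            self.handler(body)
        except Exception:
            if count >= self.max_receive_count:
                # Retries exhausted: park the message instead of looping forever.
                self.dead_letters.append((message_id, body))
                return True
            return False
        self.processed_ids.add(message_id)
        return True


charged = []


def charge(body):
    # Hypothetical handler: fails on one poison message, succeeds otherwise.
    if body == "bad":
        raise ValueError("cannot process")
    charged.append(body)


consumer = IdempotentConsumer(charge, max_receive_count=2)
first = consumer.receive("m1", "order-1")
duplicate = consumer.receive("m1", "order-1")  # redelivered copy, side effect skipped
retry_1 = consumer.receive("m2", "bad")        # first failure, ask for redelivery
retry_2 = consumer.receive("m2", "bad")        # second failure, goes to dead_letters
```

The point of the sketch is that deduplication lives in the consumer's own state, keyed by message ID, so it holds regardless of which service delivers the message.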
Where Kafka fits in (a fourth option)
When the workload is a high-throughput event log with multiple independent consumers (analytics, ETL, ML feature pipelines, audit), managed Kafka (MSK, Confluent Cloud, GCP Managed Kafka) or a Kafka-compatible service such as Azure Event Hubs is often the right answer rather than any of the queue services above. See Kafka vs RabbitMQ vs SQS for the broader context.
The clouds blur this line: Event Hubs is Kafka-compatible. Pub/Sub overlaps significantly with Kafka use cases. AWS keeps Kafka (MSK) and SQS firmly separate.
When to pick which
flowchart TB
Q1{"Which cloud?"}:::query
Q2{"What is the shape?"}:::query
A1["AWS: SQS for queues, SNS+SQS for fan-out,<br/>MSK for high-throughput streams."]:::strong
A2["GCP: Pub/Sub for almost everything;<br/>Managed Kafka for cost-sensitive streams<br/>(Pub/Sub Lite is deprecated)."]:::strong
A3["Azure: Service Bus for transactional messaging,<br/>Event Grid for events, Event Hubs for streams."]:::strong
Q1 -->|"AWS"| A1
Q1 -->|"GCP"| A2
Q1 -->|"Azure"| A3
classDef query fill:#dbeafe,stroke:#1e40af,color:#1e3a8a,stroke-width:1.5px
classDef strong fill:#dcfce7,stroke:#15803d,color:#14532d,stroke-width:1.5px
The honest answer: the right managed messaging service is the one in the cloud you are already in. Cross-cloud messaging is rarely worth the complexity.
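The decision above, cloud first and shape second, amounts to a small lookup table. The sketch below only restates the recommendations from this section; the `RECOMMENDED` table, the shape labels (`queue`, `fan-out`, `stream`), and the `pick_service` function are illustrative names, not part of any SDK.

```python
# (cloud, shape) -> service, as recommended in this section.
RECOMMENDED = {
    ("aws", "queue"): "SQS",
    ("aws", "fan-out"): "SNS + SQS",
    ("aws", "stream"): "MSK",
    ("gcp", "queue"): "Pub/Sub (one shared subscription)",
    ("gcp", "fan-out"): "Pub/Sub (one subscription per consumer)",
    ("gcp", "stream"): "Pub/Sub or Managed Kafka",
    ("azure", "queue"): "Service Bus queue",
    ("azure", "fan-out"): "Service Bus topic or Event Grid",
    ("azure", "stream"): "Event Hubs",
}


def pick_service(cloud, shape):
    """Return the suggested service for a cloud and a messaging shape."""
    key = (cloud.lower(), shape.lower())
    if key not in RECOMMENDED:
        raise ValueError(f"no recommendation for {cloud!r} / {shape!r}")
    return RECOMMENDED[key]


choice = pick_service("AWS", "fan-out")
```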
Common mistakes
- Choosing exactly-once based on the brand name. “Exactly-once” in any cloud is at-least-once + dedup. Always make consumers idempotent. See Idempotency.
- No DLQ. Without one, a message that keeps failing either loops forever or vanishes when retention expires. Configure a DLQ on every queue or subscription.
- One queue for everything. Mixed-purpose queues complicate observability and scaling. One queue per logical job type.
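- Ordering when you do not need it. FIFO queues trade throughput for order: SQS FIFO, for example, caps each API action at 300 calls per second (3,000 messages with batching) unless high-throughput mode is enabled. Use them only where order genuinely matters.
- Polling SQS too aggressively. Tight short-polling loops pay for empty responses. Long polling (WaitTimeSeconds=20, the maximum) cuts empty receives, and therefore cost, while still returning messages as soon as they arrive.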
- No monitoring on queue depth. A growing queue is a leading indicator of a consumer problem. Alert on it.
- Pub/Sub without ack deadlines tuned. The default ack deadline is 10 seconds; a consumer that takes longer sees the message redelivered as a duplicate unless the deadline is raised or extended while processing.
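For the SQS items in the list above (long polling and DLQs), the relevant settings are small. The sketch below only builds the parameter values and does not call AWS. `WaitTimeSeconds`, `MaxNumberOfMessages`, and the `RedrivePolicy` attribute are real SQS names, while the helper functions, queue URL, and ARN are hypothetical and for illustration only.

```python
import json


def long_poll_params(queue_url, wait_seconds=20, batch_size=10):
    """Keyword arguments for an SQS ReceiveMessage call that uses long polling."""
    if not 0 <= wait_seconds <= 20:
        raise ValueError("SQS accepts WaitTimeSeconds between 0 and 20")
    if not 1 <= batch_size <= 10:
        raise ValueError("SQS accepts MaxNumberOfMessages between 1 and 10")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,
        "MaxNumberOfMessages": batch_size,
    }


def redrive_policy(dlq_arn, max_receive_count=5):
    """Value for the RedrivePolicy queue attribute (a JSON string)."""
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    )


# Hypothetical queue URL and ARN, for illustration only.
params = long_poll_params("https://sqs.us-east-1.amazonaws.com/123456789012/orders")
policy = redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
```

With boto3, these would typically be passed as `sqs.receive_message(**params)` and as `Attributes={"RedrivePolicy": policy}` on `sqs.set_queue_attributes`. The default of 5 receives before dead-lettering is an arbitrary choice here, not a recommendation from this article.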
Quick recap
- AWS splits messaging into SQS (queue) and SNS (topic); Kafka is a separate product (MSK).
- GCP Pub/Sub does both queue and pub/sub semantics in one service.
- Azure has Service Bus (rich), Event Grid (lightweight), and Event Hubs (Kafka-like).
- All offer at-least-once delivery; idempotent consumers are mandatory.
- Pick the service that matches the shape (queue vs pub/sub vs stream), in the cloud you are already on.
This concept sits in Stage 4 (Scaling and reliability) of the System Design Roadmap.