The phrase "message queue" obscures the fact that Kafka, RabbitMQ, and SQS solve quite different problems. Picking the wrong one can mean rebuilding your event pipeline a year in. This article is a practical comparison: what each is best at, what it is bad at, and the kind of system that fits each.
The three architectures
```mermaid
flowchart LR
    subgraph Kafka_arch [Kafka: partitioned log]
        Producers1[Producer] --> Topic[Topic]
        Topic --> Consumer1[Consumer group A<br>reads at offset 100]
        Topic --> Consumer2[Consumer group B<br>reads at offset 0<br>replay]
    end
    subgraph RabbitMQ_arch [RabbitMQ: broker with exchanges]
        Producers2[Producer] --> Exchange[Exchange<br>routing rules]
        Exchange --> Q1[Queue 1]
        Exchange --> Q2[Queue 2]
        Q1 --> W1[Worker 1]
        Q2 --> W2[Worker 2]
    end
    subgraph SQS_arch [SQS: managed simple queue]
        Producers3[Producer] --> Q[SQS queue]
        Q --> W3[Worker pool]
    end
```
Three different architectures. Kafka stores a log that consumers read at their own pace. RabbitMQ routes messages to specific queues. SQS provides a single managed queue with no routing.
Kafka
An ordered, partitioned, append-only log. Messages are written to a topic; topics are partitioned across brokers; consumers read at their own offset. Critically, messages are not deleted when consumed — they sit in the log for a configurable retention period (days to forever).
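The partitioning step is what gives Kafka its per-key ordering guarantee. A minimal sketch of key-based partition assignment; the real client uses a murmur2 hash of the key, and `hash()` here is a stand-in for illustration only:

```python
def assign_partition(key: bytes, num_partitions: int) -> int:
    # Kafka's default partitioner hashes the message key (murmur2 in the
    # real client); Python's hash() stands in for the idea only.
    return hash(key) % num_partitions

# All messages with the same key land on the same partition, so events
# for one key are consumed in the order they were produced.
orders = [("user-42", "created"), ("user-7", "created"), ("user-42", "paid")]
partitions = {}
for key, event in orders:
    p = assign_partition(key.encode(), num_partitions=6)
    partitions.setdefault(p, []).append((key, event))
```

Ordering holds only within a partition; two events for different keys on different partitions can be consumed in any relative order.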
What Kafka does well:
- High throughput. Millions of messages per second on a small cluster. The log structure is friendly to disk I/O and zero-copy networking.
- Replay. A new consumer can read from the beginning of the log. Good for backfilling, debugging, or training a new ML model on past data.
- Multiple consumers per topic. The same events can feed an analytics pipeline, an audit log, and a real-time dashboard simultaneously, each with its own offset.
- Durability. Messages are persisted to disk and replicated across brokers. Disk failures do not lose data.
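Replay and multiple consumers both fall out of the same design decision: the log is immutable and each consumer group tracks its own offset. A toy in-memory version of that contract (this is the idea, not the Kafka client API):

```python
class Log:
    """Append-only log: consuming never deletes, it only advances an offset."""
    def __init__(self):
        self.records = []
        self.offsets = {}          # one independent offset per consumer group

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)   # a new group starts at offset 0
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

log = Log()
for i in range(5):
    log.append(f"event-{i}")

analytics = log.poll("analytics", max_records=5)   # reads all five events
audit = log.poll("audit", max_records=5)           # same five, own offset
```

Because `poll` only moves the caller's own offset, a consumer added months later can still read from offset 0, which is exactly the backfill/replay property described above.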
What Kafka does poorly:
- Operational complexity. Self-hosting Kafka is a real undertaking. ZooKeeper or KRaft setup, partition rebalancing, retention tuning, monitoring, schema management. A small team should not run Kafka itself; use Confluent Cloud, AWS MSK, Aiven, or similar.
- Per-message overhead. Optimised for batches. A single low-frequency message has noticeable latency.
- No selective consumption. Consumers read sequentially; you cannot pull specific messages out of order.
Pick Kafka when: you need event sourcing, replay, multiple downstream consumers per event stream, or throughput in the hundreds of thousands per second.
RabbitMQ
A traditional message broker that implements AMQP. Producers send messages to exchanges; exchanges route them to queues based on routing keys; consumers pull from queues. When a message is acknowledged, it is deleted from the queue.
What RabbitMQ does well:
- Flexible routing. Topic exchanges, fanout exchanges, header exchanges — the routing primitives are rich and well-defined.
- Per-message delivery semantics. Each message has a clear destination and lifecycle. Good fit for task queues where each message represents a unit of work.
- Lightweight. A small RabbitMQ cluster is much easier to operate than Kafka.
- Many language clients. AMQP libraries exist for every major language.
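The routing richness comes from binding patterns: in a topic exchange, a binding key like `orders.*` selects which routing keys reach a queue, where `*` matches exactly one dot-separated word and `#` matches zero or more. A sketch of that matching rule (RabbitMQ implements this inside the broker):

```python
def binding_matches(pattern: str, routing_key: str) -> bool:
    """AMQP topic-exchange matching: '*' = one word, '#' = zero or more."""
    return _match(pattern.split('.'), routing_key.split('.'))

def _match(pat, words):
    if not pat:
        return not words                     # both exhausted -> match
    head, rest = pat[0], pat[1:]
    if head == '#':
        # '#' can absorb zero or more words: try every possible split
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    if head == '*' or head == words[0]:
        return _match(rest, words[1:])
    return False
```

So a queue bound with `orders.*` receives `orders.created` but not `orders.eu.created`, while a binding of `#` turns the exchange into a fanout.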
What RabbitMQ does poorly:
- Lower throughput than Kafka. Tens of thousands of messages per second on a single broker; harder to scale beyond.
- No built-in replay. Once a message is acked, it is gone. You can build a log on top, but it is not the natural pattern.
- Operational quirks. Slow consumers can balloon memory; messages get stuck; queues sometimes need manual intervention.
Pick RabbitMQ when: you have task queues, RPC patterns, or complex routing needs and the volume is moderate. RabbitMQ is excellent for the "web app dispatching background jobs" pattern.
Amazon SQS
A managed queue. Send a message; consumers poll for messages; a visibility timeout hides a message while it is being processed; if it is not deleted within the timeout, it reappears for redelivery. Two flavours: standard (at-least-once delivery, unordered, very high throughput) and FIFO (exactly-once processing, ordered, lower throughput).
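The visibility-timeout lifecycle is the part that trips people up, so here it is as a toy in-memory model (not the boto3 API; a fake clock keeps the demo deterministic):

```python
import time

class VisibilityQueue:
    """Toy SQS semantics: receive hides a message for a visibility timeout;
    only an explicit delete removes it; otherwise it reappears."""
    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.timeout = visibility_timeout
        self.clock = clock
        self.messages = {}           # id -> (body, invisible_until)
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = (body, 0.0)
        self.next_id += 1

    def receive(self):
        now = self.clock()
        for mid, (body, until) in self.messages.items():
            if until <= now:
                self.messages[mid] = (body, now + self.timeout)
                return mid, body
        return None

    def delete(self, mid):
        self.messages.pop(mid, None)

t = [0.0]                            # fake clock for a deterministic demo
q = VisibilityQueue(visibility_timeout=30, clock=lambda: t[0])
q.send("resize-image-17")
mid, body = q.receive()              # worker takes the message...
hidden = q.receive()                 # ...so nobody else can see it: None
t[0] = 31.0                          # worker crashed; timeout expires
redelivered = q.receive()            # message reappears for a retry
```

This is also why at-least-once delivery is the default: a worker that finishes the job but crashes before calling `delete` causes a redelivery, which is the idempotency concern covered under "Common mistakes" below.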
What SQS does well:
- Zero operational burden. AWS runs it. You pay per million messages; you do not manage infrastructure.
- Auto-scaling. Throughput scales without configuration on standard queues.
- Integration with the AWS ecosystem. Lambda triggers from SQS messages; SNS publishes to SQS; CloudWatch metrics included.
- Dead-letter queues built in. Failed messages route to a dedicated queue automatically.
What SQS does poorly:
- AWS-only. Vendor lock-in; cannot be deployed on-premises or to other clouds without porting to a different broker.
- Polling-based. Consumers pull, not push. Even with long-polling, latency is higher than Kafka or RabbitMQ.
- Limited routing. Topics-and-queues style fanout requires SNS in front of SQS; the architecture spreads across services.
- Limited per-message size. 256 KB max. Larger payloads go to S3 with a pointer in the message.
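The standard workaround for the 256 KB limit is the claim-check pattern: store the payload in S3 and send only a pointer. A sketch with an in-memory dict standing in for the S3 bucket; the names and JSON shape are illustrative, not an AWS convention:

```python
import json
import uuid

SQS_MAX_BYTES = 256 * 1024
blob_store = {}                      # stand-in for an S3 bucket

def prepare_message(payload: bytes) -> str:
    """Small payloads go inline; large ones become a claim-check pointer."""
    if len(payload) <= SQS_MAX_BYTES:
        return json.dumps({"inline": payload.decode()})
    key = f"payloads/{uuid.uuid4()}"
    blob_store[key] = payload        # in real life: s3.put_object(...)
    return json.dumps({"s3_key": key})

def resolve_message(message: str) -> bytes:
    body = json.loads(message)
    if "inline" in body:
        return body["inline"].encode()
    return blob_store[body["s3_key"]]  # in real life: s3.get_object(...)
```

AWS ships this pattern as the SQS Extended Client Library for Java; in other languages it is usually a few lines of hand-rolled code like the above.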
Pick SQS when: you are on AWS, you want minimal operational overhead, and your throughput is in the modest-to-large range. For new projects on AWS, SQS plus Lambda is often the lowest-effort path to a working pipeline.
The fourth contender: Redis Streams
Redis Streams is a Kafka-like log structure built into Redis. It operates at a smaller scale than Kafka but needs no separate cluster. Useful for systems that already use Redis and need event-log semantics without adopting Kafka; it can handle up to a few hundred thousand messages per second on a well-provisioned Redis instance.
Side-by-side
| Aspect            | Kafka         | RabbitMQ      | SQS            |
| ----------------- | ------------- | ------------- | -------------- |
| Throughput        | millions/sec  | tens of K/sec | high           |
| Latency           | low (batched) | very low      | moderate       |
| Message retention | configurable  | until ack     | up to 14 days  |
| Replay            | yes           | no            | no             |
| Routing           | by partition  | rich (AMQP)   | basic          |
| Ops burden        | high          | medium        | zero (managed) |
| Persistence       | always        | optional      | always         |
| Ordering          | per partition | per queue     | FIFO mode only |
| Multi-consumer    | yes, by group | complex       | requires SNS   |
| Sweet spot        | event streams | task queues   | AWS workloads  |

How to choose
A quick decision guide: if you need replay, event sourcing, or several consumers reading the same stream, use Kafka (managed); if you need task queues or rich routing at moderate volume, use RabbitMQ; if you are on AWS and want minimal operational overhead, use SQS. Most teams pick wrong by reaching for Kafka before they need it; the operational cost catches up six months in.
Common mistakes
- Using Kafka for task queues. Kafka is for ordered logs; task queues need acknowledge-and-delete semantics, which Kafka's sequential consumer groups do not provide per message. Forcing Kafka into a task-queue role works badly. Use RabbitMQ or SQS for that pattern.
- Treating SQS standard queues as ordered. Standard SQS does not preserve order. If you need ordering, use FIFO queues (lower throughput) or Kinesis (closer to Kafka's model).
- Idempotency forgotten. Most systems guarantee at-least-once delivery, meaning duplicates happen. Consumers must be idempotent: processing the same message twice should be safe.
- Missing dead-letter queues. Messages that fail repeatedly clog the queue. Configure DLQs so failures are visible: SQS and RabbitMQ support them natively, and in Kafka the usual pattern is a dedicated dead-letter topic.
- Schema evolution unmanaged. Producers update message schemas without coordinating with consumers. Use Schema Registry (Kafka), versioned message types (RabbitMQ), or strict per-version queues (SQS).
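The idempotency point above is usually implemented by tracking processed message IDs. A minimal sketch; a production system would keep the seen-ID set in a durable store such as Redis or a database table with a unique constraint rather than in memory:

```python
processed_ids = set()                # durable store in production
account_balance = {"alice": 0}

def handle_payment(message_id: str, account: str, amount: int) -> None:
    """Safe under at-least-once delivery: a duplicate delivery is a no-op."""
    if message_id in processed_ids:
        return                       # already applied; ignore the duplicate
    account_balance[account] += amount
    processed_ids.add(message_id)

# The broker may hand us msg-1 twice (redelivery after a timeout).
handle_payment("msg-1", "alice", 100)
handle_payment("msg-1", "alice", 100)   # duplicate, ignored
handle_payment("msg-2", "alice", 50)
```

Note the ordering inside the handler: applying the effect and recording the ID should ideally happen in one transaction, otherwise a crash between the two lines reintroduces the duplicate problem.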
Frequently Asked Questions
Is Kafka always the best choice for events?
No. Kafka is the best choice when scale demands it, when replay matters, or when multiple consumers need the same stream. For a five-engineer team running a moderate-traffic SaaS, Kafka is operational overhead with little benefit. RabbitMQ or managed SQS is often the right answer.
What is the difference between SQS and SNS?
SQS is a queue (one consumer eventually receives each message). SNS is a topic (every subscriber receives each message). They are commonly used together: SNS topic with multiple SQS queues subscribed, giving you fan-out plus per-consumer queues.
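The SNS-plus-SQS combination can be sketched as a topic that copies each published message into every subscribed queue (a toy model of the fan-out semantics, not the boto3 API):

```python
from collections import deque

class Topic:
    """SNS-style topic: every subscribed queue gets its own copy."""
    def __init__(self):
        self.queues = []

    def subscribe(self, queue: deque) -> None:
        self.queues.append(queue)

    def publish(self, message) -> None:
        for q in self.queues:
            q.append(message)        # fan-out: one copy per subscriber

orders_topic = Topic()
billing_q, shipping_q = deque(), deque()
orders_topic.subscribe(billing_q)
orders_topic.subscribe(shipping_q)
orders_topic.publish("order-123")    # both queues now hold their own copy
```

Each downstream service then consumes from its own queue at its own pace, which is the "fan-out plus per-consumer queues" arrangement described above.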
Share your thoughts
Worked with this in production and have a story to share, or disagree with a tradeoff? Email us at support@mybytenest.com — we read everything.