Message Queues Compared: Kafka, RabbitMQ, SQS

The phrase "message queue" obscures the fact that Kafka, RabbitMQ, and SQS solve quite different problems. Picking the wrong one can mean rebuilding your event pipeline a year in. This article is the practical comparison: what each is best at, what it is bad at, and the kind of system that fits each.

The three architectures

flowchart TB
    subgraph Kafka_arch ["Kafka: distributed log"]
        Producers1[Producers] --> Topic[Topic - partitioned log]
        Topic --> Consumer1["Consumer group A<br/>reads at offset 100"]
        Topic --> Consumer2["Consumer group B<br/>reads at offset 0<br/>replay"]
    end
    subgraph RabbitMQ_arch ["RabbitMQ: broker with exchanges"]
        Producers2[Producer] --> Exchange["Exchange<br/>routing rules"]
        Exchange --> Q1[Queue 1]
        Exchange --> Q2[Queue 2]
        Q1 --> W1[Worker 1]
        Q2 --> W2[Worker 2]
    end
    subgraph SQS_arch ["SQS: managed simple queue"]
        Producers3[Producer] --> Q[SQS queue]
        Q --> W3[Worker pool]
    end

Three different architectures. Kafka stores a log that consumers read at their own pace. RabbitMQ routes messages to specific queues. SQS provides a single managed queue with no routing.

Kafka

A partitioned, append-only log, ordered within each partition. Messages are written to a topic; topics are partitioned across brokers; each consumer group reads at its own offset. Critically, messages are not deleted when consumed: they sit in the log for a configurable retention period (days to forever).
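The log-plus-offsets idea can be sketched in a few lines. This is a toy in-memory model, not a Kafka client: `ToyLog`, `poll`, and `seek` are hypothetical names chosen for illustration, and a single list stands in for one partition.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a single topic partition (not a Kafka client)."""

    def __init__(self):
        self._records = []                 # append-only; reads never delete
        self._offsets = defaultdict(int)   # next offset to read, per consumer group

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1      # offset of the new record

    def poll(self, group, max_records=10):
        """Return the next records for `group` and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind (or skip ahead) one group without affecting the others."""
        self._offsets[group] = offset


log = ToyLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

analytics = log.poll("analytics")   # reads all three events
audit = log.poll("audit")           # a second group sees the same events
log.seek("analytics", 0)            # replay from the beginning
replayed = log.poll("analytics")
```

Reading never removes anything from `_records`; only the per-group offset moves. That is the property that makes replay and multiple independent consumers cheap.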

What Kafka does well:

  • High throughput. Millions of messages per second on a small cluster. The log structure is friendly to disk I/O and zero-copy networking.
  • Replay. A new consumer can read from the beginning of the log. Good for backfilling, debugging, or training a new ML model on past data.
  • Multiple consumers per topic. The same events can feed an analytics pipeline, an audit log, and a real-time dashboard simultaneously, each with its own offset.
  • Durability. Messages are persisted to disk and replicated across brokers. With a replication factor above one, a single disk or broker failure does not lose data.

What Kafka does poorly:

  • Operational complexity. Self-hosting Kafka is a real undertaking. ZooKeeper or KRaft setup, partition rebalancing, retention tuning, monitoring, schema management. A small team should not run Kafka itself; use Confluent Cloud, AWS MSK, Aiven, or similar.
  • Per-message overhead. Optimised for batches. A single low-frequency message has noticeable latency.
  • No selective consumption. Consumers read sequentially; you cannot pull specific messages out of order.

Pick Kafka when: you need event sourcing, replay, multiple downstream consumers per event stream, or throughput in the hundreds of thousands per second.

RabbitMQ

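A traditional message broker that implements AMQP. Producers send messages to exchanges; exchanges route them to queues according to bindings and routing keys; the broker then pushes messages to the consumers subscribed to each queue (consumers can also poll, but push is the usual mode). When a message is acknowledged, it is deleted from the queue.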
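To make the routing step concrete, here is a minimal sketch of how a topic exchange decides which queues receive a message. It is a toy model rather than RabbitMQ client code: `topic_matches` and `ToyTopicExchange` are hypothetical names, and the only rule assumed is the AMQP one that `*` matches exactly one dot-separated word while `#` matches zero or more.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more words."""
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])
    return match(pattern.split("."), routing_key.split("."))


class ToyTopicExchange:
    """Routes each published message to every queue with a matching binding."""

    def __init__(self):
        self.bindings = []   # (pattern, queue name)
        self.queues = {}     # queue name -> list of message bodies

    def bind(self, queue, pattern):
        self.queues.setdefault(queue, [])
        self.bindings.append((pattern, queue))

    def publish(self, routing_key, body):
        # A set, so a queue with several matching bindings gets one copy.
        targets = {q for pattern, q in self.bindings
                   if topic_matches(pattern, routing_key)}
        for queue in targets:
            self.queues[queue].append(body)


exchange = ToyTopicExchange()
exchange.bind("billing", "orders.*.created")
exchange.bind("audit", "orders.#")
exchange.publish("orders.eu.created", "order 1")
exchange.publish("orders.eu.cancelled", "order 2")
```

After these two publishes, the `billing` queue holds only the first message while `audit` holds both. The producer never names a queue; the bindings decide.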

What RabbitMQ does well:

  • Flexible routing. Direct, topic, fanout, and headers exchanges: the routing primitives are rich and well-defined.
  • Per-message delivery semantics. Each message has a clear destination and lifecycle. Good fit for task queues where each message represents a unit of work.
  • Lightweight. A small RabbitMQ cluster is much easier to operate than Kafka.
  • Many language clients. AMQP libraries exist for every major language.

What RabbitMQ does poorly:

  • Lower throughput than Kafka. Tens of thousands of messages per second on a single broker; harder to scale beyond.
  • No built-in replay. Once a message is acked, it is gone. You can build a log on top, but it is not the natural pattern.
  • Operational quirks. Slow consumers can balloon memory; messages get stuck; queues sometimes need manual intervention.

Pick RabbitMQ when: you have task queues, RPC patterns, or complex routing needs and the volume is moderate. RabbitMQ is excellent for the "web app dispatching background jobs" pattern.

Amazon SQS

A managed queue. Producers send messages; consumers poll for them; a visibility timeout hides a message while it is being processed; if the consumer does not delete the message within the timeout, it reappears for another attempt. Two flavours: standard (at-least-once delivery, best-effort ordering, very high throughput) and FIFO (ordered, exactly-once processing through deduplication, lower throughput).
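The visibility-timeout cycle is easier to follow with a small simulation. The sketch below is a toy model, not the SQS API: `ToyQueue` is a hypothetical class, and time is passed in explicitly as `now` so the behaviour is deterministic.

```python
import itertools


class ToyQueue:
    """Models visibility timeouts with a caller-supplied clock (not the SQS API)."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}              # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body, now):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now]
        return msg_id

    def receive(self, now):
        """Return (id, body) of one visible message and hide it, or None."""
        for msg_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """The consumer's acknowledgement; without it the message comes back."""
        self._messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30.0)
queue.send("resize image 42", now=0.0)

first = queue.receive(now=1.0)     # delivered, hidden until t=31
hidden = queue.receive(now=10.0)   # None: still in flight
retry = queue.receive(now=31.0)    # never deleted, so it reappears
queue.delete(retry[0])
empty = queue.receive(now=100.0)   # None: deleted for good
```

The second delivery at `t=31` is why standard queues are at-least-once: a worker that is merely slow, not dead, will see its message handed to someone else.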

What SQS does well:

  • Zero operational burden. AWS runs it. You pay per million messages; you do not manage infrastructure.
  • Auto-scaling. Throughput scales without configuration on standard queues.
  • Integration with the AWS ecosystem. Lambda triggers from SQS messages; SNS publishes to SQS; CloudWatch metrics included.
  • Dead-letter queues built in. Once a redrive policy is configured, messages that exceed the maximum receive count move to a dedicated queue automatically.
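The dead-letter behaviour amounts to counting receives per message and parking anything that crosses a threshold. The following is a simplified in-process sketch, assuming message bodies are unique strings; `drain` and `max_receives` are hypothetical names, loosely mirroring the maximum receive count in an SQS redrive policy.

```python
from collections import deque


def drain(source, handler, max_receives=3):
    """Process `source`; move messages that fail `max_receives` times to a DLQ."""
    dead_letters = []
    receives = {}   # message -> number of delivery attempts so far
    while source:
        message = source.popleft()
        receives[message] = receives.get(message, 0) + 1
        try:
            handler(message)
        except Exception:
            if receives[message] >= max_receives:
                dead_letters.append(message)   # park it where it is visible
            else:
                source.append(message)         # put it back for a later retry
    return dead_letters, receives


def handler(message):
    # Hypothetical handler: one payload can never be processed.
    if message == "malformed":
        raise ValueError("cannot parse payload")


dead_letters, receives = drain(deque(["ok-1", "malformed", "ok-2"]), handler)
```

Here the poison message is attempted three times and then set aside, while the healthy messages around it are processed once each instead of waiting behind it.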

What SQS does poorly:

  • AWS-only. Vendor lock-in; cannot be deployed on-premises or to other clouds without porting to a different broker.
  • Polling-based. Consumers pull, not push. Even with long-polling, latency is higher than Kafka or RabbitMQ.
  • Limited routing. Topics-and-queues style fanout requires SNS in front of SQS; the architecture spreads across services.
  • Limited per-message size. 256 KB max. Larger payloads go to S3 with a pointer in the message.
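The pointer approach for oversized payloads (often called the claim-check pattern) can be sketched as follows. This is an illustration under stated assumptions: a plain dict stands in for S3, payloads are text, and `to_message`, `from_message`, and the JSON envelope keys are hypothetical.

```python
import json
import uuid

MAX_BODY_BYTES = 256 * 1024   # the limit quoted above


def to_message(text, blob_store):
    """Send small payloads inline; store large ones and send a pointer."""
    body = json.dumps({"inline": text})
    if len(body.encode("utf-8")) <= MAX_BODY_BYTES:
        return body
    key = f"payloads/{uuid.uuid4()}"
    blob_store[key] = text                     # stand-in for an S3 upload
    return json.dumps({"pointer": key})


def from_message(body, blob_store):
    """Resolve a message body back to the original payload."""
    envelope = json.loads(body)
    if "pointer" in envelope:
        return blob_store[envelope["pointer"]]   # stand-in for an S3 download
    return envelope["inline"]


store = {}
small = to_message("hello", store)
large_payload = "x" * (300 * 1024)
large = to_message(large_payload, store)
```

The size check runs on the encoded envelope rather than the raw text, since JSON escaping and the wrapper itself count toward the limit.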

Pick SQS when: you are on AWS, you want minimal operational overhead, and your throughput is in the modest-to-large range. For new projects on AWS, SQS plus Lambda is often the lowest-effort path to a working pipeline.

The fourth contender: Redis Streams

Redis Streams is a Kafka-like log structure built into Redis. Smaller scale than Kafka but no separate cluster to operate. Useful for systems that already use Redis and need event log semantics without adopting Kafka. Capable up to a few hundred thousand messages per second on a beefy Redis instance.

Side-by-side

Aspect             Kafka          RabbitMQ        SQS
-----------------  -------------  --------------  --------------
Throughput         millions/sec   tens of K/sec   high
Latency            low (batch)    very low        moderate
Message retention  configurable   until ack       up to 14 days
Replay             yes            no              no
Routing            by partition   rich AMQP       basic
Ops burden         high           medium          zero (managed)
Persistence        always         optional        always
Ordering           per partition  per queue       FIFO mode only
Multi-consumer     yes by group   complex         requires SNS
Sweet spot         event streams  task queues     AWS workloads

How to choose

flowchart TD
    Start([Picking a message system]) --> Q1{What scale?}
    Q1 -->|Hundreds per second| RMQ[RabbitMQ or SQS]
    Q1 -->|Thousands per second| Q2{On AWS?}
    Q1 -->|Hundreds of thousands per second| Kafka[Kafka]
    Q2 -->|Yes| SQS[SQS]
    Q2 -->|No| Q3{Need replay?}
    Q3 -->|Yes| Kafka
    Q3 -->|No| RMQ
    style Kafka fill:#dbeafe,stroke:#1e40af,color:#0c1e3b
    style RMQ fill:#fef3c7,stroke:#92400e,color:#451a03
    style SQS fill:#fde68a,stroke:#b45309,color:#451a03

Quick decision tree. Most teams pick wrong by reaching for Kafka before they need it; the operational cost catches up six months in.
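For readers who prefer code to diagrams, the same tree can be written as a small function. The numeric cut-offs are assumptions made for this sketch (the tree only says "hundreds", "thousands", and "hundreds of thousands" per second), and `pick_broker` is a hypothetical name.

```python
def pick_broker(messages_per_second, on_aws, need_replay):
    """Mirror the decision tree above; the thresholds are rough, assumed cut-offs."""
    if messages_per_second < 1_000:        # "hundreds per second"
        return "RabbitMQ or SQS"
    if messages_per_second >= 100_000:     # "hundreds of thousands per second"
        return "Kafka"
    # "thousands per second": platform and replay needs decide.
    if on_aws:
        return "SQS"
    return "Kafka" if need_replay else "RabbitMQ"


choice = pick_broker(5_000, on_aws=False, need_replay=True)
```

Treat the output as a starting point for discussion, not a verdict; team experience and existing infrastructure often outweigh raw throughput.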

Common mistakes

  • Using Kafka for task queues. Kafka is for ordered logs; task queues need acknowledgement-and-delete semantics. Forcing Kafka into a task-queue role works badly. Use RabbitMQ or SQS for that pattern.
  • Treating SQS standard queue as ordered. Standard SQS does not preserve order. If you need order, use FIFO queues (lower throughput) or Kinesis (more like Kafka).
  • Idempotency forgotten. Most systems guarantee at-least-once delivery, meaning duplicates happen. Consumers must be idempotent: processing the same message twice should be safe.
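  • Missing dead-letter queues. Messages that fail repeatedly clog the queue. Configure DLQs so failures are visible: SQS and RabbitMQ support them natively, while in Kafka a dead-letter topic is a convention implemented by the consumer or by Kafka Connect.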
  • Schema evolution unmanaged. Producers update message schemas without coordinating with consumers. Use Schema Registry (Kafka), versioned message types (RabbitMQ), or strict per-version queues (SQS).
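Of the mistakes above, idempotency is the one that most benefits from a concrete shape. A minimal sketch, assuming every message carries a unique ID: `IdempotentConsumer` is a hypothetical wrapper, and the in-memory set stands in for what would need to be a durable store (a database table or similar) in a real system.

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed successfully."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()   # in-memory only; a real system needs durable storage

    def handle(self, message_id, body):
        if message_id in self._seen:
            return False                 # duplicate delivery: do nothing
        self._handler(body)
        self._seen.add(message_id)       # record only after the handler succeeds
        return True


balances = {"alice": 0}


def credit(body):
    balances[body["account"]] += body["amount"]


consumer = IdempotentConsumer(credit)
delivery = ("msg-1", {"account": "alice", "amount": 50})
first_result = consumer.handle(*delivery)
second_result = consumer.handle(*delivery)   # redelivery of the same message
```

The account is credited once even though the message arrived twice. Recording the ID after the handler runs means a crash in between causes a retry rather than a lost message, which is the safer failure mode only if the handler itself tolerates being retried.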

Frequently Asked Questions

Is Kafka always the best choice for events?

No. Kafka is the best choice when scale demands it, when replay matters, or when multiple consumers need the same stream. For a five-engineer team running a moderate-traffic SaaS, Kafka is operational overhead with little benefit. RabbitMQ or managed SQS is often the right answer.

What is the difference between SQS and SNS?

SQS is a queue (one consumer eventually receives each message). SNS is a topic (every subscriber receives each message). They are commonly used together: SNS topic with multiple SQS queues subscribed, giving you fan-out plus per-consumer queues.
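The difference is easy to see in a toy model. Nothing below uses the AWS SDK: `ToyTopic` is a hypothetical class, and plain deques stand in for SQS queues.

```python
from collections import deque


class ToyTopic:
    """Copies every published message to each subscribed queue (SNS-like fan-out)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        for queue in self.subscribers:
            queue.append(message)


emails, search_index = deque(), deque()
topic = ToyTopic()
topic.subscribe(emails)
topic.subscribe(search_index)
topic.publish("user 7 signed up")

# Two workers compete on the email queue: only one of them gets the message.
worker_a = emails.popleft() if emails else None
worker_b = emails.popleft() if emails else None
# The indexer has its own queue, so it receives its own copy.
indexer = search_index.popleft()
```

Each subscribed queue gets a copy (topic semantics), while workers sharing one queue split the messages between them (queue semantics).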

What about NATS or Redpanda?

NATS is a lightweight messaging system, faster than RabbitMQ at small message sizes, with growing adoption. Redpanda is a Kafka-compatible alternative with simpler operations. Both are credible alternatives if you have specific reasons to avoid the dominant choices.

How do I handle ordering in Kafka?

Messages within a single partition are ordered. Across partitions, no global ordering. Pick a partition key that groups related messages (user ID, account ID) into the same partition. All operations on that key are ordered relative to each other; different keys can be processed in parallel.
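A short sketch of key-based partitioning shows why this works. The hash here is an illustrative choice, not Kafka's own partitioner (the Java client's default uses murmur2), and `partition_for` is a hypothetical helper.

```python
import hashlib


def partition_for(key, num_partitions):
    """Stable hash of the key; an illustration, not Kafka's own partitioner."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


events = [
    ("user-1", "add to cart"),
    ("user-2", "login"),
    ("user-1", "checkout"),
    ("user-2", "logout"),
]

# Append each event to the partition chosen by its key.
partitions = {p: [] for p in range(4)}
for key, action in events:
    partitions[partition_for(key, 4)].append((key, action))

# Every event for user-1 sits in one partition, in the order it was produced.
user_1_partition = partitions[partition_for("user-1", 4)]
user_1_actions = [action for key, action in user_1_partition if key == "user-1"]
```

Because the same key always hashes to the same partition, per-key order falls out of per-partition order. Note that changing the partition count changes the mapping, so keys may move to a different partition afterwards.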

Can I migrate from RabbitMQ to Kafka later?

Yes, but it is non-trivial. Producers are rewritten; consumers handle ordering and offsets differently; operational practices are different. Plan a phased migration: dual-publish during transition, migrate consumers one at a time, retire RabbitMQ when no consumers remain.
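The dual-publish step might look like the sketch below. It assumes the old broker stays the source of truth during the transition, so a failure on the new path is counted rather than raised; `DualPublisher` and the two publish callables are hypothetical stand-ins for real client calls.

```python
class DualPublisher:
    """Publish to the old broker first, then best-effort to the new one."""

    def __init__(self, publish_old, publish_new):
        self._publish_old = publish_old
        self._publish_new = publish_new
        self.new_failures = 0

    def publish(self, key, body):
        self._publish_old(key, body)      # still the source of truth
        try:
            self._publish_new(key, body)
        except Exception:
            self.new_failures += 1        # surface via metrics; do not fail the caller


old_broker, new_broker = [], []


def flaky_new_publish(key, body):
    # Hypothetical new-broker client that rejects one particular message.
    if body == "event 2":
        raise ConnectionError("new cluster unavailable")
    new_broker.append((key, body))


publisher = DualPublisher(lambda k, b: old_broker.append((k, b)), flaky_new_publish)
for n in (1, 2, 3):
    publisher.publish("account-9", f"event {n}")
```

Comparing what each side received (here three messages versus two, with one recorded failure) is the kind of check worth running before any consumer is moved over.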
