Message Queues & Event-Driven Architecture
Prerequisite: Databases & Indexes
Synchronous request-response works well when the caller can afford to wait and the callee is reliably fast. At scale, neither assumption holds. A payment service that calls fraud detection, inventory, and email confirmation synchronously couples its latency to the sum of all three, and couples its availability to the product of all three. Message queues break this coupling: the caller drops a message and moves on; downstream services process at their own pace.
Point-to-Point vs Pub/Sub
Point-to-point queues have one producer and one consumer (or a competing consumer pool). Each message is delivered to exactly one consumer. Task queues fit this model: an image processing job is enqueued once and processed by whichever worker picks it up next.
Publish/subscribe decouples producers from consumers entirely. A producer publishes to a topic; multiple independent consumers each receive their own copy of every message. An order-placed event might be consumed simultaneously by the inventory service, the email service, and the analytics service - none of them aware of each other.
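The fan-out behavior described above can be sketched as a minimal in-process broker (an illustrative toy, not a real broker API) where every subscriber to a topic receives its own copy of each message:

```python
from collections import defaultdict

class Broker:
    """Toy pub/sub: each handler subscribed to a topic receives
    its own copy of every message published to that topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
inventory, emails = [], []
broker.subscribe("order.placed", inventory.append)
broker.subscribe("order.placed", emails.append)
broker.publish("order.placed", {"order_id": 42})
# both consumers received their own copy; neither knows about the other
```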
Apache Kafka
Kafka is a distributed log, not a traditional queue. Understanding the distinction matters for choosing when to use it.
A Kafka topic is divided into partitions - ordered, immutable sequences of records. Each record has an offset - its position in the partition log. Consumers track their own offset; the broker does not delete messages after delivery. This means messages can be replayed, multiple consumer groups can independently consume the same topic at different offsets, and historical data can be reprocessed.
Consumer groups provide parallelism with ordering guarantees: each partition is assigned to exactly one consumer within a group. Scale a consumer group by adding consumers, up to the number of partitions; to parallelize beyond the current partition count, add partitions.
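Per-partition ordering works because the producer routes every record with the same key to the same partition. A sketch of key-based partition assignment (Kafka's default partitioner uses murmur2; the hash here is illustrative):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition deterministically, so all
    records for one key land in one partition and stay ordered."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# every event for "user-17" goes to the same partition,
# so that user's events are consumed in order
assert partition_for("user-17", 6) == partition_for("user-17", 6)
```

Note the flip side: changing the partition count remaps keys, which is why adding partitions to a keyed topic breaks historical per-key ordering.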
Kafka’s retention policy keeps messages for a configurable time (days to weeks) or until a size limit is reached. Log compaction is an alternative: Kafka keeps only the latest record for each key, discarding older versions. Useful for maintaining a compacted changelog of current state.
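Compaction's "latest record per key wins" rule can be sketched in a few lines (a simplification: real compaction runs in the background and only over cleaned log segments):

```python
def compact(log):
    """Keep only the newest record per key, preserving the order
    in which the surviving records appear in the log."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    return [(key, value) for key, (offset, value) in
            sorted(latest.items(), key=lambda kv: kv[1][0])]

log = [("user:1", "a"), ("user:2", "b"), ("user:1", "c")]
compact(log)  # [("user:2", "b"), ("user:1", "c")]
```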
Kafka excels at: high-throughput event streaming, audit logs, change data capture (CDC) from databases, and pipelines where replay matters. It is a poor choice for traditional task queues where you want exactly-once processing, acknowledgement-driven deletion, and per-message retry with backoff.
RabbitMQ
RabbitMQ is a broker-based message queue with routing logic at the broker. Producers publish to an exchange; the exchange routes messages to queues based on binding rules.
Exchange types:
- Direct: Route by exact routing key match.
- Topic: Route by wildcard pattern (order.# matches order.placed, order.shipped).
- Fanout: Broadcast to all bound queues, ignoring the routing key.
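Topic matching can be sketched by translating a binding pattern to a regex. This is a simplified take on AMQP semantics (trailing # handled, some edge cases like a leading # omitted): * matches exactly one dot-separated word, # matches zero or more words.

```python
import re

def topic_to_regex(pattern: str):
    """Translate a RabbitMQ-style topic binding into a compiled regex
    (simplified sketch of the broker's matching rules)."""
    escaped = pattern.replace(".", r"\.")
    escaped = escaped.replace("*", r"[^.]+")       # one word
    escaped = escaped.replace(r"\.#", r"(?:\..+)?")  # zero or more trailing words
    escaped = escaped.replace("#", r".*")
    return re.compile("^" + escaped + "$")

assert topic_to_regex("order.#").match("order.placed")
assert topic_to_regex("order.#").match("order.shipped.eu")
assert not topic_to_regex("order.#").match("invoice.paid")
assert topic_to_regex("*.placed").match("order.placed")
```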
Consumers acknowledge messages explicitly. If a consumer crashes before acknowledging, RabbitMQ redelivers the message to another consumer. A dead letter exchange (DLX) receives messages that were rejected, expired, or exceeded a retry limit - allowing a separate consumer to handle poison pills without blocking the main queue.
RabbitMQ is better suited for task queues, job dispatch, and RPC patterns where per-message acknowledgement and routing flexibility matter.
AWS SQS
SQS is a managed queue with a simple API. Key behaviors:
At-least-once delivery: A message is delivered at least once, but possibly multiple times (duplicates). Consumers must be idempotent.
Visibility timeout: When a consumer receives a message, it is hidden from other consumers for a configurable window (default 30s). If the consumer processes and deletes the message before the timeout expires, it is gone. If the consumer fails or the timeout expires, the message reappears.
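The receive/hide/reappear cycle can be modeled directly (a toy with a hypothetical API, not boto3; time is passed explicitly so the behavior is easy to follow):

```python
import time

class VisibilityQueue:
    """Toy SQS-style queue: a received message is hidden from other
    consumers until the visibility timeout passes or it is deleted."""
    def __init__(self, visibility_timeout=30.0):
        self.timeout = visibility_timeout
        self.messages = {}   # id -> (body, invisible_until)
        self.next_id = 0

    def send(self, body):
        self.messages[self.next_id] = (body, 0.0)
        self.next_id += 1

    def receive(self, now=None):
        now = time.time() if now is None else now
        for mid, (body, until) in self.messages.items():
            if until <= now:
                # hide the message for the visibility window
                self.messages[mid] = (body, now + self.timeout)
                return mid, body
        return None

    def delete(self, mid):
        self.messages.pop(mid, None)

q = VisibilityQueue(visibility_timeout=30.0)
q.send("resize-image")
mid, body = q.receive(now=0.0)     # received, hidden until t=30
assert q.receive(now=10.0) is None # still invisible to other consumers
mid, body = q.receive(now=40.0)[0], "resize-image"  # timeout expired: redelivered
q.delete(mid)                      # processed successfully: gone for good
```

The redelivery at t=40 is exactly why consumers must be idempotent: a slow consumer that finishes after its timeout expires has not crashed, yet another consumer may already be processing the same message.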
FIFO queues provide exactly-once processing and strict ordering, at lower throughput. Standard queues provide higher throughput with best-effort ordering and at-least-once delivery.
Delivery Guarantees
At-most-once: Deliver immediately, never retry. Fast, but messages can be lost.
At-least-once: Retry until acknowledged. Safe, but consumers may see duplicates. The consumer must be idempotent - processing the same message twice must have the same effect as processing it once. Common technique: store a processed-message ID in a database; skip processing if already seen.
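The processed-ID technique in one sketch (an in-memory set stands in for the database table; names are illustrative):

```python
balance = {"amount": 1000}
processed = set()  # in production: a unique-keyed DB table,
                   # checked and written in the same transaction
                   # as the side effect

def handle(message):
    """Idempotent consumer: de-duplicate on message ID before
    applying the side effect."""
    if message["id"] in processed:
        return "skipped"              # duplicate delivery: no-op
    balance["amount"] -= message["debit"]
    processed.add(message["id"])
    return "processed"

msg = {"id": "m-1", "debit": 150}
handle(msg)  # first delivery: balance drops to 850
handle(msg)  # redelivery: skipped, balance still 850
```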
Exactly-once: Deliver and process exactly once. Expensive. Kafka’s exactly-once semantics combine idempotent producers with transactions that atomically write output records and commit consumer offsets - a guarantee that holds within Kafka itself, not across arbitrary external stores. In practice, most systems achieve effectively-exactly-once through idempotent consumers on top of at-least-once delivery.
Backpressure
When producers generate messages faster than consumers can process them, the queue grows unboundedly and the system eventually falls over. Backpressure propagates the overload signal upstream, causing producers to slow down.
Kafka naturally provides backpressure via consumer lag metrics - operators can observe lag growing and scale consumers. RabbitMQ’s queue length and memory pressure can be monitored; producers can be throttled with credit-based flow control. Without backpressure, a spike in writes can cause memory exhaustion on the broker.
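The simplest form of backpressure is a bounded buffer: a full buffer blocks the producer instead of letting memory grow. A sketch with Python's standard library:

```python
import queue
import threading

# Bounded queue: put() blocks when the buffer is full, so a slow
# consumer pushes back on the producer automatically.
buf = queue.Queue(maxsize=100)

def producer(n):
    for i in range(n):
        buf.put(i)  # blocks here whenever the consumer lags by 100 items

def consumer(n, out):
    for _ in range(n):
        out.append(buf.get())

out = []
t1 = threading.Thread(target=producer, args=(1000,))
t2 = threading.Thread(target=consumer, args=(1000, out))
t1.start(); t2.start(); t1.join(); t2.join()
# all 1000 items arrive in order, but at no point did more than
# 100 undelivered items sit in memory
```

Brokers apply the same idea at a larger scale: bounded queues, credit-based flow control, or alerting on consumer lag before the broker runs out of memory.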
Event Sourcing and CQRS
Event sourcing stores the sequence of events that led to the current state, rather than the current state itself. Instead of UPDATE account SET balance = 850, you append {type: "DEBIT", amount: 150, timestamp: ...}. Current state is reconstructed by replaying events. Benefits: complete audit log, ability to replay history to a point in time, natural fit with Kafka. Costs: read path requires projection, schema evolution is harder, event store can grow large.
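Reconstructing state by replay is a fold over the event log. A minimal sketch using the debit/credit example above (field names are illustrative):

```python
def replay(events, upto=None):
    """Rebuild an account balance by folding over its event log,
    optionally stopping at a point in time."""
    balance = 0
    for e in events:
        if upto is not None and e["timestamp"] > upto:
            break
        if e["type"] == "CREDIT":
            balance += e["amount"]
        elif e["type"] == "DEBIT":
            balance -= e["amount"]
    return balance

events = [
    {"type": "CREDIT", "amount": 1000, "timestamp": 1},
    {"type": "DEBIT",  "amount": 150,  "timestamp": 2},
]
replay(events)          # 850: current state
replay(events, upto=1)  # 1000: state as of timestamp 1
```

The `upto` parameter is the point-in-time replay benefit; the cost is that every read would walk the whole log, which is why real systems maintain projections (snapshots or read models) instead of replaying on demand.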
CQRS (Command Query Responsibility Segregation) separates the write model (commands that mutate state) from the read model (queries that return data). The write side appends events; a projection builds a read-optimized view (often in a separate database). This allows independent scaling of reads and writes and lets the read model be shaped for query patterns rather than write consistency.
Outbox Pattern
Distributed systems face a coordination problem: atomically writing to a database and publishing a message to a queue requires a distributed transaction. The outbox pattern avoids this:
- Within the same database transaction that writes business data, also write the event to an outbox table.
- A separate process (or a CDC tool like Debezium) tails the outbox table and publishes events to Kafka.
- The outbox row is marked as published.
The outbox write and the business data write are atomic (same transaction). The outbox-to-queue relay is retryable. This provides at-least-once delivery without a distributed transaction.
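The pattern fits in a short sketch using SQLite in place of the production database (table and field names are illustrative; the `publish` callable stands in for the real broker client):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, published INTEGER DEFAULT 0);
""")

# Step 1: business write and outbox write in ONE local transaction.
with conn:
    conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (1, 99.5))
    conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                 (json.dumps({"type": "ORDER_PLACED", "order_id": 1}),))

# Step 2: a relay tails the outbox and publishes unpublished rows.
def relay(publish):
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # safe to retry: at-least-once
        with conn:
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

sent = []
relay(sent.append)
relay(sent.append)  # nothing left to publish: only one event sent
```

If the relay crashes between publishing and marking the row, the event is published again on restart - which is the at-least-once guarantee, and another reason downstream consumers must be idempotent.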
Examples
Order processing pipeline with Kafka:
[Order Service] → orders topic → [Inventory Consumer]
→ [Payment Consumer]
→ [Notification Consumer]
Each consumer group reads the orders topic independently and at its own pace; notification can lag or fail without affecting inventory or payment. Adding a new downstream service requires no changes to the order service - subscribe a new consumer group.
Retry queue with dead letter exchange in RabbitMQ:
Producer → main_exchange → work_queue
↓ (on reject or TTL)
dead_letter_exchange → retry_queue (TTL 30s)
↓ (after TTL)
main_exchange → work_queue (re-queued)
After a configurable number of re-queues, messages are routed to a failed_queue for manual inspection or alerting.
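The re-queue limit can be sketched by counting delivery attempts in a message header (RabbitMQ exposes this via the `x-death` header; the routing function and header name here are illustrative):

```python
MAX_REQUEUES = 3

def route(message, process, retry_queue, failed_queue):
    """On failure, re-queue the message up to MAX_REQUEUES times;
    after that, park it in failed_queue for manual inspection."""
    try:
        process(message)
    except Exception:
        attempts = message["headers"].get("attempts", 0) + 1
        message["headers"]["attempts"] = attempts
        if attempts > MAX_REQUEUES:
            failed_queue.append(message)  # poison pill: stop retrying
        else:
            retry_queue.append(message)   # back through the DLX/TTL loop

def always_fails(msg):
    raise RuntimeError("downstream unavailable")

retry, failed = [], []
msg = {"body": "job", "headers": {}}
for _ in range(4):
    route(msg, always_fails, retry, failed)
# three re-queues, then the fourth failure lands in failed_queue
```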
Read Next: Microservices Architecture