Prerequisite: Consistency

A transaction is a unit of work that either completes entirely or leaves no trace. That guarantee - obvious-seeming on a single database - becomes genuinely hard when an operation spans multiple services, each with its own database. This post covers ACID semantics, what breaks in distributed systems, and the two main solutions: Two-Phase Commit and the Saga pattern.

ACID

ACID is the set of properties a database transaction must provide:

Atomicity: All operations in a transaction succeed, or none of them do. If your code credits one account and debits another, you never end up with only one of those changes applied.

Consistency: Transactions move the database from one valid state to another. Constraints, foreign keys, and check constraints are enforced. A transaction that would violate them is rolled back.

Isolation: Concurrent transactions appear to execute serially. One transaction does not see the intermediate state of another. In practice, databases offer weaker guarantees by default for performance.

Durability: Once a transaction commits, the data is permanent. Even if the server crashes immediately after the commit, the data survives.

Isolation Levels and Their Anomalies

Full isolation (serializable) is expensive. Most databases offer weaker levels that allow certain anomalies in exchange for higher throughput:

Level Dirty Read Non-Repeatable Read Phantom Read
Read Uncommitted possible possible possible
Read Committed prevented possible possible
Repeatable Read prevented prevented possible
Serializable prevented prevented prevented
  • Dirty read: Reading uncommitted data from another transaction that might be rolled back.
  • Non-repeatable read: Reading the same row twice in one transaction and getting different values because another transaction committed in between.
  • Phantom read: Running the same range query twice and getting different sets of rows because another transaction inserted or deleted rows.

PostgreSQL’s default is Read Committed. MySQL InnoDB defaults to Repeatable Read. Serializable is available in both but rarely used due to the locking overhead.

Optimistic vs Pessimistic Locking

Pessimistic locking acquires locks before reading data it intends to modify (SELECT ... FOR UPDATE). No other transaction can modify those rows until the lock is released. Safe, but reduces concurrency and risks deadlocks.

Optimistic locking skips locks and instead checks at commit time whether the data has changed since it was read - typically using a version column:

UPDATE orders SET status = 'shipped', version = version + 1
WHERE id = $1 AND version = $2;
-- If 0 rows updated, another transaction already modified the row

Optimistic locking works well when conflicts are rare. When conflicts are frequent, optimistic locking causes many retries and its throughput advantage disappears.

Distributed Transactions

A single database gives you ACID for free. A distributed operation - charge the payment service, reserve inventory in the warehouse service, create an order in the order service - has no built-in atomicity. If payment succeeds and inventory reservation fails, you have charged the customer for nothing.

The two main approaches are Two-Phase Commit (strong consistency) and Saga (eventual consistency).

Two-Phase Commit (2PC)

2PC introduces a coordinator that orchestrates participants through two phases:

Phase 1 - Prepare: The coordinator sends PREPARE to all participants. Each participant writes the transaction to a durable log and responds READY (it can commit) or ABORT (it cannot).

Phase 2 - Commit or Abort: If all participants respond READY, the coordinator writes a commit record and sends COMMIT to all participants. If any responded ABORT, the coordinator sends ABORT.

The appeal is strong: if 2PC completes, all participants either committed or all aborted. The catch is that it is a blocking protocol. If the coordinator crashes after sending PREPARE but before sending COMMIT, participants are stuck holding locks indefinitely. They cannot unilaterally decide to commit or abort because they do not know whether other participants voted READY. Recovery requires the coordinator to come back online - making the coordinator a single point of failure.

2PC is appropriate when you have tight consistency requirements and control over all participants (both databases support XA transactions, for instance). It is rarely suitable for cross-service distributed systems over HTTP because network partitions are common and the blocking failure mode is severe.

Saga Pattern

A Saga breaks a distributed transaction into a sequence of local transactions, each within a single service. If a step fails, previously completed steps are reversed using compensating transactions.

For an e-commerce order:

  1. Order Service: Create order record → success
  2. Inventory Service: Reserve items → success
  3. Payment Service: Charge card → fails (card declined)
  4. Inventory Service (compensation): Release reservation
  5. Order Service (compensation): Cancel order

Compensating transactions are not the same as rollbacks - they are new transactions that undo the visible effects of a prior step. They must be idempotent (safe to apply multiple times) because they may be retried.

Choreography vs Orchestration

Choreography: Each service listens for events and reacts. The Order Service publishes OrderCreated, the Inventory Service listens and publishes InventoryReserved, the Payment Service listens and publishes PaymentCharged or PaymentFailed. No central coordinator.

  • Pros: No single point of failure, services are loosely coupled.
  • Cons: The overall transaction flow is implicit and hard to trace; adding a new step requires modifying the event chain.

Orchestration: A dedicated Saga orchestrator (sometimes called a process manager) sends commands to each service and tracks state. If payment fails, the orchestrator explicitly sends ReleaseInventory and CancelOrder commands.

  • Pros: The flow is explicit and centralized, easy to visualize.
  • Cons: The orchestrator is a new service to build and operate; it becomes a partial coupling point.

When to Use 2PC vs Saga

Choose 2PC when:

  • All participants are databases you control, and both support XA/distributed transactions.
  • You need strong atomicity with no intermediate visible states.
  • The operation completes in milliseconds (short-lived transactions).

Choose Saga when:

  • The transaction spans independent microservices.
  • Steps involve external APIs (payment processors, email providers) that cannot participate in 2PC.
  • The business process is long-running (minutes or hours).
  • You can tolerate brief intermediate states and can write compensating logic.

Examples

E-commerce order saga (orchestration):

Orchestrator:
  → POST /inventory/reserve {order_id, items}
  ← 200 OK: reservation_id

  → POST /payment/charge {order_id, amount, card_token}
  ← 402: card_declined

  # Payment failed - compensate
  → DELETE /inventory/reserve/{reservation_id}
  ← 200 OK

  → PATCH /orders/{order_id} {status: "failed", reason: "payment_declined"}

Each service handles its local transaction atomically. The orchestrator tracks which steps succeeded and drives compensation on failure. Every compensation endpoint must be idempotent - if the orchestrator retries DELETE /inventory/reserve/{id} because it never received the acknowledgment, the second call should succeed without double-releasing.

A common implementation detail: the orchestrator persists its state (current step, completed compensations) in a database so that an orchestrator crash does not lose track of an in-flight saga. On restart it resumes from the last durable state.


Read Next: Concurrency