
Saga Pattern — Managing Distributed Transactions in Microservices

Hieu Nguyen
Senior Software Engineer at OCB

A practical guide to the Saga pattern for managing distributed transactions across microservices — with Java examples, choreography vs orchestration comparison, and real-world use cases from fintech.

In a monolithic application, maintaining data consistency is straightforward — you wrap everything in a single database transaction. But in a microservices architecture, each service owns its own database. A single business operation (like placing an order) can span multiple services, and there's no single @Transactional annotation that can save you.

This is exactly the problem I've dealt with at scale in banking and e-wallet systems, where a failed payment must trigger reversals across multiple services — and where partial failures can mean real money is lost.

The Saga pattern solves this by breaking a distributed transaction into a sequence of local transactions, each with a compensating action for rollback.

📖 Original Saga paper by Hector Garcia-Molina and Kenneth Salem (1987) · Microservices.io: Saga pattern

The Problem: Why Distributed Transactions Are Hard

In a monolith, this is easy:

@Transactional
public void placeOrder(OrderRequest request) {
    Order order = Order.from(request); // hypothetical mapping helper, not shown
    orderRepository.save(order);
    paymentService.charge(order);
    inventoryService.reserve(order);
    notificationService.notify(order);
    // All succeed or all roll back: simple ✅
}

In microservices, each of these is a separate service with its own database:

┌───────────────┐    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Order Service │───▶│Payment Service│───▶│ Inventory Svc │───▶│ Notification  │
│    (MySQL)    │    │ (PostgreSQL)  │    │   (MongoDB)   │    │    (Redis)    │
└───────────────┘    └───────────────┘    └───────────────┘    └───────────────┘

What happens if Inventory fails after Payment succeeds? You need to refund the payment. That's where the Saga pattern comes in.

Why not 2PC (Two-Phase Commit)? 2PC requires all participants to lock resources until the coordinator commits. This doesn't scale well in microservices — it creates tight coupling, can cause deadlocks, and a single coordinator failure blocks everything.

What is the Saga Pattern?

A Saga is a sequence of local transactions where each transaction updates a single service's database. If any step fails, previously completed steps are undone by executing compensating transactions in reverse order.

Forward flow (happy path):
T1 → T2 → T3 → T4 → ✅ Done

Failure at T3:
T1 → T2 → T3 ❌ → C2 → C1 → ❌ Rolled back

T = Transaction (local)
C = Compensating transaction (undo)

Key Concepts

| Concept | Description |
|---|---|
| Local transaction | A database operation within a single service |
| Compensating transaction | The "undo" operation for a previously completed step |
| Saga coordinator | The component that manages the execution flow |
| Idempotency | Compensations must be safe to retry without side effects |

Two Approaches: Choreography vs Orchestration

There are two main ways to implement the Saga pattern:

Choreography (Event-driven)

Each service publishes events and listens for events from other services. There's no central coordinator — services react to each other's events.

┌─────────────┐    OrderCreated     ┌─────────────┐  PaymentCompleted   ┌─────────────┐
│    Order    │ ──────────────────▶ │   Payment   │ ──────────────────▶ │  Inventory  │
│   Service   │ ◀────────────────── │   Service   │ ◀────────────────── │   Service   │
└─────────────┘    PaymentFailed    └─────────────┘  InventoryReserved  └─────────────┘
                                                      InventoryFailed

Java Example — Choreography with Spring + Kafka

Order Service — publishes event after creating order:

@Service
public class OrderService {
    private final OrderRepository orderRepository;
    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public OrderService(OrderRepository orderRepository,
                        KafkaTemplate<String, OrderEvent> kafkaTemplate) {
        this.orderRepository = orderRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    public Order createOrder(OrderRequest request) {
        Order order = Order.builder()
                .customerId(request.getCustomerId())
                .amount(request.getAmount())
                .status(OrderStatus.PENDING)
                .build();

        orderRepository.save(order);

        // Publish event: Payment Service will pick this up.
        // Note: the save and the send are two separate writes, not one atomic unit.
        kafkaTemplate.send("order-events", new OrderCreatedEvent(
                order.getId(), order.getCustomerId(), order.getAmount()
        ));

        return order;
    }

    // Compensating transaction: called when payment fails downstream.
    // Assumes only PaymentFailedEvent payloads are routed to this listener.
    @KafkaListener(topics = "payment-events", groupId = "order-service")
    public void handlePaymentFailed(PaymentFailedEvent event) {
        Order order = orderRepository.findById(event.getOrderId())
                .orElseThrow();
        order.setStatus(OrderStatus.CANCELLED);
        orderRepository.save(order);
    }
}

Payment Service — listens for order events, charges payment:

@Service
public class PaymentService {
    private final PaymentRepository paymentRepository;
    private final KafkaTemplate<String, PaymentEvent> kafkaTemplate;

    public PaymentService(PaymentRepository paymentRepository,
                          KafkaTemplate<String, PaymentEvent> kafkaTemplate) {
        this.paymentRepository = paymentRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "order-events", groupId = "payment-service")
    public void handleOrderCreated(OrderCreatedEvent event) {
        try {
            // processPayment(...) is this service's own charging logic (not shown)
            Payment payment = processPayment(event.getOrderId(), event.getAmount());
            paymentRepository.save(payment);

            // Success: notify the next service
            kafkaTemplate.send("payment-events", new PaymentCompletedEvent(
                    event.getOrderId(), payment.getTransactionId()
            ));
        } catch (InsufficientFundsException e) {
            // Failure: notify Order Service to compensate
            kafkaTemplate.send("payment-events", new PaymentFailedEvent(
                    event.getOrderId(), e.getMessage()
            ));
        }
    }

    // Compensating transaction: refund if Inventory fails later
    @KafkaListener(topics = "inventory-events", groupId = "payment-service")
    public void handleInventoryFailed(InventoryFailedEvent event) {
        Payment payment = paymentRepository.findByOrderId(event.getOrderId())
                .orElseThrow();
        if (payment.getStatus() == PaymentStatus.REFUNDED) {
            return; // Duplicate event: already refunded
        }
        payment.setStatus(PaymentStatus.REFUNDED);
        paymentRepository.save(payment);
        // Process actual refund...
    }
}
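To see the whole event chain in one place, here is a minimal sketch that replays the choreography with a synchronous in-memory bus instead of Kafka. `InMemoryBus` and `ChoreographyDemo` are hypothetical names invented for illustration, events are reduced to a type and an order ID, and the Order Service is assumed to also react to `InventoryFailed` so that the order ends up cancelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a Kafka topic: delivers each event
// synchronously to every subscriber, in subscription order.
class InMemoryBus {
    private final List<Consumer<String[]>> subscribers = new ArrayList<>();
    final List<String> history = new ArrayList<>();

    void subscribe(Consumer<String[]> handler) {
        subscribers.add(handler);
    }

    // An event is modelled as {type, orderId} to keep the sketch short.
    void publish(String type, String orderId) {
        history.add(type);
        for (Consumer<String[]> subscriber : new ArrayList<>(subscribers)) {
            subscriber.accept(new String[] {type, orderId});
        }
    }
}

class ChoreographyDemo {
    final Map<String, String> orderStatus = new HashMap<>();
    final Map<String, String> paymentStatus = new HashMap<>();
    final InMemoryBus bus = new InMemoryBus();

    ChoreographyDemo(boolean inventoryAvailable) {
        // Payment Service: charge on OrderCreated, refund on InventoryFailed.
        bus.subscribe(e -> {
            if (e[0].equals("OrderCreated")) {
                paymentStatus.put(e[1], "CHARGED");
                bus.publish("PaymentCompleted", e[1]);
            } else if (e[0].equals("InventoryFailed")) {
                paymentStatus.put(e[1], "REFUNDED");
            }
        });
        // Inventory Service: reserve stock once payment has completed.
        bus.subscribe(e -> {
            if (e[0].equals("PaymentCompleted")) {
                bus.publish(inventoryAvailable ? "InventoryReserved" : "InventoryFailed", e[1]);
            }
        });
        // Order Service: confirm on success, cancel on any downstream failure.
        bus.subscribe(e -> {
            if (e[0].equals("InventoryReserved")) {
                orderStatus.put(e[1], "CONFIRMED");
            } else if (e[0].equals("InventoryFailed") || e[0].equals("PaymentFailed")) {
                orderStatus.put(e[1], "CANCELLED");
            }
        });
    }

    void createOrder(String orderId) {
        orderStatus.put(orderId, "PENDING");
        bus.publish("OrderCreated", orderId);
    }
}
```

With `new ChoreographyDemo(false)`, calling `createOrder("o-1")` leaves the payment `REFUNDED` and the order `CANCELLED`. No single class describes the whole flow, which is the traceability cost of choreography.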

Orchestration (Central Coordinator)

A central Saga Orchestrator controls the entire flow — it tells each service what to do and handles failures by calling compensating transactions.

              ┌───────────────────┐
              │ Saga Orchestrator │
              │   (Order Saga)    │
              └─────────┬─────────┘
      ┌─────────────────┼─────────────────┐
      ▼                 ▼                 ▼
┌────────────┐    ┌────────────┐    ┌────────────┐
│  Payment   │    │ Inventory  │    │Notification│
│  Service   │    │  Service   │    │  Service   │
└────────────┘    └────────────┘    └────────────┘

Java Example — Orchestration with State Machine

@Service
public class OrderSagaOrchestrator {
    private final PaymentClient paymentClient;
    private final InventoryClient inventoryClient;
    private final OrderRepository orderRepository;

    public OrderSagaOrchestrator(PaymentClient paymentClient,
                                 InventoryClient inventoryClient,
                                 OrderRepository orderRepository) {
        this.paymentClient = paymentClient;
        this.inventoryClient = inventoryClient;
        this.orderRepository = orderRepository;
    }

    // Deliberately not @Transactional: a local database transaction cannot roll back
    // remote calls, and it would stay open while waiting on other services.
    public void executeSaga(Order order) {
        try {
            // Step 1: Charge payment
            PaymentResult payment = paymentClient.charge(
                    order.getId(), order.getAmount()
            );
            if (!payment.isSuccess()) {
                throw new SagaStepException("Payment failed");
            }

            // Step 2: Reserve inventory
            InventoryResult inventory = inventoryClient.reserve(
                    order.getId(), order.getItems()
            );
            if (!inventory.isSuccess()) {
                // Compensate Step 1
                paymentClient.refund(order.getId(), payment.getTransactionId());
                throw new SagaStepException("Inventory reservation failed");
            }

            // Step 3: Confirm order
            order.setStatus(OrderStatus.CONFIRMED);
            orderRepository.save(order);

        } catch (SagaStepException e) {
            // Mark order as failed
            order.setStatus(OrderStatus.FAILED);
            order.setFailureReason(e.getMessage());
            orderRepository.save(order);
        }
    }
}
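The orchestrator above tracks progress implicitly through control flow. One way to make the state explicit is a small transition table, sketched below. The state names and the allowed transitions are assumptions chosen to match the payment and inventory steps, not part of any framework.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical saga states for the order flow; names are illustrative.
enum OrderSagaState {
    STARTED, PAYMENT_CHARGED, INVENTORY_RESERVED, COMPLETED, COMPENSATING, FAILED
}

class OrderSagaStateMachine {
    private static final Map<OrderSagaState, Set<OrderSagaState>> ALLOWED =
            new EnumMap<>(OrderSagaState.class);

    static {
        // Payment failure needs no compensation, so STARTED may go straight to FAILED.
        ALLOWED.put(OrderSagaState.STARTED,
                EnumSet.of(OrderSagaState.PAYMENT_CHARGED, OrderSagaState.FAILED));
        ALLOWED.put(OrderSagaState.PAYMENT_CHARGED,
                EnumSet.of(OrderSagaState.INVENTORY_RESERVED, OrderSagaState.COMPENSATING));
        ALLOWED.put(OrderSagaState.INVENTORY_RESERVED,
                EnumSet.of(OrderSagaState.COMPLETED, OrderSagaState.COMPENSATING));
        ALLOWED.put(OrderSagaState.COMPENSATING, EnumSet.of(OrderSagaState.FAILED));
        // Terminal states allow no further transitions.
        ALLOWED.put(OrderSagaState.COMPLETED, EnumSet.noneOf(OrderSagaState.class));
        ALLOWED.put(OrderSagaState.FAILED, EnumSet.noneOf(OrderSagaState.class));
    }

    private OrderSagaState current = OrderSagaState.STARTED;

    OrderSagaState current() {
        return current;
    }

    // Moves to the next state, rejecting transitions the table does not list.
    void transitionTo(OrderSagaState next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not allowed");
        }
        current = next;
    }
}
```

Persisting `current()` after each transition would let an orchestrator resume a saga after a restart, since it would know which step last completed.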

Advanced: Step-based Saga with Reusable Framework

For complex sagas with many steps, use a step-based approach:

public interface SagaStep<T> {
    /** Execute the forward transaction */
    void execute(T context);

    /** Compensate (undo) if a later step fails */
    void compensate(T context);
}

public class PaymentStep implements SagaStep<OrderContext> {
    private final PaymentClient paymentClient;

    public PaymentStep(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;
    }

    @Override
    public void execute(OrderContext ctx) {
        PaymentResult result = paymentClient.charge(ctx.getOrderId(), ctx.getAmount());
        if (!result.isSuccess()) {
            // Throwing tells the executor that this step failed
            throw new IllegalStateException("Payment failed");
        }
        ctx.setPaymentTransactionId(result.getTransactionId());
    }

    @Override
    public void compensate(OrderContext ctx) {
        paymentClient.refund(ctx.getOrderId(), ctx.getPaymentTransactionId());
    }
}

public class InventoryStep implements SagaStep<OrderContext> {
    private final InventoryClient inventoryClient;

    public InventoryStep(InventoryClient inventoryClient) {
        this.inventoryClient = inventoryClient;
    }

    @Override
    public void execute(OrderContext ctx) {
        InventoryResult result = inventoryClient.reserve(ctx.getOrderId(), ctx.getItems());
        if (!result.isSuccess()) {
            throw new IllegalStateException("Inventory reservation failed");
        }
    }

    @Override
    public void compensate(OrderContext ctx) {
        inventoryClient.release(ctx.getOrderId(), ctx.getItems());
    }
}

The Saga executor runs steps in order and compensates on failure:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SagaExecutor<T> {
    // SLF4J is assumed here; the {} placeholder in the log call below matches its API
    private static final Logger log = LoggerFactory.getLogger(SagaExecutor.class);

    private final List<SagaStep<T>> steps;

    public SagaExecutor(List<SagaStep<T>> steps) {
        this.steps = List.copyOf(steps);
    }

    public void execute(T context) {
        List<SagaStep<T>> completedSteps = new ArrayList<>();

        for (SagaStep<T> step : steps) {
            try {
                step.execute(context);
                completedSteps.add(step);
            } catch (Exception e) {
                // Compensate in reverse order
                Collections.reverse(completedSteps);
                for (SagaStep<T> completed : completedSteps) {
                    try {
                        completed.compensate(context);
                    } catch (Exception compensationError) {
                        // Log and alert: manual intervention needed
                        log.error("Compensation failed for step: {}",
                                completed.getClass().getSimpleName(), compensationError);
                    }
                }
                throw new SagaFailedException("Saga failed at: " +
                        step.getClass().getSimpleName(), e);
            }
        }
    }
}

Usage:

SagaExecutor<OrderContext> orderSaga = new SagaExecutor<>(List.of(
        new PaymentStep(paymentClient),
        new InventoryStep(inventoryClient),
        new NotificationStep(notificationClient)
));

orderSaga.execute(new OrderContext(orderId, amount, items));
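To check the reverse-order compensation without real clients, the following sketch runs the same algorithm against steps that only record what happened. `RecordingStep` and `SagaTraceDemo` are hypothetical helpers written for this illustration. The `SagaStep` interface is repeated so the block compiles on its own, and a stack replaces the reversed list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Local copy of the SagaStep interface so the sketch compiles on its own.
interface SagaStep<T> {
    void execute(T context);
    void compensate(T context);
}

// Hypothetical step that only records its calls; failOnExecute simulates a downstream error.
class RecordingStep implements SagaStep<List<String>> {
    private final String name;
    private final boolean failOnExecute;

    RecordingStep(String name, boolean failOnExecute) {
        this.name = name;
        this.failOnExecute = failOnExecute;
    }

    @Override
    public void execute(List<String> trace) {
        if (failOnExecute) {
            throw new IllegalStateException(name + " unavailable");
        }
        trace.add("execute:" + name);
    }

    @Override
    public void compensate(List<String> trace) {
        trace.add("compensate:" + name);
    }
}

class SagaTraceDemo {
    // Runs the steps in order; on the first failure, undoes completed steps, newest first.
    static List<String> run(List<? extends SagaStep<List<String>>> steps) {
        List<String> trace = new ArrayList<>();
        Deque<SagaStep<List<String>>> completed = new ArrayDeque<>();
        for (SagaStep<List<String>> step : steps) {
            try {
                step.execute(trace);
                completed.push(step);
            } catch (RuntimeException e) {
                trace.add("failed");
                while (!completed.isEmpty()) {
                    completed.pop().compensate(trace);
                }
                break;
            }
        }
        return trace;
    }
}
```

Running it with a succeeding `payment` step, a succeeding `inventory` step, and a failing `notification` step yields the trace `execute:payment, execute:inventory, failed, compensate:inventory, compensate:payment`. That matches the `T1 → T2 → T3 ❌ → C2 → C1` flow described earlier.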

Choreography vs Orchestration — When to Use What

| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose: services only know about events | Tighter: orchestrator knows all services |
| Complexity | Hard to track flow across services | Easy to understand: flow is in one place |
| Scalability | Better: no single point of failure | Orchestrator can be a bottleneck |
| Debugging | Difficult: events are scattered | Easier: single place to add logging |
| Best for | Simple sagas (2–3 steps) | Complex sagas (4+ steps, conditional logic) |
| Tech stack | Kafka, RabbitMQ, Event Bus | REST/gRPC calls, or message queue + orchestrator |

My recommendation: Start with orchestration for most business-critical flows (payments, orders). The traceability and control are worth the slight coupling. Use choreography for loosely-coupled, fire-and-forget flows (notifications, analytics).

Best Practices

1. Make Compensations Idempotent

Compensating transactions may be retried. They must produce the same result regardless of how many times they run.

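// ✅ Good: idempotent refund
public void refund(String orderId, String transactionId) {
    Payment payment = paymentRepository.findByTransactionId(transactionId);
    if (payment.getStatus() == PaymentStatus.REFUNDED) {
        return; // Already refunded, safe to skip
    }
    // Call the gateway before marking the payment as refunded: if the call throws,
    // the status stays unchanged and a retry will attempt the refund again.
    // Assumes the gateway deduplicates refunds by transactionId.
    paymentGateway.processRefund(transactionId);
    payment.setStatus(PaymentStatus.REFUNDED);
    paymentRepository.save(payment);
}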
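The effect of the status check can be verified with a small in-memory sketch. `IdempotentRefundDemo` is a hypothetical class, the gateway is reduced to a counter, and the code is single-threaded. A real service would also need a database-level guard, such as a conditional update, against concurrent retries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory refund service: maps transactionId to a status string.
class IdempotentRefundDemo {
    private final Map<String, String> statusByTransaction = new HashMap<>();
    int gatewayCalls = 0; // counts calls to the simulated payment gateway

    void recordCharge(String transactionId) {
        statusByTransaction.put(transactionId, "CHARGED");
    }

    // Returns true if this call performed the refund, false if it was a no-op retry.
    boolean refund(String transactionId) {
        String status = statusByTransaction.get(transactionId);
        if (status == null) {
            throw new IllegalArgumentException("Unknown transaction: " + transactionId);
        }
        if (status.equals("REFUNDED")) {
            return false; // already refunded, so the retry changes nothing
        }
        gatewayCalls++; // stand-in for paymentGateway.processRefund(transactionId)
        statusByTransaction.put(transactionId, "REFUNDED");
        return true;
    }
}
```

Calling `refund("tx-1")` twice after `recordCharge("tx-1")` returns `true` and then `false`, and the gateway counter stays at 1.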

2. Use a Saga Status Table

Track the state of each saga for debugging and recovery:

@Entity
@Table(name = "saga_log")
public class SagaLog {
    @Id
    private String sagaId;
    private String orderId;
    private String currentStep;

    @Enumerated(EnumType.STRING)
    private SagaStatus status; // STARTED, COMPLETED, COMPENSATING, FAILED

    private String failureReason;
    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;
}
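One use of such a log for recovery is a periodic scan for sagas that are still in flight but have stopped making progress. The sketch below uses a plain-Java stand-in for the entity. `SagaLogEntry`, `StuckSagaFinder`, and the `maxAge` threshold are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

enum SagaStatus { STARTED, COMPLETED, COMPENSATING, FAILED }

// Plain-Java stand-in for the SagaLog entity (no JPA) so the sketch runs on its own.
class SagaLogEntry {
    final String sagaId;
    final SagaStatus status;
    final LocalDateTime updatedAt;

    SagaLogEntry(String sagaId, SagaStatus status, LocalDateTime updatedAt) {
        this.sagaId = sagaId;
        this.status = status;
        this.updatedAt = updatedAt;
    }
}

class StuckSagaFinder {
    // A saga counts as stuck if it is still in flight and was last updated before now - maxAge.
    static List<String> findStuck(List<SagaLogEntry> entries, LocalDateTime now, Duration maxAge) {
        LocalDateTime cutoff = now.minus(maxAge);
        List<String> stuck = new ArrayList<>();
        for (SagaLogEntry entry : entries) {
            boolean inFlight = entry.status == SagaStatus.STARTED
                    || entry.status == SagaStatus.COMPENSATING;
            if (inFlight && entry.updatedAt.isBefore(cutoff)) {
                stuck.add(entry.sagaId);
            }
        }
        return stuck;
    }
}
```

In a real service the same filter would more likely be a query on `status` and `updatedAt`, with the results handed to a retry job or an operator.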

3. Handle Compensation Failures

What if the compensation itself fails? You need a fallback strategy:

  • Retry with backoff — retry the compensation N times with exponential backoff
  • Dead letter queue — push failed compensations to a DLQ for manual intervention
  • Alerting — notify the ops team immediately for financial transactions
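The first two strategies can be combined in a few lines. This is a minimal sketch, not a production retry policy: `CompensationRetrier` is a hypothetical class, the dead letter queue is an in-memory list, and the delays are recorded so the backoff schedule is visible.

```java
import java.util.ArrayList;
import java.util.List;

class CompensationRetrier {
    final List<String> deadLetters = new ArrayList<>(); // stand-in for a real DLQ topic
    final List<Long> delaysMs = new ArrayList<>();      // delays actually waited, for inspection

    // Runs the compensation up to maxAttempts times, doubling the delay after each failure.
    // Returns true on success; otherwise parks the compensation in the dead letter list.
    boolean runWithRetry(String name, Runnable compensation, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, fall through to the dead letter list
                }
                delaysMs.add(delay);
                sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
        deadLetters.add(name); // alerting for the ops team would hook in here
        return false;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
        }
    }
}
```

With `maxAttempts = 3` and `initialDelayMs = 1`, a compensation that always throws waits 1 ms and then 2 ms before it lands in `deadLetters`. Retrying like this is only safe because the compensation is idempotent.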

4. Set Timeouts

Each saga step should have a timeout. A hanging service should not block the entire saga.

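CompletableFuture<PaymentResult> future = CompletableFuture.supplyAsync(
        () -> paymentClient.charge(orderId, amount)
);

// get() throws TimeoutException after 5s; treat that as a failed step.
// The remote call may still complete, so the saga should compensate or check its outcome.
PaymentResult result = future.get(5, TimeUnit.SECONDS);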
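As an alternative sketch, `CompletableFuture.orTimeout` (Java 9 and later) puts the deadline on the future itself, and the timeout can be mapped to an explicit step outcome. `StepTimeoutDemo` and the `"TIMED_OUT"` marker are hypothetical. A real saga would more likely throw a step exception at that point.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class StepTimeoutDemo {
    // Returns the call's result, or "TIMED_OUT" if it does not finish within timeoutMs.
    static String callWithTimeout(Supplier<String> remoteCall, long timeoutMs) {
        try {
            return CompletableFuture.supplyAsync(remoteCall)
                    .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                    .join();
        } catch (CompletionException e) {
            // join() wraps the TimeoutException raised by orTimeout
            if (e.getCause() instanceof TimeoutException) {
                return "TIMED_OUT";
            }
            throw e; // any other failure is propagated unchanged
        }
    }
}
```

A timeout only says that no answer arrived in time. The underlying task keeps running, so the charge may still have gone through. That is one more reason the compensation for that step has to be idempotent.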

Common Pitfalls

| Pitfall | Problem | Solution |
|---|---|---|
| Non-idempotent compensations | Double refunds, duplicate notifications | Always check current state before compensating |
| Missing compensations | Added a new step but forgot the undo | Enforce SagaStep interface with both execute() and compensate() |
| No timeout | A hanging service blocks the whole saga | Set timeouts on every external call |
| No saga log | Can't debug or recover from failures | Always persist saga state to a database |
| Ignoring compensation failures | Silent data inconsistency | Alert + dead letter queue for manual review |

Real-World Use Cases

E-Commerce Order Flow

Create Order → Reserve Inventory → Charge Payment → Ship Order

If payment fails → release inventory → cancel order.

Banking Transfer (My Experience at OCB)

Debit Source Account → Credit Destination Account → Update Transaction Log

If credit fails → reverse debit → mark transaction as failed → alert operations.
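The transfer flow can be modelled in a few lines to show where the compensation sits. This is an in-process illustration only, with hypothetical names and balances held in a map. In a real system the debit and credit would be separate services with their own databases, and the alert to operations is omitted here.

```java
import java.util.HashMap;
import java.util.Map;

class TransferSagaDemo {
    final Map<String, Long> balances = new HashMap<>(); // account -> balance in minor units
    String lastStatus = "NONE";

    private void debit(String account, long amount) {
        long balance = balances.getOrDefault(account, 0L);
        if (balance < amount) {
            throw new IllegalStateException("Insufficient funds in " + account);
        }
        balances.put(account, balance - amount);
    }

    private void credit(String account, long amount) {
        if (!balances.containsKey(account)) {
            throw new IllegalStateException("Unknown account: " + account);
        }
        balances.merge(account, amount, Long::sum);
    }

    void transfer(String from, String to, long amount) {
        try {
            debit(from, amount); // step 1
        } catch (IllegalStateException e) {
            lastStatus = "FAILED"; // nothing completed yet, so nothing to compensate
            return;
        }
        try {
            credit(to, amount); // step 2
            lastStatus = "COMPLETED";
        } catch (IllegalStateException e) {
            balances.merge(from, amount, Long::sum); // compensating transaction: reverse the debit
            lastStatus = "FAILED";
        }
    }
}
```

If account `A` holds 100 and the destination does not exist, `transfer("A", "missing", 40)` ends with `A` back at 100 and `lastStatus` set to `FAILED`.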

E-Wallet Top-Up (MoMo-style)

Verify User → Charge Bank Account → Credit Wallet → Send Receipt

If wallet credit fails → refund bank charge → notify user of failure.

Key Takeaways

  1. Use Saga when you can't use a single database transaction — i.e., when a business operation spans multiple microservices.
  2. Every step must have a compensating transaction — think of each action's "undo" before writing the forward logic.
  3. Orchestration is easier to debug — prefer it for critical business flows like payments and orders.
  4. Choreography scales better — use it for loosely-coupled, non-critical flows.
  5. Idempotency is non-negotiable — both forward and compensating transactions must be safe to retry.
  6. Log everything — persist saga state, step results, and failure reasons for troubleshooting.

When NOT to use Saga: If your services can share a database (same bounded context), just use a regular database transaction. Don't add distributed transaction complexity where it's not needed.


Thanks for reading! If you're building microservices that handle real money, getting the Saga pattern right is critical. Also check out my SOLID Principles post for writing clean service code, and my JVM Memory Tuning guide for production performance. 🚀