Saga Pattern — Managing Distributed Transactions in Microservices
A practical guide to the Saga pattern for managing distributed transactions across microservices — with Java examples, choreography vs orchestration comparison, and real-world use cases from fintech.
In a monolithic application, maintaining data consistency is straightforward — you wrap everything in a single database transaction. But in a microservices architecture, each service owns its own database. A single business operation (like placing an order) can span multiple services, and there's no single @Transactional annotation that can save you.
This is exactly the problem I've dealt with at scale in banking and e-wallet systems, where a failed payment must trigger reversals across multiple services — and where partial failures can mean real money is lost.
The Saga pattern solves this by breaking a distributed transaction into a sequence of local transactions, each with a compensating action for rollback.
📖 Original Saga paper by Hector Garcia-Molina and Kenneth Salem (1987) • Microservices.io: Saga pattern
The Problem: Why Distributed Transactions Are Hard
In a monolith, this is easy:
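@Transactional
public void placeOrder(OrderRequest request) {
    // Order.from(...) is an assumed factory; build the Order however your domain model does
    Order order = orderRepository.save(Order.from(request));
    paymentService.charge(order);
    inventoryService.reserve(order);
    notificationService.notify(order);
    // All succeed or all roll back, simple ✅
}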
In microservices, each of these is a separate service with its own database:
┌───────────────┐    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Order Service │───▶│Payment Service│───▶│ Inventory Svc │───▶│ Notification  │
│    (MySQL)    │    │ (PostgreSQL)  │    │   (MongoDB)   │    │    (Redis)    │
└───────────────┘    └───────────────┘    └───────────────┘    └───────────────┘
What happens if Inventory fails after Payment succeeds? You need to refund the payment. That's where the Saga pattern comes in.
Why not 2PC (Two-Phase Commit)? 2PC requires all participants to lock resources until the coordinator commits. This doesn't scale well in microservices — it creates tight coupling, can cause deadlocks, and a single coordinator failure blocks everything.
What is the Saga Pattern?
A Saga is a sequence of local transactions where each transaction updates a single service's database. If any step fails, previously completed steps are undone by executing compensating transactions in reverse order.
Forward flow (happy path):
T1 → T2 → T3 → T4 → ✅ Done
Failure at T3:
T1 → T2 → T3 ❌ → C2 → C1 → ❌ Rolled back
T = Transaction (local)
C = Compensating transaction (undo)
Key Concepts
| Concept | Description |
|---|---|
| Local transaction | A database operation within a single service |
| Compensating transaction | The "undo" operation for a previously completed step |
| Saga coordinator | The component that manages the execution flow |
| Idempotency | Steps and compensations must be safe to retry without repeating side effects |
Two Approaches: Choreography vs Orchestration
There are two main ways to implement the Saga pattern:
Choreography (Event-driven)
Each service publishes events and listens for events from other services. There's no central coordinator — services react to each other's events.
┌─────────────┐  OrderCreated   ┌─────────────┐ PaymentCompleted ┌─────────────┐
│    Order    │ ──────────────▶ │   Payment   │ ───────────────▶ │  Inventory  │
│   Service   │ ◀────────────── │   Service   │ ◀─────────────── │   Service   │
└─────────────┘  PaymentFailed  └─────────────┘ InventoryReserved └─────────────┘
                                                InventoryFailed
Java Example — Choreography with Spring + Kafka
Order Service — publishes event after creating order:
@Service
public class OrderService {
    private final OrderRepository orderRepository;
    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public Order createOrder(OrderRequest request) {
        Order order = Order.builder()
            .customerId(request.getCustomerId())
            .amount(request.getAmount())
            .status(OrderStatus.PENDING)
            .build();
        orderRepository.save(order);
        // Publish event: Payment Service will pick this up.
        // Caveat: the save and the send are not atomic, so a failed send leaves a PENDING order.
        kafkaTemplate.send("order-events", new OrderCreatedEvent(
            order.getId(), order.getCustomerId(), order.getAmount()
        ));
        return order;
    }

    // Compensating transaction: called when a downstream step fails
    @KafkaListener(topics = "payment-events", groupId = "order-service")
    public void handlePaymentFailed(PaymentFailedEvent event) {
        Order order = orderRepository.findById(event.getOrderId())
            .orElseThrow();
        if (order.getStatus() == OrderStatus.CANCELLED) {
            return; // Already cancelled: the event was redelivered
        }
        order.setStatus(OrderStatus.CANCELLED);
        orderRepository.save(order);
    }
}
Payment Service — listens for order events, charges payment:
@Service
public class PaymentService {
    private final PaymentRepository paymentRepository;
    private final KafkaTemplate<String, PaymentEvent> kafkaTemplate;

    @KafkaListener(topics = "order-events", groupId = "payment-service")
    public void handleOrderCreated(OrderCreatedEvent event) {
        try {
            Payment payment = processPayment(event.getOrderId(), event.getAmount());
            paymentRepository.save(payment);
            // Success: notify the next service
            kafkaTemplate.send("payment-events", new PaymentCompletedEvent(
                event.getOrderId(), payment.getTransactionId()
            ));
        } catch (InsufficientFundsException e) {
            // Failure: notify Order Service to compensate
            kafkaTemplate.send("payment-events", new PaymentFailedEvent(
                event.getOrderId(), e.getMessage()
            ));
        }
    }

    // Compensating transaction: refund if Inventory fails later
    @KafkaListener(topics = "inventory-events", groupId = "payment-service")
    public void handleInventoryFailed(InventoryFailedEvent event) {
        Payment payment = paymentRepository.findByOrderId(event.getOrderId())
            .orElseThrow();
        if (payment.getStatus() == PaymentStatus.REFUNDED) {
            return; // Already refunded: skip the redelivered event
        }
        payment.setStatus(PaymentStatus.REFUNDED);
        paymentRepository.save(payment);
        // Process actual refund...
    }
}
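Kafka consumers typically get at-least-once delivery, so a handler like handleOrderCreated can see the same event twice. Here is a minimal sketch of a deduplication guard. It assumes every event carries a unique eventId (the event classes above do not show one, so that field is hypothetical), and it uses an in-memory set as a stand-in for a processed-events table with a unique key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class DedupingPaymentHandler {
    // Stand-in for a processed_events table with a unique key on eventId (assumption).
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger chargesMade = new AtomicInteger();

    /** Returns true if the event was processed, false if it was a duplicate delivery. */
    boolean handle(String eventId, String orderId) {
        // add() returns false when the id is already present, so a redelivery is skipped.
        if (!processedEventIds.add(eventId)) {
            return false;
        }
        chargesMade.incrementAndGet(); // placeholder for the real charge against orderId
        return true;
    }

    int chargesMade() {
        return chargesMade.get();
    }
}
```

In a real service, recording the event ID and applying the charge should happen in the same local transaction, otherwise a crash between the two can still lose or repeat work.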
Orchestration (Central Coordinator)
A central Saga Orchestrator controls the entire flow — it tells each service what to do and handles failures by calling compensating transactions.
              ┌───────────────────┐
              │ Saga Orchestrator │
              │   (Order Saga)    │
              └─────────┬─────────┘
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
 ┌────────────┐  ┌────────────┐  ┌────────────┐
 │  Payment   │  │ Inventory  │  │Notification│
 │  Service   │  │  Service   │  │  Service   │
 └────────────┘  └────────────┘  └────────────┘
Java Example — Orchestration with State Machine
@Service
public class OrderSagaOrchestrator {
    private final PaymentClient paymentClient;
    private final InventoryClient inventoryClient;
    private final OrderRepository orderRepository;

    // No @Transactional here: remote calls cannot join a local database transaction,
    // and keeping one open across network calls ties up a connection.
    public void executeSaga(Order order) {
        try {
            // Step 1: Charge payment
            PaymentResult payment = paymentClient.charge(
                order.getId(), order.getAmount()
            );
            if (!payment.isSuccess()) {
                throw new SagaStepException("Payment failed");
            }
            // Step 2: Reserve inventory
            InventoryResult inventory = inventoryClient.reserve(
                order.getId(), order.getItems()
            );
            if (!inventory.isSuccess()) {
                // Compensate Step 1
                paymentClient.refund(order.getId(), payment.getTransactionId());
                throw new SagaStepException("Inventory reservation failed");
            }
            // Step 3: Confirm order
            order.setStatus(OrderStatus.CONFIRMED);
            orderRepository.save(order);
        } catch (SagaStepException e) {
            // Mark order as failed
            order.setStatus(OrderStatus.FAILED);
            order.setFailureReason(e.getMessage());
            orderRepository.save(order);
        }
    }
}
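The orchestrator above walks through its steps sequentially. If you want the "state machine" part to be explicit, one option is to model the saga's states and allowed transitions directly. The sketch below is only an illustration: the state names and the transition table are my assumptions, not something the orchestrator code defines.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical state names for the order saga.
enum OrderSagaState {
    STARTED, PAYMENT_CHARGED, INVENTORY_RESERVED, COMPLETED, COMPENSATING, FAILED
}

class OrderSagaStateMachine {
    // Allowed transitions. A failure before any charge goes straight to FAILED,
    // because there is nothing to compensate yet.
    private static final Map<OrderSagaState, Set<OrderSagaState>> ALLOWED = Map.of(
        OrderSagaState.STARTED, EnumSet.of(OrderSagaState.PAYMENT_CHARGED, OrderSagaState.FAILED),
        OrderSagaState.PAYMENT_CHARGED, EnumSet.of(OrderSagaState.INVENTORY_RESERVED, OrderSagaState.COMPENSATING),
        OrderSagaState.INVENTORY_RESERVED, EnumSet.of(OrderSagaState.COMPLETED, OrderSagaState.COMPENSATING),
        OrderSagaState.COMPENSATING, EnumSet.of(OrderSagaState.FAILED),
        OrderSagaState.COMPLETED, EnumSet.noneOf(OrderSagaState.class),
        OrderSagaState.FAILED, EnumSet.noneOf(OrderSagaState.class)
    );

    private OrderSagaState current = OrderSagaState.STARTED;

    OrderSagaState current() {
        return current;
    }

    /** Moves to the next state, rejecting any transition not listed in ALLOWED. */
    void transitionTo(OrderSagaState next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException("Illegal transition: " + current + " -> " + next);
        }
        current = next;
    }
}
```

Persisting the current state after each transition is what lets an orchestrator resume or compensate after a crash, instead of losing track of a half-finished saga.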
Advanced: Step-based Saga with Reusable Framework
For complex sagas with many steps, use a step-based approach:
public interface SagaStep<T> {
    /** Execute the forward transaction */
    void execute(T context);

    /** Compensate (undo) if a later step fails */
    void compensate(T context);
}
public class PaymentStep implements SagaStep<OrderContext> {
    private final PaymentClient paymentClient;

    public PaymentStep(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;
    }

    @Override
    public void execute(OrderContext ctx) {
        PaymentResult result = paymentClient.charge(ctx.getOrderId(), ctx.getAmount());
        // Keep the transaction ID so compensate() knows what to refund
        ctx.setPaymentTransactionId(result.getTransactionId());
    }

    @Override
    public void compensate(OrderContext ctx) {
        paymentClient.refund(ctx.getOrderId(), ctx.getPaymentTransactionId());
    }
}
public class InventoryStep implements SagaStep<OrderContext> {
    private final InventoryClient inventoryClient;

    public InventoryStep(InventoryClient inventoryClient) {
        this.inventoryClient = inventoryClient;
    }

    @Override
    public void execute(OrderContext ctx) {
        inventoryClient.reserve(ctx.getOrderId(), ctx.getItems());
    }

    @Override
    public void compensate(OrderContext ctx) {
        inventoryClient.release(ctx.getOrderId(), ctx.getItems());
    }
}
The Saga executor runs steps in order and compensates on failure:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SagaExecutor<T> {
    private static final Logger log = LoggerFactory.getLogger(SagaExecutor.class);
    private final List<SagaStep<T>> steps;

    public SagaExecutor(List<SagaStep<T>> steps) {
        this.steps = steps;
    }

    public void execute(T context) {
        List<SagaStep<T>> completedSteps = new ArrayList<>();
        for (SagaStep<T> step : steps) {
            try {
                step.execute(context);
                completedSteps.add(step);
            } catch (Exception e) {
                // Compensate in reverse order
                Collections.reverse(completedSteps);
                for (SagaStep<T> completed : completedSteps) {
                    try {
                        completed.compensate(context);
                    } catch (Exception compensationError) {
                        // Log and alert: manual intervention needed
                        log.error("Compensation failed for step: {}",
                            completed.getClass().getSimpleName(), compensationError);
                    }
                }
                throw new SagaFailedException("Saga failed at: " +
                    step.getClass().getSimpleName(), e);
            }
        }
    }
}
Usage:
SagaExecutor<OrderContext> orderSaga = new SagaExecutor<>(List.of(
    new PaymentStep(paymentClient),
    new InventoryStep(inventoryClient),
    new NotificationStep(notificationClient)
));
orderSaga.execute(new OrderContext(orderId, amount, items));
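To see the compensation order concretely, here is a self-contained toy run with no real clients. ToyStep and ToySagaRun are illustrative stand-ins I made up for this sketch; they mimic the executor's behavior with a stack, but they are not the classes shown above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy stand-in for a saga step: it only has a name and a flag that forces a failure.
class ToyStep {
    final String name;
    final boolean failsOnExecute;

    ToyStep(String name, boolean failsOnExecute) {
        this.name = name;
        this.failsOnExecute = failsOnExecute;
    }
}

class ToySagaRun {
    /** Runs the steps in order and returns a trace of what was executed and compensated. */
    static List<String> run(List<ToyStep> steps) {
        List<String> trace = new ArrayList<>();
        // Used as a stack: the last completed step is compensated first.
        Deque<ToyStep> completed = new ArrayDeque<>();
        for (ToyStep step : steps) {
            if (step.failsOnExecute) {
                trace.add("fail:" + step.name);
                while (!completed.isEmpty()) {
                    trace.add("compensate:" + completed.pop().name);
                }
                return trace;
            }
            trace.add("execute:" + step.name);
            completed.push(step);
        }
        return trace;
    }
}
```

Running it with payment and inventory succeeding and notification failing yields the trace execute:payment, execute:inventory, fail:notification, compensate:inventory, compensate:payment. That also raises a design question worth asking for your own saga: whether a failed notification should really undo a payment, or whether that step belongs outside the saga.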
Choreography vs Orchestration — When to Use What
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose — services only know about events | Tighter — orchestrator knows all services |
| Complexity | Hard to track flow across services | Easy to understand — flow is in one place |
| Scalability | Better — no single point of failure | Orchestrator can be a bottleneck |
| Debugging | Difficult — events are scattered | Easier — single place to add logging |
| Best for | Simple sagas (2–3 steps) | Complex sagas (4+ steps, conditional logic) |
| Tech stack | Kafka, RabbitMQ, Event Bus | REST/gRPC calls, or message queue + orchestrator |
My recommendation: Start with orchestration for most business-critical flows (payments, orders). The traceability and control are worth the slight coupling. Use choreography for loosely-coupled, fire-and-forget flows (notifications, analytics).
Best Practices
1. Make Compensations Idempotent
Compensating transactions may be retried. They must produce the same result regardless of how many times they run.
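// ✅ Good: idempotent refund
public void refund(String orderId, String transactionId) {
    Payment payment = paymentRepository.findByTransactionId(transactionId);
    if (payment.getStatus() == PaymentStatus.REFUNDED) {
        return; // Already refunded, safe to skip
    }
    // Call the gateway first: if it throws, the status stays unchanged and a retry
    // attempts the refund again instead of skipping it. This assumes the gateway
    // itself deduplicates refunds by transactionId.
    paymentGateway.processRefund(transactionId);
    payment.setStatus(PaymentStatus.REFUNDED);
    paymentRepository.save(payment);
}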
2. Use a Saga Status Table
Track the state of each saga for debugging and recovery:
@Entity
@Table(name = "saga_log")
public class SagaLog {
    @Id
    private String sagaId;
    private String orderId;
    private String currentStep;
    @Enumerated(EnumType.STRING)
    private SagaStatus status; // STARTED, COMPLETED, COMPENSATING, FAILED
    private String failureReason;
    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;
}
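One thing a saga log makes possible is a recovery job that looks for sagas that stopped moving. The sketch below uses an in-memory map as a stand-in for the saga_log table; InMemorySagaLog, its method names, and the idea of a maxIdle threshold are assumptions for illustration, not part of the entity above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the saga_log table; a real system would persist these rows.
class InMemorySagaLog {
    enum Status { STARTED, COMPLETED, COMPENSATING, FAILED }

    static final class Entry {
        final String sagaId;
        final String currentStep;
        final Status status;
        final Instant updatedAt;

        Entry(String sagaId, String currentStep, Status status, Instant updatedAt) {
            this.sagaId = sagaId;
            this.currentStep = currentStep;
            this.status = status;
            this.updatedAt = updatedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /** Records the latest step and status for a saga, overwriting the previous entry. */
    void record(String sagaId, String step, Status status, Instant now) {
        entries.put(sagaId, new Entry(sagaId, step, status, now));
    }

    /** Returns sagas that are neither COMPLETED nor FAILED and have been idle longer than maxIdle. */
    List<String> findStuck(Instant now, Duration maxIdle) {
        List<String> stuck = new ArrayList<>();
        Instant cutoff = now.minus(maxIdle);
        for (Entry e : entries.values()) {
            boolean terminal = e.status == Status.COMPLETED || e.status == Status.FAILED;
            if (!terminal && e.updatedAt.isBefore(cutoff)) {
                stuck.add(e.sagaId);
            }
        }
        return stuck;
    }
}
```

A scheduled job could call something like findStuck every few minutes and either resume the saga from currentStep or start compensating it.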
3. Handle Compensation Failures
What if the compensation itself fails? You need a fallback strategy:
- Retry with backoff — retry the compensation N times with exponential backoff
- Dead letter queue — push failed compensations to a DLQ for manual intervention
- Alerting — notify the ops team immediately for financial transactions
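The first two strategies above can be combined in a few lines. This is a minimal sketch under assumptions: CompensationRetrier is a made-up helper, the "dead letter queue" is just an in-memory list standing in for a real queue or topic, and alerting is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

class CompensationRetrier {
    // Stand-in for a real dead letter queue (for example a dedicated topic); hypothetical.
    private final List<String> deadLetters = new ArrayList<>();

    /**
     * Runs the compensation up to maxAttempts times, doubling the delay after each failure.
     * Returns true on success. When attempts are exhausted, the step name is added to the
     * dead letter list and false is returned.
     */
    boolean runWithRetry(String stepName, Runnable compensation, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, fall through to the dead letter list
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    break;
                }
                delay *= 2; // exponential backoff
            }
        }
        deadLetters.add(stepName); // hand over for manual review; this is where an alert would fire
        return false;
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

Retrying only makes sense because the compensation is idempotent; without that guarantee, each retry risks a double refund.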
4. Set Timeouts
Each saga step should have a timeout. A hanging service should not block the entire saga.
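CompletableFuture<PaymentResult> future = CompletableFuture.supplyAsync(
    () -> paymentClient.charge(orderId, amount)
);
// get(...) throws TimeoutException after 5s. The remote charge may still complete,
// so treat a timeout as "outcome unknown" and compensate or verify.
PaymentResult result = future.get(5, TimeUnit.SECONDS);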
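A timeout is easy to mistake for a failure, but the remote call may have gone through after you stopped waiting. The sketch below makes that distinction explicit. TimedStep and its Outcome enum are hypothetical names for this example, and the remote call is reduced to a Supplier<Boolean> so the block runs on its own.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimedStep {
    enum Outcome { SUCCESS, FAILED, UNKNOWN_TIMED_OUT }

    /** Runs the call on another thread and waits at most timeoutMillis for its result. */
    static Outcome call(Supplier<Boolean> remoteCall, long timeoutMillis) {
        CompletableFuture<Boolean> future = CompletableFuture.supplyAsync(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS) ? Outcome.SUCCESS : Outcome.FAILED;
        } catch (TimeoutException e) {
            // The remote side may still apply the change, so the result is unknown,
            // not failed. The caller should compensate or query the service.
            return Outcome.UNKNOWN_TIMED_OUT;
        } catch (ExecutionException e) {
            // The call itself threw an exception.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.UNKNOWN_TIMED_OUT;
        }
    }
}
```

For a payment step, an UNKNOWN_TIMED_OUT outcome is usually the case that needs an idempotent refund or a status lookup, since marking the order as failed without checking could leave a customer charged.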
Common Pitfalls
| Pitfall | Problem | Solution |
|---|---|---|
| Non-idempotent compensations | Double refunds, duplicate notifications | Always check current state before compensating |
| Missing compensations | Added a new step but forgot the undo | Enforce SagaStep interface with both execute() and compensate() |
| No timeout | A hanging service blocks the whole saga | Set timeouts on every external call |
| No saga log | Can't debug or recover from failures | Always persist saga state to a database |
| Ignoring compensation failures | Silent data inconsistency | Alert + dead letter queue for manual review |
Real-World Use Cases
E-Commerce Order Flow
Create Order → Reserve Inventory → Charge Payment → Ship Order
If payment fails → release inventory → cancel order.
Banking Transfer (My Experience at OCB)
Debit Source Account → Credit Destination Account → Update Transaction Log
If credit fails → reverse debit → mark transaction as failed → alert operations.
E-Wallet Top-Up (MoMo-style)
Verify User → Charge Bank Account → Credit Wallet → Send Receipt
If wallet credit fails → refund bank charge → notify user of failure.
Key Takeaways
- Use Saga when you can't use a single database transaction — i.e., when a business operation spans multiple microservices.
- Every step must have a compensating transaction — think of each action's "undo" before writing the forward logic.
- Orchestration is easier to debug — prefer it for critical business flows like payments and orders.
- Choreography scales better — use it for loosely-coupled, non-critical flows.
- Idempotency is non-negotiable — both forward and compensating transactions must be safe to retry.
- Log everything — persist saga state, step results, and failure reasons for troubleshooting.
When NOT to use Saga: If your services can share a database (same bounded context), just use a regular database transaction. Don't add distributed transaction complexity where it's not needed.
Thanks for reading! If you're building microservices that handle real money, getting the Saga pattern right is critical. Also check out my SOLID Principles post for writing clean service code, and my JVM Memory Tuning guide for production performance. 🚀