Saga Pattern — Managing Distributed Transactions in Microservices

April 1, 2026 · 8 min read

Senior Software Engineer at OCB

A practical guide to the Saga pattern for managing distributed transactions across microservices — with Java examples, choreography vs orchestration comparison, and real-world use cases from fintech.

In a monolithic application, maintaining data consistency is straightforward — you wrap everything in a single database transaction. But in a microservices architecture, each service owns its own database. A single business operation (like placing an order) can span multiple services, and there's no single @Transactional annotation that can save you.

This is exactly the problem I've dealt with at scale in banking and e-wallet systems, where a failed payment must trigger reversals across multiple services — and where partial failures can mean real money is lost.

The Saga pattern solves this by breaking a distributed transaction into a sequence of local transactions, each with a compensating action for rollback.

📖 Original Saga paper by Hector Garcia-Molina and Kenneth Salem (1987) • Microservices.io: Saga pattern

The Problem: Why Distributed Transactions Are Hard

In a monolith, this is easy:

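@Transactional
public void placeOrder(OrderRequest request) {
    // Order.from(...) is a hypothetical factory that builds the entity from the request
    Order order = Order.from(request);
    orderRepository.save(order);
    paymentService.charge(order);
    inventoryService.reserve(order);
    notificationService.notify(order);
    // All succeed or all roll back: simple ✅
}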

In microservices, each of these is a separate service with its own database:

┌───────────────┐    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Order Service │───▶│Payment Service│───▶│Inventory Svc  │───▶│ Notification  │
│   (MySQL)     │    │  (PostgreSQL) │    │   (MongoDB)   │    │   (Redis)     │
└───────────────┘    └───────────────┘    └───────────────┘    └───────────────┘

What happens if Inventory fails after Payment succeeds? You need to refund the payment. That's where the Saga pattern comes in.

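Why not 2PC (Two-Phase Commit)? 2PC requires every participant to hold its locks until the coordinator decides to commit or abort. That scales poorly in microservices: it ties the services' availability together, increases lock contention and the chance of deadlocks, and leaves participants blocked if the coordinator fails mid-protocol. It also needs every data store involved to support the protocol, which stores like the MongoDB and Redis instances in the diagram above generally do not.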

What is the Saga Pattern?

A Saga is a sequence of local transactions where each transaction updates a single service's database. If any step fails, previously completed steps are undone by executing compensating transactions in reverse order.

Forward flow (happy path):
T1 → T2 → T3 → T4 → ✅ Done

Failure at T3:
T1 → T2 → T3 ❌ → C2 → C1 → ❌ Rolled back

T = Transaction (local)
C = Compensating transaction (undo)

Key Concepts

Concept	Description
Local transaction	A database operation within a single service
Compensating transaction	The "undo" operation for a previously completed step
Saga coordinator	The component that manages the execution flow (an explicit orchestrator, or implicit in the event flow under choreography)
Idempotency	Compensations must be safe to retry without side effects

Two Approaches: Choreography vs Orchestration

There are two main ways to implement the Saga pattern:

Choreography (Event-driven)

Each service publishes events and listens for events from other services. There's no central coordinator — services react to each other's events.

┌─────────────┐   OrderCreated  ┌─────────────┐  PaymentCompleted ┌─────────────┐
│   Order     │ ──────────────▶ │   Payment   │ ───────────────▶  │  Inventory  │
│   Service   │ ◀────────────── │   Service   │ ◀───────────────  │   Service   │
└─────────────┘   PaymentFailed └─────────────┘ InventoryReserved └─────────────┘
                                                  InventoryFailed

Java Example — Choreography with Spring + Kafka

Order Service — publishes event after creating order:

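import lombok.RequiredArgsConstructor;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor // Assumes Lombok (as Order.builder() suggests); generates the constructor
public class OrderService {
    private final OrderRepository orderRepository;
    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public Order createOrder(OrderRequest request) {
        Order order = Order.builder()
            .customerId(request.getCustomerId())
            .amount(request.getAmount())
            .status(OrderStatus.PENDING)
            .build();

        orderRepository.save(order);

        // Publish event: Payment Service will pick this up.
        // Note: the save and the send are not atomic; a crash between them loses the event.
        kafkaTemplate.send("order-events", new OrderCreatedEvent(
            order.getId(), order.getCustomerId(), order.getAmount()
        ));

        return order;
    }

    // Compensating transaction: called when payment fails downstream.
    // The topic also carries PaymentCompletedEvent, so accept the base type and filter.
    @KafkaListener(topics = "payment-events", groupId = "order-service")
    public void handlePaymentEvent(PaymentEvent event) {
        if (!(event instanceof PaymentFailedEvent failed)) {
            return; // Not a failure: nothing to compensate
        }
        Order order = orderRepository.findById(failed.getOrderId())
            .orElseThrow();
        order.setStatus(OrderStatus.CANCELLED);
        orderRepository.save(order);
    }
}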

Payment Service — listens for order events, charges payment:

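import lombok.RequiredArgsConstructor;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor // Assumes Lombok; generates the constructor for the final fields
public class PaymentService {
    private final PaymentRepository paymentRepository;
    private final KafkaTemplate<String, PaymentEvent> kafkaTemplate;

    @KafkaListener(topics = "order-events", groupId = "payment-service")
    public void handleOrderCreated(OrderCreatedEvent event) {
        try {
            Payment payment = processPayment(event.getOrderId(), event.getAmount());
            paymentRepository.save(payment);

            // Success → notify next service
            kafkaTemplate.send("payment-events", new PaymentCompletedEvent(
                event.getOrderId(), payment.getTransactionId()
            ));
        } catch (InsufficientFundsException e) {
            // Failure → notify Order Service to compensate
            kafkaTemplate.send("payment-events", new PaymentFailedEvent(
                event.getOrderId(), e.getMessage()
            ));
        }
    }

    // Compensating transaction: refund if Inventory fails later.
    // The topic also carries InventoryReserved events, so accept the base type and filter.
    // InventoryEvent is an assumed base type, mirroring OrderEvent and PaymentEvent.
    @KafkaListener(topics = "inventory-events", groupId = "payment-service")
    public void handleInventoryEvent(InventoryEvent event) {
        if (!(event instanceof InventoryFailedEvent failed)) {
            return; // Not a failure: nothing to compensate
        }
        Payment payment = paymentRepository.findByOrderId(failed.getOrderId())
            .orElseThrow();
        if (payment.getStatus() == PaymentStatus.REFUNDED) {
            return; // Duplicate delivery: already refunded
        }
        // Process actual refund...
        payment.setStatus(PaymentStatus.REFUNDED);
        paymentRepository.save(payment);
    }
}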

Orchestration (Central Coordinator)

A central Saga Orchestrator controls the entire flow — it tells each service what to do and handles failures by calling compensating transactions.

                           ┌───────────────────┐
                           │  Saga Orchestrator│
                           │  (Order Saga)     │
                           └────────┬──────────┘
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
             ┌────────────┐ ┌────────────┐ ┌────────────┐
             │  Payment   │ │ Inventory  │ │Notification│
             │  Service   │ │  Service   │ │  Service   │
             └────────────┘ └────────────┘ └────────────┘

Java Example — Orchestration with State Machine

@Service
@RequiredArgsConstructor // Assumes Lombok; generates the constructor for the final fields
public class OrderSagaOrchestrator {
    private final PaymentClient paymentClient;
    private final InventoryClient inventoryClient;
    private final OrderRepository orderRepository;

    // No @Transactional here: a local database transaction cannot roll back
    // remote calls, and keeping one open across network calls holds a connection.
    public void executeSaga(Order order) {
        PaymentResult payment = null;
        try {
            // Step 1: Charge payment
            payment = paymentClient.charge(
                order.getId(), order.getAmount()
            );
            if (!payment.isSuccess()) {
                throw new SagaStepException("Payment failed");
            }

            // Step 2: Reserve inventory
            InventoryResult inventory = inventoryClient.reserve(
                order.getId(), order.getItems()
            );
            if (!inventory.isSuccess()) {
                throw new SagaStepException("Inventory reservation failed");
            }

            // Step 3: Confirm order
            order.setStatus(OrderStatus.CONFIRMED);
            orderRepository.save(order);

        } catch (Exception e) {
            // Compensate Step 1 if the charge went through, whether a later step
            // returned a failure result or threw (for example, on a network error)
            if (payment != null && payment.isSuccess()) {
                paymentClient.refund(order.getId(), payment.getTransactionId());
            }
            // Mark order as failed
            order.setStatus(OrderStatus.FAILED);
            order.setFailureReason(e.getMessage());
            orderRepository.save(order);
        }
    }
}
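One way to read the "state machine" in this example's title is to make the saga's states explicit instead of leaving them implied by the position inside a try/catch. The sketch below is a minimal illustration of that idea, not a rewrite of the orchestrator above: the `OrderSagaState` enum, its state names, and the `next` method are all assumptions made for this example.

```java
/** Hypothetical saga states for the order flow; the names are illustrative. */
enum OrderSagaState {
    STARTED, PAYMENT_CHARGED, INVENTORY_RESERVED, COMPLETED, COMPENSATING, FAILED;

    /** True once the saga can make no further progress. */
    boolean isTerminal() {
        return this == COMPLETED || this == FAILED;
    }

    /**
     * Returns the state that follows this one, given whether the step
     * (or compensation) that was just attempted succeeded.
     */
    OrderSagaState next(boolean succeeded) {
        switch (this) {
            case STARTED:
                // Nothing to undo if the very first step (the charge) fails
                return succeeded ? PAYMENT_CHARGED : FAILED;
            case PAYMENT_CHARGED:
                // Inventory failed after a successful charge: a refund is needed
                return succeeded ? INVENTORY_RESERVED : COMPENSATING;
            case INVENTORY_RESERVED:
                // Final step: confirming the order
                return succeeded ? COMPLETED : COMPENSATING;
            case COMPENSATING:
                // Stay in COMPENSATING until the undo work succeeds
                return succeeded ? FAILED : COMPENSATING;
            default:
                throw new IllegalStateException("No transition from terminal state " + this);
        }
    }
}
```

Persisting the current state after every transition is what lets an orchestrator resume or compensate a saga after a restart, rather than losing its place in the flow.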

Advanced: Step-based Saga with Reusable Framework

For complex sagas with many steps, use a step-based approach:

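public interface SagaStep<T> {
    /** Execute the forward transaction; throw if the step did not succeed */
    void execute(T context);

    /** Compensate (undo) if a later step fails */
    void compensate(T context);
}

public class PaymentStep implements SagaStep<OrderContext> {
    private final PaymentClient paymentClient;

    public PaymentStep(PaymentClient paymentClient) {
        this.paymentClient = paymentClient;
    }

    @Override
    public void execute(OrderContext ctx) {
        PaymentResult result = paymentClient.charge(ctx.getOrderId(), ctx.getAmount());
        if (!result.isSuccess()) {
            // Signal failure so the executor stops and compensates earlier steps.
            // Assumes SagaStepException is unchecked (extends RuntimeException).
            throw new SagaStepException("Payment failed");
        }
        ctx.setPaymentTransactionId(result.getTransactionId());
    }

    @Override
    public void compensate(OrderContext ctx) {
        paymentClient.refund(ctx.getOrderId(), ctx.getPaymentTransactionId());
    }
}

public class InventoryStep implements SagaStep<OrderContext> {
    private final InventoryClient inventoryClient;

    public InventoryStep(InventoryClient inventoryClient) {
        this.inventoryClient = inventoryClient;
    }

    @Override
    public void execute(OrderContext ctx) {
        InventoryResult result = inventoryClient.reserve(ctx.getOrderId(), ctx.getItems());
        if (!result.isSuccess()) {
            throw new SagaStepException("Inventory reservation failed");
        }
    }

    @Override
    public void compensate(OrderContext ctx) {
        inventoryClient.release(ctx.getOrderId(), ctx.getItems());
    }
}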

The Saga executor runs steps in order and compensates on failure:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SagaExecutor<T> {
    private static final Logger log = LoggerFactory.getLogger(SagaExecutor.class);

    private final List<SagaStep<T>> steps;

    public SagaExecutor(List<SagaStep<T>> steps) {
        this.steps = List.copyOf(steps); // Defensive copy: the step order must not change later
    }

    public void execute(T context) {
        List<SagaStep<T>> completedSteps = new ArrayList<>();

        for (SagaStep<T> step : steps) {
            try {
                step.execute(context);
                completedSteps.add(step);
            } catch (Exception e) {
                // Compensate in reverse order
                Collections.reverse(completedSteps);
                for (SagaStep<T> completed : completedSteps) {
                    try {
                        completed.compensate(context);
                    } catch (Exception compensationError) {
                        // Log and alert: manual intervention needed.
                        // Remaining compensations still run.
                        log.error("Compensation failed for step: {}",
                            completed.getClass().getSimpleName(), compensationError);
                    }
                }
                throw new SagaFailedException("Saga failed at: " +
                    step.getClass().getSimpleName(), e);
            }
        }
    }
}

Usage:

SagaExecutor<OrderContext> orderSaga = new SagaExecutor<>(List.of(
    new PaymentStep(paymentClient),
    new InventoryStep(inventoryClient),
    new NotificationStep(notificationClient)
));

orderSaga.execute(new OrderContext(orderId, amount, items));
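To make the reverse-order compensation visible, here is a toy run that records what happens instead of calling real services. It is a self-contained sketch, separate from the executor above: `RecordingStep` and `SagaTraceDemo` are hypothetical names, and the steps only append to an in-memory trace.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stand-in for a saga step: records what happened instead of calling a service. */
final class RecordingStep {
    private final String name;
    private final boolean failOnExecute;
    private final List<String> trace;

    RecordingStep(String name, boolean failOnExecute, List<String> trace) {
        this.name = name;
        this.failOnExecute = failOnExecute;
        this.trace = trace;
    }

    void execute() {
        if (failOnExecute) {
            trace.add("fail " + name);
            throw new IllegalStateException(name + " failed");
        }
        trace.add("execute " + name);
    }

    void compensate() {
        trace.add("compensate " + name);
    }
}

final class SagaTraceDemo {
    /** Runs the named steps in order; on failure, compensates completed steps newest-first. */
    static List<String> run(List<String> stepNames, String failingStep) {
        List<String> trace = new ArrayList<>();
        Deque<RecordingStep> completed = new ArrayDeque<>();
        for (String name : stepNames) {
            RecordingStep step = new RecordingStep(name, name.equals(failingStep), trace);
            try {
                step.execute();
                completed.push(step); // Stack: the last completed step is compensated first
            } catch (IllegalStateException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                trace.add("saga failed");
                return trace;
            }
        }
        trace.add("saga completed");
        return trace;
    }
}
```

Calling `SagaTraceDemo.run(List.of("payment", "inventory", "notification"), "notification")` yields the trace `execute payment`, `execute inventory`, `fail notification`, `compensate inventory`, `compensate payment`, `saga failed`. The failing step itself is not compensated, only the steps that completed before it.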

Choreography vs Orchestration — When to Use What

Aspect	Choreography	Orchestration
Coupling	Loose — services only know about events	Tighter — orchestrator knows all services
Complexity	Hard to track flow across services	Easy to understand — flow is in one place
Scalability	Better — no single point of failure	Orchestrator can be a bottleneck
Debugging	Difficult — events are scattered	Easier — single place to add logging
Best for	Simple sagas (2–3 steps)	Complex sagas (4+ steps, conditional logic)
Tech stack	Kafka, RabbitMQ, Event Bus	REST/gRPC calls, or message queue + orchestrator

My recommendation: Start with orchestration for most business-critical flows (payments, orders). The traceability and control are worth the slight coupling. Use choreography for loosely-coupled, fire-and-forget flows (notifications, analytics).

Best Practices

1. Make Compensations Idempotent

Compensating transactions may be retried. They must produce the same result regardless of how many times they run.

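// ✅ Good: idempotent refund
public void refund(String orderId, String transactionId) {
    Payment payment = paymentRepository.findByTransactionId(transactionId);
    if (payment.getStatus() == PaymentStatus.REFUNDED) {
        return; // Already refunded, safe to skip
    }
    // Call the gateway before marking the payment REFUNDED: if the call throws,
    // the status is unchanged and a retry attempts the refund again.
    // Assumes the gateway deduplicates refunds by transactionId, in case the
    // save below fails after the gateway call succeeded.
    paymentGateway.processRefund(transactionId);
    payment.setStatus(PaymentStatus.REFUNDED);
    paymentRepository.save(payment);
}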
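The same rule applies to forward steps when a message is redelivered or a caller retries after a timeout. A common approach is to key each request with an idempotency key and return the original result for a repeat. The sketch below shows the idea with an in-memory map; `IdempotentChargeService`, its methods, and the `txn-N` IDs are hypothetical, and a production version would more likely rely on a database table with a unique constraint than on a map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory sketch of an idempotent forward step; all names are hypothetical. */
final class IdempotentChargeService {
    // Idempotency key -> transaction ID of the charge already performed
    private final Map<String, String> processed = new ConcurrentHashMap<>();
    private final AtomicInteger gatewayCalls = new AtomicInteger();

    /**
     * Charges at most once per idempotency key. A repeated call with the same
     * key returns the original transaction ID without charging again.
     */
    String charge(String idempotencyKey, long amountInCents) {
        // computeIfAbsent invokes the mapping function at most once per key
        return processed.computeIfAbsent(
            idempotencyKey, key -> callGateway(key, amountInCents));
    }

    /** Stand-in for the real payment gateway call; only counts invocations. */
    private String callGateway(String key, long amountInCents) {
        return "txn-" + gatewayCalls.incrementAndGet();
    }

    int gatewayCallCount() {
        return gatewayCalls.get();
    }
}
```

Using the order ID as the key means a duplicate `OrderCreatedEvent` maps to the same charge instead of a second one.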

2. Use a Saga Status Table

Track the state of each saga for debugging and recovery:

@Entity
@Table(name = "saga_log")
public class SagaLog {
    @Id
    private String sagaId;
    private String orderId;
    private String currentStep;

    @Enumerated(EnumType.STRING)
    private SagaStatus status; // STARTED, COMPLETED, COMPENSATING, FAILED

    private String failureReason;
    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;
}
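For recovery, the log is typically scanned for sagas that have been sitting in a non-terminal state for too long, for example after an orchestrator crash. The sketch below shows that check over an in-memory list so it can run on its own. `SagaRecord`, `StuckSagaFinder`, and `findStuck` are hypothetical names; against a real `saga_log` table this would be a query on `status` and `updatedAt`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

enum SagaStatus { STARTED, COMPLETED, COMPENSATING, FAILED }

/** Minimal stand-in for a saga_log row; field names are illustrative. */
final class SagaRecord {
    final String sagaId;
    final SagaStatus status;
    final Instant updatedAt;

    SagaRecord(String sagaId, SagaStatus status, Instant updatedAt) {
        this.sagaId = sagaId;
        this.status = status;
        this.updatedAt = updatedAt;
    }
}

final class StuckSagaFinder {
    /** Returns sagas in a non-terminal state whose last update is older than maxAge. */
    static List<SagaRecord> findStuck(List<SagaRecord> records, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        return records.stream()
            // COMPLETED and FAILED are terminal: nothing left to resume
            .filter(r -> r.status == SagaStatus.STARTED || r.status == SagaStatus.COMPENSATING)
            .filter(r -> r.updatedAt.isBefore(cutoff))
            .collect(Collectors.toList());
    }
}
```

A scheduled job could run this check periodically and either resume each stuck saga from its `currentStep` or start compensating it.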

3. Handle Compensation Failures

What if the compensation itself fails? You need a fallback strategy:

Retry with backoff — retry the compensation N times with exponential backoff
Dead letter queue — push failed compensations to a DLQ for manual intervention
Alerting — notify the ops team immediately for financial transactions
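The first two strategies combine naturally: retry with growing delays, then hand off to a dead letter queue when the attempts run out. Below is a minimal sketch of that combination. `CompensationRetrier` and its parameters are hypothetical, the "DLQ" is just an in-memory list, and the sleep function is injected so the logic can be exercised without real delays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Sketch of retry-with-backoff plus a dead-letter fallback; all names are hypothetical. */
final class CompensationRetrier {
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final LongConsumer sleeper; // Injected so tests need not actually sleep
    private final List<String> deadLetters = new ArrayList<>(); // Stand-in for a real DLQ

    CompensationRetrier(int maxAttempts, long baseDelayMillis, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
        this.sleeper = sleeper;
    }

    /** Returns true if the compensation eventually succeeded, false if it was dead-lettered. */
    boolean run(String sagaId, Runnable compensation) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    // Out of attempts: park it for manual intervention
                    deadLetters.add(sagaId + ": " + e.getMessage());
                    return false;
                }
                // Exponential backoff: base, 2x base, 4x base, ...
                sleeper.accept(baseDelayMillis << (attempt - 1));
            }
        }
        throw new IllegalStateException("unreachable: the loop always returns");
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

In production the sleeper could wrap `Thread.sleep` or a scheduler, and the dead-letter branch is also a natural place to trigger the alert. The retried compensation must itself be idempotent, since an attempt that threw may still have taken effect.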

4. Set Timeouts

Each saga step should have a timeout. A hanging service should not block the entire saga.

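CompletableFuture<PaymentResult> future = CompletableFuture.supplyAsync(
    () -> paymentClient.charge(orderId, amount)
);

// get() throws TimeoutException after 5s. The remote charge may still complete,
// so treat a timeout as "outcome unknown" and compensate or reconcile.
PaymentResult result = future.get(5, TimeUnit.SECONDS);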
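A timeout on the caller's side does not tell you whether the remote call took effect, so it helps to classify it separately from a clean failure. The sketch below wraps a call with a deadline and returns one of three outcomes. `TimedStep`, `Outcome`, and `call` are hypothetical names, and mapping an exception from the call to `FAILED` is a simplifying assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Sketch of a saga step call with a deadline; all names are hypothetical. */
final class TimedStep {
    enum Outcome { SUCCESS, FAILED, UNKNOWN }

    /** Runs the call with a deadline and classifies the result. */
    static Outcome call(Supplier<Boolean> remoteCall, long timeoutMillis) {
        CompletableFuture<Boolean> future = CompletableFuture.supplyAsync(remoteCall);
        try {
            boolean ok = Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            return ok ? Outcome.SUCCESS : Outcome.FAILED;
        } catch (TimeoutException e) {
            // cancel() on a CompletableFuture does not interrupt the running task,
            // so the remote call may still finish: the outcome is unknown
            future.cancel(true);
            return Outcome.UNKNOWN;
        } catch (ExecutionException e) {
            // The call itself threw; simplified here to a plain failure
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt flag
            return Outcome.UNKNOWN;
        }
    }
}
```

An `UNKNOWN` outcome is where idempotent compensations pay off: the saga can issue the refund or release anyway, and it is harmless if the original call never took effect.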

Common Pitfalls

Pitfall	Problem	Solution
Non-idempotent compensations	Double refunds, duplicate notifications	Always check current state before compensating
Missing compensations	Added a new step but forgot the undo	Enforce `SagaStep` interface with both `execute()` and `compensate()`
No timeout	A hanging service blocks the whole saga	Set timeouts on every external call
No saga log	Can't debug or recover from failures	Always persist saga state to a database
Ignoring compensation failures	Silent data inconsistency	Alert + dead letter queue for manual review

Real-World Use Cases

E-Commerce Order Flow

Create Order → Reserve Inventory → Charge Payment → Ship Order

If payment fails → release inventory → cancel order.

Banking Transfer (Real-World System)

Debit Source Account → Credit Destination Account → Update Transaction Log

If credit fails → reverse debit → mark transaction as failed → alert operations.
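As a worked illustration of this flow, the toy below debits the source, tries to credit the destination, and reverses the debit if the credit is rejected. It is a sketch with in-memory balances, not how a core banking system does it: `TransferSaga`, the blocked-account set used to force a failure, and the string statuses are all assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy transfer saga over in-memory balances; all names are hypothetical. */
final class TransferSaga {
    private final Map<String, Long> balances;
    private final Set<String> blockedAccounts; // Accounts whose credit step always fails

    TransferSaga(Map<String, Long> initialBalances, Set<String> blockedAccounts) {
        this.balances = new HashMap<>(initialBalances);
        this.blockedAccounts = Set.copyOf(blockedAccounts);
    }

    /** Returns "COMPLETED" or "FAILED"; on failure the source balance is restored. */
    String transfer(String from, String to, long amount) {
        // Step 1: debit the source account
        long fromBalance = balanceOf(from);
        if (fromBalance < amount) {
            return "FAILED"; // Nothing has happened yet, so nothing to compensate
        }
        balances.put(from, fromBalance - amount);

        // Step 2: credit the destination account
        try {
            credit(to, amount);
        } catch (IllegalStateException e) {
            // Compensation for step 1: reverse the debit
            balances.merge(from, amount, Long::sum);
            return "FAILED";
        }
        return "COMPLETED";
    }

    private void credit(String account, long amount) {
        if (blockedAccounts.contains(account)) {
            throw new IllegalStateException("Credit rejected for " + account);
        }
        balances.merge(account, amount, Long::sum);
    }

    long balanceOf(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```

With balances `alice=100` and `bob=0` and `carol` blocked, transferring 30 from alice to bob completes and leaves 70 and 30. Transferring 50 from alice to carol then fails, and alice is back at 70 once the debit is reversed.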

E-Wallet Top-Up (MoMo-style)

Verify User → Charge Bank Account → Credit Wallet → Send Receipt

If wallet credit fails → refund bank charge → notify user of failure.

Key Takeaways

Use Saga when you can't use a single database transaction — i.e., when a business operation spans multiple microservices.
Every step must have a compensating transaction — think of each action's "undo" before writing the forward logic.
Orchestration is easier to debug — prefer it for critical business flows like payments and orders.
Choreography scales better — use it for loosely-coupled, non-critical flows.
Idempotency is non-negotiable — both forward and compensating transactions must be safe to retry.
Log everything — persist saga state, step results, and failure reasons for troubleshooting.

When NOT to use Saga: If your services can share a database (same bounded context), just use a regular database transaction. Don't add distributed transaction complexity where it's not needed.

Thanks for reading! If you're building microservices that handle real money, getting the Saga pattern right is critical. Also check out my SOLID Principles post for writing clean service code, and my JVM Memory Tuning guide for production performance. 🚀
