From Chaos to Control: Reproducible Failure Testing in Microservices
IMPORTANT: Participants must register through the registration link:
https://sahaj.ai/events/from-chaos-to-control-reproducible-failure-testing-in-microservices-2/
Shortlisted candidates will receive a confirmation a few days before the event. Seats are limited, so registration is mandatory and we recommend registering early.
## Deterministic Simulation for Distributed Systems
> “It happened once. We couldn’t reproduce it. We fixed what we think was the issue.”
What if you could recreate that exact failure… step by step… and debug it with confidence?
In this hands-on session, we will start by building and running a deterministic simulation of a distributed system, where you can control time, message ordering, retries, and failures—and replay the same incident until you truly understand what’s going on.
This problem is at the heart of modern microservices.
The most serious failures rarely come from obvious bugs. They emerge from subtle interactions between timing, retries, partial failures, and networks. These incidents are rare, fleeting, and often disappear when we try to observe them. Bugs like these are commonly called Heisenbugs: not because they’re imaginary, but because they resist repeatable observation.
Microservices inherit this failure profile even when individual services are simple and well-tested. As systems become more asynchronous and failure-aware, the space of possible behaviours explodes.
Chaos engineering encourages us to inject failures and observe outcomes, but chaos is inherently non-deterministic. When something breaks, we usually can’t replay the exact sequence that caused it, leaving teams to reason probabilistically and ship fixes with limited confidence.
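To make the contrast concrete, here is a minimal sketch of the deterministic alternative. It is not the session's material; the names `Simulation`, `send`, and `run_scenario` are hypothetical. It assumes the whole system runs in one thread, that time is a counter the simulation advances itself, and that all randomness (message delays and drops) comes from a single seeded generator, so the same seed always produces the same sequence of events.

```python
import heapq
import random


class Simulation:
    """Single-threaded event loop with a virtual clock and a seeded RNG."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # the only source of randomness
        self.now = 0                    # virtual time in ms, never the wall clock
        self.queue = []                 # heap of (due_time, seq, callback)
        self.seq = 0                    # tie-breaker keeps ordering stable
        self.trace = []                 # what happened, in order

    def schedule(self, delay, callback):
        """Run callback once virtual time reaches now + delay."""
        heapq.heappush(self.queue, (self.now + delay, self.seq, callback))
        self.seq += 1

    def send(self, name, handler, drop_rate=0.2):
        """Deliver a message after a random delay, or drop it."""
        if self.rng.random() < drop_rate:
            self.trace.append((self.now, "dropped", name))
            return
        delay = self.rng.randint(1, 50)
        self.schedule(delay, lambda: handler(name))

    def run(self):
        """Process events in due-time order, jumping the clock forward."""
        while self.queue:
            self.now, _, callback = heapq.heappop(self.queue)
            callback()
        return self.trace


def run_scenario(seed):
    """Send five messages through the simulated network and return the trace."""
    sim = Simulation(seed)

    def on_receive(name):
        sim.trace.append((sim.now, "delivered", name))

    for i in range(5):
        sim.send(f"msg-{i}", on_receive)
    return sim.run()
```

Calling `run_scenario` twice with the same seed returns an identical trace, which is what makes an incident replayable; changing the seed explores a different ordering of deliveries and drops.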
## What You’ll Do in This Session
Through a guided simulation, you will:
- Reproduce a realistic distributed system failure
- Control and manipulate time and event ordering
- Experiment with retries and failure scenarios
- Replay the same scenario multiple times to validate fixes
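As an illustration of the last step, the sketch below replays one scripted scenario before and after a fix. It is a hypothetical example, not the exercise used in the session: a client retries a payment because the acknowledgement for its first attempt is lost, and the assumed fix is to deduplicate by request id. The function name `run_payment` and its parameters are invented for this sketch.

```python
def run_payment(dedupe, drop_acks=(1,)):
    """Replay one scripted scenario: acks for the listed attempts are lost."""
    balance = 0
    seen = set()   # request ids already applied (consulted only by the fix)
    log = []

    for attempt in (1, 2, 3):      # the client retries up to 3 times
        request_id = "req-1"       # the same id is sent on every retry
        if dedupe and request_id in seen:
            log.append((attempt, "duplicate ignored"))
        else:
            balance += 100
            seen.add(request_id)
            log.append((attempt, "applied"))
        if attempt in drop_acks:
            log.append((attempt, "ack lost"))
            continue               # client times out and retries
        break                      # ack received, client stops
    return balance, log


buggy_balance, buggy_log = run_payment(dedupe=False)
fixed_balance, fixed_log = run_payment(dedupe=True)
```

Because the fault is scripted rather than random, the buggy run charges twice on every replay and the fixed run charges once on every replay, so the fix is checked against the exact sequence that caused the failure.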
## Core Idea
This session doesn’t introduce a new framework or silver bullet. Instead, it introduces an idea:
👉 Applying deterministic simulation techniques from distributed systems research to microservices.
The goal isn’t to replace chaos engineering, but to complement it. By making rare failures reproducible, deterministic simulation changes how we debug systems, reason about correctness, and validate fixes.
## What You’ll Walk Away With
- A strong mental model for debugging distributed systems
- Practical intuition on making failures reproducible
- New design questions to improve how systems are tested and evolved
## Speaker
Kalarani Lakshmanan
Solution Consultant at Sahaj Software
Kalarani has 18+ years of experience designing and delivering scalable cloud and on-prem solutions across greenfield and legacy systems.
## Event Details
- Date: Saturday, April 18th, 2026
- Time: 10:30 AM – 12:30 PM (IST)
- Location: Sahaj Software Hyderabad Office
- Format: Hands-on session
## Who Should Attend
- 5+ years of hands-on coding experience
- Strong fundamentals in Distributed Systems & Microservices
- Interest in building reliable and testable systems
## Prerequisites
- Laptop
- IntelliJ IDEA / VS Code
- Docker (Docker Desktop / Podman / Rancher / Colima)
## Important Registration Notice
- All registrations will be waitlisted initially
- Only an official confirmation email guarantees entry
- Emails from Meetup or other platforms do not count as confirmation
