Automating Reliability: Immutable Infrastructure & Agentic Incident Management
Details
Hi Everyone,
Here we are, back with another edition of Expert Talks in Chennai. We are thrilled to host you on 8th November 2025 at VAIGAI Banquet Hall (Ground floor), Four Points by Sheraton Chennai OMR, Rajiv Gandhi Salai, Kumaran Nagar, Sholinganallur, Chennai - 600119. Please join us from 10:30 AM to 1:00 PM.
Event details are as follows:
Agenda:
1. Welcome & Intros (10 mins)
2. Talk 1: Immutable Infrastructure For Dummies (45 mins)
3. Break (15 mins)
4. Talk 2: Agentic Incident Management (45 mins)
5. Tea & Networking
Talk 1: Immutable Infrastructure For Dummies
Many teams run workloads that cannot run inside containers. These teams often hit the classic “it worked on my machine” problem: developers run a set of services and dependencies to build applications, but once deployed, those applications fail in ways that forward deployment teams do not understand and that take time to root-cause and fix. This talk walks attendees through why immutable infrastructure is important and which kinds of teams can benefit from it, and provides a live example of a technique that GP developed at a data startup that ran a fleet of 600 to 1200 EC2 instances, all with the same immutable infrastructure guarantee.
Outline
- Introduction to Immutable Infrastructure
- Why Immutable Infrastructure Matters
- Who benefits from this, and when to use it
- When not to use this technique
- How to build Immutable Infrastructure
- Wrap-up + Q&A
Talk 2: Agentic Incident Management
This talk introduces the AI‑Powered Incident Response Coordinator, a prototype multi-agent system that detects, triages, and remediates critical incidents end-to-end. It integrates with existing observability and on-call tools, enforces standardized runbooks, and automates fixes while maintaining a full audit trail, reducing MTTR, minimizing operational toil, and delivering measurable ROI through improved reliability and reduced downtime.
Outline
- Problem & KPIs
- Solution Overview
- Architecture Snapshot
- CrewAI + LangGraph Orchestration
- Live Demo
- Observability & Metrics
- Security & Compliance
- Q&A & Next Steps