SRE MUC Summer 2025 Edition


Details
This time around we will be meeting at SQUER over in the House of communications at Ostbahnhof. Let's meet up, talk about reliability, exchange ideas and see where we can continue to learn on our journey as site reliability engineers (and folks who aspire to become one!).
Meetups are about engaging with the community, so we are looking to everyone to share ideas and learn, ultimately to reduce the risk of disasters. You can reach the organizers at muc@sre.xyz (ideas, presentations, comments).
Please help to spread the word! Feel free to share this event on social media using the #sremuc hashtag!
Agenda:
6:00 pm Get together with food and drinks
7:00 pm Welcome to SREmuc
Talk 1: From Alerts to Actions: Smarter Kubernetes Fixes with Slack, AI & GitOps
Talk 2: From AZs to the Internet: Cracking the code on AWS networking costs
Discussion: Learning from other emergency response services
9:00 pm Networking + Drinks
9:30 pm Leave happy and inspired :)
***
Speakers & Abstracts:
Talk 1: From Alerts to Actions: Smarter Kubernetes Fixes with Slack, AI & GitOps - Ankit Asthana & Tom Graupner / SQUER
In this talk, we’ll showcase how Slack can evolve from a passive alerting channel into an intelligent control plane for Kubernetes incident response. You'll see two live demos: one where a Grafana alert triggers AI-powered root cause analysis using logs, events, and manifests — with example action buttons (PR, Jira, scan) shown as extendable paths beyond the RCA. The second demo features a Slack bot command (`/agentic-scan`) that initiates a full cluster scan using K8sGPT followed by AI-prioritized fixes and a GitHub PR via Kyverno. This approach combines AI, GitOps, and ChatOps to streamline RCA, shorten MTTR, and keep human judgment in the loop where it matters most.
You can reach the speakers over on LinkedIn
Talk 2: From AZs to the Internet: Cracking the code on AWS networking costs - Akshay Kapoor / AWS
In this talk, I’ll explain what really drives networking costs on AWS in simple terms. I'll share a real-world case study where specific application design decisions had a direct impact on costs, and how targeted optimizations led to significant savings. Along the way, I’ll share practical insights and lessons learned, helping you understand the trade-offs of various design options so you can make informed, cost-effective decisions with your own AWS environments.
Speaker Bio
Akshay is a cloud architect with years of experience delivering simple solutions for complex business problems. He loves learning new things and sharing his knowledge. At Amazon Web Services (AWS), he acts as a trusted advisor, helping businesses use AWS tools in creative ways that connect their technical plans with their business goals. Akshay is also the author of AWS DevOps Simplified, where he shares his practical insights on using AWS and DevOps to make technology easier for everyone.
You can reach the speaker over on LinkedIn: https://www.linkedin.com/in/akskap/
Discussion: Learning from other emergency response services - Ingo Averdunk / IBM
Ingo Averdunk is proposing a local working group to draw parallels between SRE and other professional incident response teams, such as firefighters, and to explore what we can learn from them for our own implementations of SRE.
You can reach the speaker over on LinkedIn: https://www.linkedin.com/in/ingoaverdunk/
