Details

🕒 3-Hour Agenda

| Segment | Duration | Description |
| ------- | -------- | ----------- |
| 1. Intro & Context | 15 min | Overview of Google-scale production challenges—scale, complexity, and impact |
| 2. SRE Fundamentals | 30 min | Error budgets, SLIs/SLOs, and reliability culture ([cloud.google.com](https://cloud.google.com/sre/?utm_source=chatgpt.com "Site Reliability Engineering (SRE)")) |
| 3. CI/CD at Scale | 30 min | Google's Rapid release system and Blaze build tool, automated testing, safe rollouts |
| 4. Canary Deployments & Metrics | 35 min | Selecting effective SLIs for canaries, gradual rollouts |
| ☕ Break | 10 min | — |
| 5. Resilience Engineering | 30 min | Chaos engineering (DiRT), failure drills, playbook-driven incident response |
| 6. Observability & Monitoring | 30 min | Monitoring tiers, alert testing, instrumenting SLIs, dashboards |
| 7. Icebreaker Lab: Build a “Mini SRE Flow” | 25 min | Design a simplified CI-canary-monitor-playbook system in breakout groups |
| 8. Wrap-Up & Q&A | 20 min | Share key tools, patterns, and next steps |

## 🔍 Important Session Highlights

### ✅ 1. SRE Culture & Reliability

  • Treat reliability as something governed by SLIs/SLOs and error budgets, rather than chasing unattainable “perfection”
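
To make the error-budget idea concrete, here is a minimal sketch of the arithmetic. The function names, the 30-day window, and the sample numbers are illustrative assumptions, not material from the session itself.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total_events == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad
```

Under these assumptions, a 99.9% SLO over 30 days leaves roughly 43.2 minutes of budget, and a service that failed 500 of 1,000,000 requests has spent about half of it.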

### ✅ 2. CI/CD Infrastructure at Google

  • Learn how Rapid and Blaze enable thousands of concurrent builds, tests, and deployments with repeatable, automated release processes

### ✅ 3. Canary Deployment Best Practices

  • Selecting effective canary SLIs (error rates, latency, resource usage) and avoiding common pitfalls in rollout strategies
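
One way to picture a canary check on these SLIs is a comparison between the canary and the baseline. The sketch below is hypothetical: the `SliSnapshot` type, the `canary_verdict` function, and all thresholds are assumptions chosen for illustration, and it does not describe how Google's tooling evaluates canaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliSnapshot:
    """SLI measurements collected over the same window (hypothetical shape)."""
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(
    baseline: SliSnapshot,
    canary: SliSnapshot,
    max_error_rate_delta: float = 0.005,  # assumed tolerance: +0.5 percentage points
    max_latency_ratio: float = 1.2,       # assumed tolerance: 20% slower p99
    min_requests: int = 1000,             # assumed minimum sample size
) -> str:
    """Return "wait", "rollback", or "promote" for one evaluation step."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    if canary.error_rate - baseline.error_rate > max_error_rate_delta:
        return "rollback"
    if (baseline.p99_latency_ms > 0
            and canary.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio):
        return "rollback"
    return "promote"
```

Comparing against a baseline measured over the same window, rather than against a fixed number, keeps the check from firing on problems that affect both versions equally. The minimum-sample guard addresses one common pitfall: judging a canary before it has seen enough traffic.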

### ✅ 4. Battle-Proven Resilience Engineering

  • Techniques like Disaster Recovery Testing (DiRT) and automated chaos drills build real production readiness

### ✅ 5. Production-Grade Observability

  • Ensuring robust monitoring pipelines, alert governance, and testing alerts proactively rather than reactively

***

## 🛠️ Hands-On Lab Concept

  • Participants form groups to sketch a simplified SRE pipeline:
    1. CI
    2. Canary deployment
    3. Monitoring (SLIs)
    4. Incident & rollback playbook
  • Groups present basic flowcharts and discuss monitoring thresholds and incident triggers
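
When discussing thresholds and incident triggers, one option groups might consider is a burn-rate trigger: page when the error budget is being spent much faster than the SLO allows. The sketch below is a starting point under stated assumptions. The function names are hypothetical, and the defaults (2% of a 30-day budget spent within one hour) are one commonly cited choice rather than a requirement.

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than allowed the budget is being spent."""
    return error_rate / (1.0 - slo)


def burn_rate_threshold(budget_fraction: float,
                        window_hours: float,
                        alert_window_hours: float) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent
    within `alert_window_hours` of a `window_hours` SLO window."""
    return budget_fraction * window_hours / alert_window_hours


def should_page(error_rate: float,
                slo: float = 0.999,
                budget_fraction: float = 0.02,   # assumed: 2% of the budget
                window_hours: float = 30 * 24,   # assumed: 30-day SLO window
                alert_window_hours: float = 1.0) -> bool:
    """True when the observed error rate burns budget at or above the threshold."""
    threshold = burn_rate_threshold(budget_fraction, window_hours, alert_window_hours)
    return burn_rate(error_rate, slo) >= threshold
```

With these defaults the threshold works out to a burn rate of about 14.4. Against a 99.9% SLO, a sustained 2% error rate (burn rate about 20) would page, while 0.5% (burn rate about 5) would not.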

***

## 🎯 Why This Will Thrill Your Audience

  • Delivers battle-tested strategies drawn from Google’s SRE practices
  • Balances theory and practice with interactive labs
  • Equips attendees to apply real resiliency tools and design patterns
  • Ideal for practitioners aiming at scale, reliability, and velocity

Join Zoom Meeting

[https://us02web.zoom.us/j/82496056794?pwd=AnGS7lBOP0HXSrkCjbXrblgrPKUiGU.1](https://www.google.com/url?q=https://us02web.zoom.us/j/82496056794?pwd%3DAnGS7lBOP0HXSrkCjbXrblgrPKUiGU.1&sa=D&source=calendar&usd=2&usg=AOvVaw29ldtDsclXi6uZv4Y-I6ej)

Meeting ID: 824 9605 6794
Passcode: 002921
