Site Reliability Engineering With Prometheus For Fun and Profit


Details
What could be better than software that automates the lessons of an influential text? The text in question is the book that captures years of hard-won lessons gleaned by Google's own 5,000+ person strong Site Reliability Engineering organization:
https://landing.google.com/sre/books/
At our next meeting, Eddy Reyes will show how to use the SRE concepts of Service Level Objectives and Error Budgets to design an alerting system that only wakes you up at 3 AM for problems that matter.
With those concepts in hand, we will discuss the process and cultural components your team must have in place to ensure that your monitoring system reflects your customers' pain thresholds and to protect the integrity of the data your team orients its engineering process around. We will also go over tooling options, sample solutions using Prometheus, and next-step improvements that enhance the observability of your systems for better problem determination.
Bio:
Eddy Reyes has spent his career building tools to make software easier to write. He has been a kernel hacker, a web developer, an infosec developer, and everything in between. He is the co-founder of Mindsight, a company dedicated to helping teams adopt Site Reliability Engineering practices and better serve their customers.
----
Eddy's company is generously sponsoring pizza and soft drinks, so be sure to show up with a healthy appetite!
