This Meetup features talks by J Paul Reed and James Cunningham.
Doors open at 6:30pm. Catch up with other quantifiers over food and drinks. Talks start at 7:00pm and end at 8:00pm. Space is limited, so please RSVP.
This event will be live-streamed at heavybit.com/live/ (http://www.heavybit.com/live/).
Detecting Whispers in Chaos
J Paul Reed (https://twitter.com/@jpaulreed), Managing Partner at Release Engineering Approaches (http://release-approaches.com/)
In this talk, we'll look at what decades of research in the safety sciences have to say about humans interacting with and operating complex socio-technical systems, including what aircraft carriers have to do with Internet infrastructure operations, how resilience engineering can help us, and the use of heuristics in incident response. All of these provide insight into ways we can improve one of the most advanced, and most effective, monitoring tools we have available to keep those systems running: ourselves.
Learn more about Paul here: http://jpaulreed.com/
Vetting your Pager
James Cunningham (https://twitter.com/JTCunning), Operations Engineer at Sentry (https://sentry.io)
Sentry (sentry.io) receives a million requests a minute to process and store crashes from all around the world. It's the Operations Team's responsibility to make sure everything goes right, but it's also their responsibility not to burn themselves out when things go wrong.
Sentry collects fifty thousand custom metrics in DataDog, but alerts on fewer than fifty of them. James leads Sentry's observability initiative, creating and maintaining those alerts.
Learn about the lifecycle of an alert at Sentry, including:
• How a variety of metrics are collected efficiently
• How Sentry justifies a metric's degree of accuracy
• Why a metric's logical purpose is defined
• How an alert evolves from metrics, articulating its existence
• When an engineer actually gets paged and what they're instructed to do