Catching SNAFUs & Learning from Incidents


Details
Please note: unfortunately, Richard Cook will no longer be able to join us for this talk. The updated description is below. See you Tuesday!
What is New Relic learning from deeply studying incidents? How are we extending our capacity to learn and reflect on the challenges of operating at scale? Join us at our next FutureTalks event to hear from Beth Long and Tim Tischler, who will discuss some of the work they’ve done together over the past year as part of the SNAFUcatchers project, “a consortium of industry leaders and researchers united in the common cause of understanding and coping with the immense levels of complexity involved in the operation of critical digital services.” The current round of research is led by Ohio State University’s Integrated Systems Engineering department and includes New Relic, IBM, Salesforce, and Key Bank.
First we’ll unpack New Relic’s incident response process and how it evolved. Then we’ll explore New Relic’s collaboration with SNAFUcatchers, including process tracing, how expertise develops, working at “the sharp end,” the post-mortem spiral of death, and other lessons from working with world experts in joint cognitive systems and resilience engineering.
Doors will open at 5:30pm for networking, with light food and drink, before the program starts promptly at 6pm. The program will conclude with a Q&A and should end by 7:30pm.
About our Speakers:
Beth Long is the project lead for New Relic’s collaboration with SNAFUcatchers. She’s been tinkering with the web since before CSS was a thing. She is currently on New Relic’s Solutions Strategy team, where she builds feedback loops between Sales, Marketing, Product Management, and Engineering, in particular around how New Relic fits into DevOps and reliability solutions. She reads, much of the night, and codes in the winter.
Tim Tischler is a reliability champion at New Relic, focusing on the reliability and safety of the RPM website. He has been focused on the partnership with SNAFUcatchers and has been practicing continuous delivery at scale since the era when everyone thought it was madness.
