Lightning fast monitoring against lightning fast outages

Hosted by Fred Moyer

Details

Maxime Petazzoni from SignalFx will talk about his experiences with how distributed applications fail.

"Among the many ways in which modern, distributed applications can fail, there is one category of outages that is always difficult to handle: the lightning fast implosion of a service tier under sudden, unpredictable load. These happen so fast that, most often, neither humans nor machines catch the problem quickly enough to prevent a complete meltdown.

From our own production incidents, we've learned that the ability to detect complex anomalies in real time, as early as possible, and the ability to act on that information automatically and dynamically are key to preventing those anomalies from turning into outages. But achieving this is not without its challenges: dealing with streaming data is hard, and the monitoring system is only one part of the equation. Services also need built-in safety valves that can be actuated dynamically and programmatically.

In this talk, I'll share some of what we've learned and our approach to building a monitoring system that makes this possible and can support today's and tomorrow's needs."

#MonitorSF
Craigslist
222 Sutter Street · San Francisco, CA