Breaking Things on Purpose #ChaosEng


Details
Failure testing prepares us, both socially and technically, for how our systems will behave in the face of failure. By testing proactively, we can find and fix problems before they become crises. Practice makes perfect, yet a real calamity is not a good time for training. Knowing how our systems fail is paramount to building a resilient service.
At Netflix and Amazon, we ran failure exercises on a regular basis to ensure we were prepared. These experiments helped us find problems and saved us from future incidents. Come and learn how to run an effective “Game Day” and safely test in production. Then sleep peacefully knowing you are ready!
Kolton Andrus (https://twitter.com/KoltonAndrus) is the Founder of Gremlin Inc. (http://www.gremlininc.com/), which provides ‘Failure as a Service’ and helps companies build more resilient systems. Most recently, he was an engineer at Netflix, improving streaming reliability and operating the Edge services. His focus at Amazon was the Retail Website’s resilience and performance. At both companies he served as a ‘Call Leader’, managing the resolution of company-wide incidents. Kolton is passionate about building resilient systems, as it lets him break things for fun and profit.
