Building Software Resilience is Chaos Engineering


Details
In this talk, Matt will discuss how chaos engineering can build resilience into software and how developers, QA engineers, and SREs can work together to improve software reliability.
Businesses are increasingly adopting cloud-native deployments as a means to increase developer velocity. The CNCF 2021 annual survey stated, “Kubernetes has crossed the adoption chasm to become a mainstream global technology.” According to CNCF’s respondents, 96% of organizations are either using or evaluating Kubernetes. This rapid adoption of Kubernetes has created significant complexity and revealed the inadequacy of traditional systems testing.
Chaos engineering has emerged as a new testing discipline and means to transform the reliability of cloud-native services. According to Gartner, “40% of organizations will implement chaos engineering practices as part of DevOps initiatives by 2023, reducing unplanned downtime by 20%.” Many organizations considered early adopters of chaos engineering are manually running experiments on a few applications in a pre-production environment. Very few organizations are automating this practice throughout the software delivery lifecycle (SDLC) due to complexity and lack of industry maturity.
Matt has worked in the technology industry for 20+ years, advising customers on technology reliability and resiliency. He has worked on IT disaster recovery tests on VAX computers running a nuclear power plant, large enterprise mainframes running Fortune 50 companies, and thousands of microservices running on Kubernetes. Most recently, he has been growing the chaos engineering community as an engineer at Target, a product manager at Gremlin, and a product marketer at Harness, helping to ensure developers can focus on solving problems instead of firefighting IT outages in the middle of the night.
