Improving the Resiliency of Your Distributed System with Chaos Mesh


Details
Join this meetup to hear how Prudential uses Chaos Mesh to support anomaly detection and root cause analysis for its AIMLOps system, and how PingCAP uses Chaos Mesh to improve the resiliency of TiDB.
Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos in Kubernetes environments. With Chaos Mesh, you can test your system's resilience and robustness on Kubernetes by injecting various types of faults into Pods, network, file system, and even the kernel.
Agenda
Talk 1: Testing Cloud Native Services with Chaos Mesh
-> Speaker: Liquan Pei, Senior Database Engineer, PingCAP
-> Abstract: In distributed systems, failures happen unpredictably, especially in the cloud. Chaos Engineering is a practice for making distributed systems resilient to such failures. In this talk, Liquan will introduce Chaos Engineering and explain how engineers adopted it to test TiDB, a distributed database. That work eventually led to the creation of Chaos Mesh, an open-source Chaos Engineering platform for Kubernetes. It follows a cloud-native design and currently supports 10+ chaos types. It has also been adopted by other companies to test their distributed systems.
Talk 2: How Chaos Mesh facilitates and simplifies AIOps platform adoption
-> Speaker: Alex Khaerov, Sr. Manager Cloud and Site Reliability Engineering, Prudential
-> Abstract: As the next evolutionary step in its monitoring and observability tooling, Prudential has started its AIOps journey. The search for anomaly detection and root cause analysis (RCA) solutions creates a demand for functional, controllable outages (experiments). Such experiments are essential for evaluating and calibrating new ML-powered systems, and especially for earning people's initial trust in the new tools. In this talk, Alex will share how Chaos Mesh took an essential place in these operations and in the overall endorsement of AIOps.
