How making our Kafka Cluster resilient to zonal outages sped up deploys by 2x


We are the Data on Kubernetes Community (DoKC), where end users go to share best practices for running data workloads on Kubernetes.
We're excited to host our monthly DoKC Town Hall virtual event in February 2024! This event brings the community together to meet each other, share end-user journey stories and DoK-related projects and technologies, and stay up to date on community events and ways to participate. Meetings are held on the third Thursday of each month at 10am PT.
---
AGENDA
[10:00 AM]
Welcome and Community Updates
[10:10 AM]
How making our Kafka Cluster resilient to zonal outages sped up deploys by 2x
Presented by: Kamya Shethia, Senior Software Engineer @ Etsy
Session Description
Kafka is an important part of Etsy's data ecosystem, moving data that powers analytics, A/B testing, search indexing, and more. Etsy runs its Kafka cluster on Kubernetes, and in 2022 the team made an effort to ensure that the cluster could withstand a GCP zonal outage. This created an interesting opportunity to change the way the team approached rolling out changes to the Kafka brokers: they were able to cut the time for these updates from 7 hours to a little over 2.
In this talk, Kamya will discuss the changes made to add zonal resiliency to the Kafka cluster and how those changes enabled the team to speed up its update process.
[10:55 AM]
DoK Quiz
