Real-World Lessons in Troubleshooting and Tuning Kafka at Large Scale

Hosted by Ofir Sharony

Details

Join us on Monday, September 30th, to explore the lessons DoubleVerify learned from designing and running large, complex stream processing applications in production at more than 1 million messages per second. Hear from DoubleVerify's senior engineers about the challenges they faced, how they identified them, and what to consider when designing, coding, and maintaining similar Kafka-based applications.

Agenda:
17:30 – Welcome: Networking, Food & Drinks

18:00 – Debugging Kafka Topic Join Issues: A Case Study in Microservices Architecture

Daniel Sinai, Senior Software Engineer
This talk explores a real-world challenge encountered while implementing a manual request notification system using Kafka Streams in a microservices architecture. We'll discuss the initial setup, the unexpected join failures, and the detective work that led to discovering a subtle partitioning inconsistency between different Kafka client libraries. The session covers Kafka's partitioning strategies, why co-partitioning matters for Kafka Streams joins, and best practices for troubleshooting complex distributed systems.
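As a rough illustration of how such an inconsistency can arise: the abstract does not name the client libraries involved, so the sketch below assumes one producer hashes keys the way the Java client's default partitioner does (murmur2) and another uses a CRC32-based partitioner of the kind librdkafka-based clients default to. The function names, the sample keys, and the partition count of 12 are hypothetical.

```python
import zlib

def murmur2(data: bytes) -> int:
    """32-bit murmur2, ported from the hash used by the Java client's default partitioner."""
    m, r, mask = 0x5BD1E995, 24, 0xFFFFFFFF
    length = len(data)
    h = (0x9747B28C ^ length) & mask
    body_end = length - length % 4
    # Mix the input four bytes at a time (little-endian).
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    # Mix the remaining 1-3 bytes, if any.
    tail = data[body_end:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h

def murmur2_partition(key: bytes, num_partitions: int) -> int:
    # Java-client style: clear the sign bit, then take the modulo.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions

def crc32_partition(key: bytes, num_partitions: int) -> int:
    # CRC32-style partitioning (assumed here for the second client).
    return zlib.crc32(key) % num_partitions

# Hypothetical keys and partition count, for illustration only.
NUM_PARTITIONS = 12
keys = [f"request-{i}".encode() for i in range(1000)]
mismatched = [
    k for k in keys
    if murmur2_partition(k, NUM_PARTITIONS) != crc32_partition(k, NUM_PARTITIONS)
]
print(f"{len(mismatched)} of {len(keys)} keys land on different partitions")
```

When two topics are written with different partitioners, records sharing a key can sit on different partition numbers, so a Kafka Streams task that joins partition N of both topics never sees the matching pair. Typical remedies are to configure one partitioner consistently across clients or to repartition one side before the join.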

18:50 – Optimizing Kafka Producer for High Throughput and Low Latency: Our Journey of Configuration Tuning

Ofir Olivenbaum, Senior Software Engineer
In this talk, we will share our experience optimizing the Kafka producer configuration to meet demanding requirements for high throughput and low latency. Initially, the default Kafka configuration, together with JVM and GC performance, was not sufficient for our needs. Through iterative configuration and performance tuning, we significantly improved our service performance. We will discuss the steps we took to improve the Kafka producer configuration, the impact of these changes on our service, and the lessons we learned along the way. The talk is aimed at anyone looking to optimize Kafka for high performance and reliability.
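To give a feel for the throughput versus latency trade-off that producer tuning involves, here is a minimal sketch. The abstract does not list the settings that were changed, so everything below is an assumption: the config keys are real Kafka producer settings, but the values, the helper `estimate_batch`, and the traffic figures are hypothetical. The model is deliberately simplified: a batch is sent when it reaches `batch.size` or when `linger.ms` expires, ignoring compression, record overhead, and other sender behavior.

```python
# Real Kafka producer config keys with illustrative (not recommended) values.
producer_config = {
    "batch.size": 65536,        # max bytes per partition batch
    "linger.ms": 5,             # max time to wait for a batch to fill
    "compression.type": "lz4",
    "acks": "1",
}

def estimate_batch(msgs_per_sec, avg_msg_bytes, batch_size, linger_ms):
    """Estimate which limit closes a batch for one partition, and how big it is."""
    bytes_per_sec = msgs_per_sec * avg_msg_bytes
    fill_ms = batch_size / bytes_per_sec * 1000
    if fill_ms <= linger_ms:
        # The batch fills before the linger timer expires.
        return {"trigger": "batch.size", "wait_ms": fill_ms, "batch_bytes": batch_size}
    # The linger timer expires first; the batch holds whatever accumulated.
    accumulated = int(bytes_per_sec * linger_ms / 1000)
    return {
        "trigger": "linger.ms",
        "wait_ms": linger_ms,
        "batch_bytes": max(avg_msg_bytes, accumulated),
    }

# Hypothetical load: 10,000 messages/s to one partition, 500 bytes each.
result = estimate_batch(
    msgs_per_sec=10_000,
    avg_msg_bytes=500,
    batch_size=producer_config["batch.size"],
    linger_ms=producer_config["linger.ms"],
)
print(result)
```

Under these assumed numbers the linger timer closes each batch before it fills, so raising `linger.ms` would trade a few milliseconds of latency for larger, fewer requests. Real tuning should be validated against measured throughput, latency, and GC behavior.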

Address: Alon Tower #2, 94 Yigal Alon St., Floor 27, Tel Aviv
See you there!

#ApacheKafkaIL