[In Person] Kafka + Flink + Druid: Real-time stream processing and analytics

Hosted By
Lioz N.

For those who might have missed it: #ApacheKafkaIL is hosting a great meetup!

## Details

The Apache Kafka and Druid communities are joining forces: our next event will be held at the AWS Tel Aviv offices and will feature three exciting topics.

## Agenda

- 17:00 - 18:00 Mingling, food, and drinks.
- 18:00 - 18:30 Migrating 2000 microservices to a multi-cluster managed Kafka with zero downtime / Natan Silnitsky@Wix.
- 18:30 - 19:00 Apache Flink 101 / Sofie Zilberman@AWS.
- 19:00 - 19:45 High-performance queries with Druid Clusters / Jonathan Hirsch@Appsflyer.

First session abstract: migrating to a multi-cluster managed Kafka:
As Wix's Kafka usage grew to 2.5B messages per day, >20K topics, and >100K leader partitions serving 2000 microservices, we decided to migrate from a self-operated single cluster per data center to a managed cloud service (like Amazon MSK or Confluent Cloud) with a multi-cluster setup. This talk is about how we gradually migrated all of our Kafka consumers and producers with zero downtime while they continued to handle regular traffic. You will learn practical steps you can take to greatly reduce the risks and shorten the migration timeline.
Second session abstract: Apache Flink 101:
Apache Flink is a widely used data processing engine for scalable streaming ETL, analytics, and event-driven applications. It provides precise time and state management with fault tolerance. Flink can process both bounded streams (batch) and unbounded streams (streaming) with a unified API or application. In this session we will briefly review Flink essentials and discuss the different deployment options for running your Apache Flink jobs on AWS (KDA, EMR, EKS).
Third session abstract: High-performance queries with Druid:
Apache Druid is an open-source, column-oriented, distributed data store designed to ingest high-speed, high-volume data streams in real time for fast queries and analysis. In this session, Jonathan will describe how Druid clusters serve high-performance queries under high concurrency. He will show how these clusters addressed a significant challenge for Appsflyer by running queries spanning up to five years of data with exceptional performance. The talk will delve into the technical aspects of Druid cluster management, with an emphasis on tuning clusters for high-concurrency usage. Attendees will gain insight into how Druid clusters can ease the business and performance challenges associated with data analysis.

Druid IL
AWS Experience at Floor28
Derech Menachem Begin 121 · Tel Aviv-Jaffa