
Apache Kafka is an amazing streaming platform, and along with stream-processing libraries like Kafka Streams and tools like Apache Flink, it can meet many of our real-time data processing needs. But it has always been a challenge to query that streaming data. The most common pattern is to use Kafka Connect to write the data out to a database, which you can then query to your heart’s content. This works, but it adds unnecessary latency.

Apache Druid is a real-time database designed with Kafka in mind. Druid thinks about data the same way that Kafka does. With direct Kafka integration, Druid allows us to query real-time data, in, well, real-time. Even before the data is fully loaded into Druid, it is available to respond to queries. When I first learned how Druid does this I was blown away.
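To make that direct integration concrete, here is a minimal sketch of the kind of Kafka ingestion supervisor spec you submit to Druid. The topic name, datasource name, and broker address are placeholder assumptions, and a production spec would carry many more tuning options:

```python
import json

# A minimal sketch of a Druid Kafka ingestion supervisor spec.
# The topic ("events"), datasource name, and broker address are
# placeholders, not values from the talk.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "events",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            # Start from the beginning of the topic on first launch.
            "useEarliestOffset": True,
        },
        "dataSchema": {
            "dataSource": "events",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"useSchemaDiscovery": True},
            "granularitySpec": {"segmentGranularity": "hour"},
        },
    },
}

# The spec would be POSTed to the Overlord's supervisor endpoint,
# e.g. http://localhost:8081/druid/indexer/v1/supervisor
print(json.dumps(supervisor_spec, indent=2))
```

Once the supervisor is running, rows become queryable as they arrive from Kafka, before their segments are finalized and handed off to deep storage.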

In this session, we’ll get an overview of Apache Kafka and Apache Druid, and then we’ll focus on the way Druid ingests and queries events from Kafka with such amazing speed. We’ll also see how Druid can combine new incoming event data from Kafka with older stored data in the same query.
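That last point needs no special syntax: an ordinary Druid SQL query is answered from both real-time segments (still filling from Kafka) and historical segments, merged transparently. A hypothetical example against an assumed "events" datasource, as it would be sent to a Broker's SQL endpoint:

```python
import json

# Hypothetical Druid SQL query: counts events per minute over the
# last hour. Druid merges real-time and historical segments behind
# this one query; "events" is an assumed datasource name.
query = """
SELECT
  TIME_FLOOR(__time, 'PT1M') AS minute,
  COUNT(*) AS event_count
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1
ORDER BY 1 DESC
"""

# This payload would be POSTed to a Broker's SQL endpoint,
# e.g. http://localhost:8888/druid/v2/sql
payload = json.dumps({"query": query})
print(payload)
```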

And, since you’ll probably be as impressed with Druid as I was, I’ll leave you with some resources to continue your learning journey.

Note: This is a hybrid online/in-person event. Food and social begins at 6pm for those in the OCI training room. Meeting starts at 6:30pm for those online and in the training room.
