Sat, Feb 28 · 11:00 AM IST
Hello everyone! Join us for an IN PERSON Apache Kafka® meetup on Feb 28, starting at 11:00 AM, hosted by Datazip in Bangalore!
📍 Venue:
Hustlehub Tech Park
PWD Quarters, 1st Sector, HSR Layout, Bengaluru, Karnataka 560102, India
https://maps.app.goo.gl/b38aFkzq7w5dCMqB8?g_st=ic
***
Agenda:
11:00 - 11:10: Welcome
11:10 - 11:50: SiShuo Yang, Senior Solutions Architect, VeloDB
11:50 - 12:30: Shubham Baldava, CTO at OLake
12:30 - 12:40: Break
12:40 - 13:20: Kumar Keshav, Engineering Manager, Confluent
13:20 - 14:20: Lunch
***
💡 Speaker:
SiShuo Yang, Senior Solutions Architect, VeloDB
Talk:
Apache Doris 4.0: Evolving from Real-Time Analytics to AI-Native Data Intelligence
Abstract:
Apache Doris has established itself as a leading foundation for real-time analytics, and one of the key elements of its success is its native integration with modern data streaming platforms like Apache Kafka. By leveraging features like Routine Load and exactly-once semantics, the combination of Doris and Kafka simplifies data architectures by eliminating complex ETL layers while maintaining high-throughput, sub-second query latency on fresh data streams.
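To give a flavor of the exactly-once piece ahead of the talk: below is a minimal Java sketch of a transactional Kafka producer, the client-side half of the guarantee that a downstream Routine Load job consumes against. The broker address, topic name, and payload shape are illustrative assumptions, not from the talk.

```java
// Minimal sketch of an exactly-once (transactional) Kafka producer.
// Broker address, topic "orders", and payload are illustrative assumptions.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalOrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Setting a transactional.id enables idempotence and atomic writes.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-producer-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // A JSON row in the shape a consuming load job would parse.
                producer.send(new ProducerRecord<>("orders", "o-1001",
                        "{\"order_id\": 1001, \"amount\": 42.50}"));
                producer.commitTransaction();
            } catch (Exception e) {
                // Aborted transactions are never exposed to read_committed consumers.
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```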
The next phase of the Doris evolution addresses the growing demand for AI-integrated data systems. This session explores the architectural advancements in Doris 4.0 that enable a transition from standard OLAP toward AI-native data intelligence. By examining ByteDance's large-scale implementation journey, we will discuss how these new capabilities perform within demanding, real-world production environments.
-----
💡 Speaker:
Shubham Baldava, CTO at OLake
Talk:
Fast & Lightweight Data Ingestion from Kafka to Apache Iceberg
Abstract:
How we at OLake designed our Kafka source with concurrency at its center while keeping it lightweight. OLake is an open-source data ingestion and table-maintenance tool.
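For readers new to the topic, here is a minimal Java sketch of the generic partition-parallel consumption pattern that concurrency-centric ingestion tools build on; this is not OLake's actual implementation, and the broker, topic, and worker count are assumptions.

```java
// Sketch: a consumer group spread across worker threads, so Kafka
// assigns partitions to workers and they are read in parallel.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ParallelKafkaReader {
    public static void main(String[] args) {
        int workers = 4; // assumed parallelism: one consumer per thread
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(ParallelKafkaReader::consumeLoop);
        }
    }

    static void consumeLoop() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A shared group id makes Kafka spread partitions across consumers.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "iceberg-ingest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // topic name is illustrative
            while (true) {
                for (ConsumerRecord<String, String> rec :
                        consumer.poll(Duration.ofMillis(500))) {
                    // A real pipeline would buffer the batch, flush it to
                    // Iceberg data files, and only then commit offsets.
                    System.out.printf("partition=%d offset=%d%n",
                            rec.partition(), rec.offset());
                }
                consumer.commitSync(); // commit only after the batch is persisted
            }
        }
    }
}
```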
-----
💡 Speaker:
Kumar Keshav, Engineering Manager, Confluent
Talk:
Shift Left to Get It Right
Abstract:
### The Problem: The Analytical-Operational Divide
- Brittle Pipelines: Traditional ETL/ELT pipelines are expensive and break frequently when source models change.
- Data Silos: High-quality data is often locked in downstream analytical silos, inaccessible to operational services.
- "Bronze" Age Mess: Data scientists spend up to 80% of their time cleaning "bad" data in landing zones.
### The Solution: Shifting Left
- Upstream Governance: Move data cleaning and validation as close to the source as possible.
- Data Products: Publish trustworthy, reusable building blocks that serve both real-time and batch needs.
- Stream-Table Duality: Provide data as both Kafka topics and Iceberg tables to eliminate duplication.
### How Kafka Enables Shift Left
- Kafka Connect: Uses CDC to turn database rows into near-real-time state events, enabling the decoupled outbox pattern.
- Schema Registry: Enforces Data Contracts, rejecting malformed data at the producer level through structural and semantic rules (see the producer sketch after this list).
- Kafka Streams: Allows services to maintain local state, ensuring access to the "eventually correct" record.
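To make the Schema Registry point concrete, here is a minimal Java sketch, assuming a hypothetical Customer schema, topic, and local endpoints: the Avro serializer validates every record against the schema inside the producer, so a contract-violating record fails before it ever reaches the topic.

```java
// Sketch: a data contract enforced at the producer. The Avro serializer
// validates records against the schema; the schema, topic name, and
// endpoint URLs below are illustrative assumptions.
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ContractedProducer {
    private static final String CUSTOMER_SCHEMA = """
            {"type": "record", "name": "Customer",
             "fields": [
               {"name": "id",    "type": "string"},
               {"name": "email", "type": "string"}
             ]}""";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(CUSTOMER_SCHEMA);
        GenericRecord good = new GenericData.Record(schema);
        good.put("id", "c-42");
        good.put("email", "a@example.com");

        GenericRecord bad = new GenericData.Record(schema);
        bad.put("id", "c-43"); // "email" never set: no default, so invalid

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customers", "c-42", good));
            // This send fails at serialization time, before reaching Kafka:
            producer.send(new ProducerRecord<>("customers", "c-43", bad));
        }
    }
}
```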