Lakehouse at Scale
Apache Iceberg adoption is accelerating, and with it come two operational realities data teams are running into head-on: table maintenance at scale and the demand for real-time, accurate retrieval to power AI systems.
This meetup brings together practitioners and contributors working at both ends of that problem. Expect technical depth, real production context, and open discussion with engineers who are actively building and operating lakehouse infrastructure.
Agenda
11:00 - 11:30 | Registration and Welcome
11:30 - 12:10 | How Apache Doris Powers AI Agents with Hybrid Search and Real-Time Analytics | Matt Yi, Apache Doris PMC Member, Tech VP at VeloDB

  • Why single-method retrieval (vector-only or keyword-only) breaks down in production AI systems
  • Hybrid search architecture: combining vector search, full-text search, and SQL for accurate, intent-aware retrieval
  • How Apache Doris's native real-time OLAP capability extends into real-time RAG pipelines
  • Cost and accuracy tradeoffs across retrieval strategies and what that means for context engineering at scale

12:10 - 12:50 | OLake Fusion: Solving Apache Iceberg Table Maintenance Problems at High Scale | Ankit Sharma (Tech Lead) and Badal Prasad Singh (Software Engineer), OLake

  • Why continuous CDC ingestion at scale creates small file accumulation and query performance degradation in Iceberg tables
  • Compaction strategies (lite, medium, full) and how to choose the right mode based on workload and file size targets
  • Cron-based scheduling and table enable/disable controls via Helm and Docker Compose
  • Multi-catalog support and lessons from building maintenance systems that do not interrupt live ingestion

12:50 - 1:00 | Break
1:00 - 1:30 | Apache Doris User Sharing | Nilanjan Sarkar

  • Production experience taking Apache Doris from evaluation to live deployment
  • Practical challenges and decisions made along the way

1:30 | Lunch and Networking
