Apache Calcite Hybrid Meetup February 2025


Details
📍 Location: Cloudera, 5470 Great America Pkwy, Santa Clara, CA 95054, United States
Zoom link: https://cloudera.zoom.us/j/91779946468
Agenda
4:30PM - 5:00PM Sign up and Networking
5:00PM - 5:30PM Federated Query Planning w/ Calcite & Substrait - Victor Barua (Datadog)
Substrait is a cross-language serialization format and specification for communicating relational plans across systems. It is currently under active development, and systems such as DataFusion and DuckDB have started to support consuming and producing Substrait plans. Another system that has support for Substrait is Calcite, via the Isthmus library.
With Isthmus, it’s possible to parse SQL queries with Calcite, perform planning and then delegate execution to external systems via Substrait plans. It’s also possible to forgo SQL entirely, and submit Substrait plans directly to Calcite for planning. This talk aims to provide an introduction to Substrait, and showcase the capabilities of Isthmus in the context of generating plans for execution across multiple data systems.
***
5:30PM - 6:00PM Streaming, Incremental, Finite-Memory Computations in SQL Over Unbounded Streams - Mihai Budiu (Feldera)
SQL is the standard language for expressing computations on collections. Using modern incremental view maintenance techniques, SQL can also be adopted as the standard language for computing on changes to collections. In previous presentations we have shown how to automatically convert any SQL program that defines views into an incremental program: the inputs of an incremental program are insertions, deletions, and updates to data tables, and the outputs of the incremental program are insertions, deletions, and updates of the maintained views.
Whereas SQL queries are stateless systems, the incremental programs are stateful streaming systems that maintain complex *indexes* for performing efficient updates. The indexes enable computing all updates in time proportional to the size of the changes.
In this presentation we discuss the problem of computing over data that grows unbounded (e.g., event streams), leading to potentially unbounded indexes. We present the design and implementation of a fully automatic mechanism which enables many such computations to use only finite memory by garbage-collecting the indexes at runtime. The mechanism requires users to specify bounds on the amount of "out-of-orderness" of the input data, using annotations on input tables.
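As a rough illustration of the idea in this abstract, the Python sketch below incrementally maintains a per-window COUNT(*) view and discards index entries once a lateness bound guarantees they can no longer change. It is not Feldera's implementation: the class name, the tumbling-window view, the `lateness` parameter, and the policy of silently dropping events that violate the bound are all assumptions made for the example.

```python
from collections import defaultdict


class IncrementalWindowCount:
    """Hypothetical operator maintaining, incrementally, the view
    SELECT window_start, COUNT(*) FROM events GROUP BY window_start
    over tumbling windows of `window_size` time units."""

    def __init__(self, window_size, lateness):
        self.window_size = window_size
        self.lateness = lateness   # assumed bound on out-of-orderness
        self.counts = {}           # the index: window_start -> count
        self.max_ts = None         # largest timestamp accepted so far

    def watermark(self):
        # Timestamps below the watermark are assumed never to arrive again.
        return None if self.max_ts is None else self.max_ts - self.lateness

    def step(self, changes):
        """changes: iterable of (timestamp, weight); +1 inserts, -1 deletes.
        Returns the view changes as a list of ((window_start, count), weight)."""
        wm = self.watermark()
        delta = defaultdict(int)
        for ts, weight in changes:
            if wm is not None and ts < wm:
                continue  # violates the lateness bound; dropped in this sketch
            delta[ts - ts % self.window_size] += weight
            self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)

        output = []
        for ws in sorted(delta):
            d = delta[ws]
            if d == 0:
                continue
            old = self.counts.get(ws, 0)
            new = old + d
            if old > 0:
                output.append(((ws, old), -1))  # retract the stale view row
            if new > 0:
                output.append(((ws, new), +1))  # insert the updated view row
                self.counts[ws] = new
            else:
                self.counts.pop(ws, None)

        # Garbage collection: a window that ends at or before the watermark
        # can no longer receive input, so its index entry is discarded.
        wm = self.watermark()
        if wm is not None:
            for ws in [w for w in self.counts if w + self.window_size <= wm]:
                del self.counts[ws]
        return output


op = IncrementalWindowCount(window_size=10, lateness=10)
first = op.step([(1, +1), (3, +1), (12, +1)])
second = op.step([(3, -1), (27, +1)])
third = op.step([(5, +1)])  # older than the watermark, so it is ignored
```

The view rows for a collected window stay valid in the output; only the internal index entry is removed, which is what keeps memory bounded as the stream grows.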
***
6:00PM - 6:10PM Snack Break
***
6:10PM - 6:40PM Revolutionizing Data Lakes: A Dive into Coral, the SQL Translation, Analysis, and Rewrite Engine - Walaa Eldin Moustafa (LinkedIn)
***
6:40PM - 7:10PM Optimizing Common Table Expressions in Apache Hive with Calcite - Stamatis Zampetakis (Cloudera)
In many real-world queries, certain expressions may appear multiple times, requiring repeated computations to construct the final result. These recurring computations, known as common table expressions (CTEs), can be explicitly defined in SQL queries using the WITH clause or implicitly derived through transformation rules. Identifying and leveraging CTEs is essential for reducing the cost of executing complex queries and is a critical component of modern data management systems.
Apache Hive, a SQL-based data management system, provides powerful mechanisms to detect and exploit CTEs through heuristic and cost-based optimization techniques.
This talk delves into the internals of Hive's planner, focusing on its integration with Apache Calcite for CTE optimization. We will begin with a high-level overview of Hive's planner architecture and its reliance on Calcite in various planning phases. The discussion will then shift to the CTE rewriting phase, highlighting key Calcite concepts and demonstrating how they are employed to optimize CTEs effectively.
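To make the notion of an implicitly derived CTE concrete, the Python sketch below finds subplans that occur more than once in a query plan, which are the candidates for computing once and reusing. It is only a toy illustration and does not reflect the Hive or Calcite code: the nested-tuple plan encoding, the function names, and the `min_size` threshold are assumptions made for the example.

```python
from collections import Counter


def subplans(plan):
    """Yield a plan and every operator subtree beneath it.
    A plan is a tuple (operator, arg..., child_plan...); children are tuples."""
    yield plan
    for child in plan[1:]:
        if isinstance(child, tuple):
            yield from subplans(child)


def size(plan):
    """Number of operators in a plan."""
    return 1 + sum(size(c) for c in plan[1:] if isinstance(c, tuple))


def find_shared_subplans(plan, min_size=2):
    """Return repeated subplans with at least `min_size` operators,
    keeping only those not nested inside another repeated subplan."""
    counts = Counter(subplans(plan))
    repeated = {p for p, n in counts.items() if n > 1 and size(p) >= min_size}
    nested = {q for p in repeated for q in subplans(p) if q != p}
    return repeated - nested


# The same filtered scan feeds both branches of a union.
big_orders = ("filter", "amount > 100", ("scan", "orders"))
plan = (
    "union",
    ("project", "customer_id", big_orders),
    ("aggregate", "count(*)", big_orders),
)
shared = find_shared_subplans(plan)
```

Here the repeated filter over `orders` is reported as the single shared subplan. A real optimizer would additionally weigh the cost of materializing the shared result against recomputing it.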
***
7:10PM - 8:00PM Open discussion & Networking
The event is also published via the Future of Data account:
https://www.meetup.com/futureofdata-siliconvalley/events/305841546/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link
