Apache Calcite Online Meetup March 2023


Details
Tentative agenda
Part I: Presentations (2h)
Title: Adding measures to Calcite SQL
Presenter: Julian Hyde
Duration: 40 minutes
Abstract:
If SQL is the universal language of data, why do we author our most important data applications (metrics, analytics, business intelligence) in languages other than SQL? Multidimensional databases and languages such as MDX, DAX and Tableau LOD solve these problems but introduce others: they require specialized knowledge, complicate the data pipeline and don’t integrate well. Is it possible to define and query business intelligence models in SQL?
Apache Calcite has extended SQL to support metrics (which we call ‘measures’), filter context, and analytic expressions. With these concepts you can define data models (which we call Analytic Views) that contain metrics, use them in queries, and define new metrics in queries.
In this talk by the original developer of Apache Calcite, we describe the SQL syntax extensions for metrics, and how to use them for cross-dimensional calculations such as period-over-period, percent-of-total, non-additive and semi-additive measures. We describe how we got around fundamental limitations in SQL semantics, and approaches for optimizing queries that use metrics.
Title: Building a streaming incremental view maintenance engine with Calcite
Presenter: Mihai Budiu - VMware Research
Duration: 40 minutes
Abstract:
The DBSP open-source project (https://github.com/vmware/database-stream-processor) is a Rust runtime that seamlessly unifies streaming queries and incremental view maintenance (IVM). The SQL to DBSP compiler (https://github.com/vmware/sql-to-dbsp-compiler) is an open-source SQL compiler, based on Calcite, that targets the DBSP runtime. The compiler has passed more than 7 million SQL tests from SQLLogicTest (with 3 tests failing). We will present the core design of the IVM engine and the way we use Calcite to compile SQL to streaming computations. We will also discuss a few syntactic and semantic differences between SQL dialects that we have identified. Should Calcite support the notion of an "input SQL dialect", which would allow it to emulate behaviors that differ between established SQL engines?
Title: Debugging planning issues using Calcite's built-in loggers
Presenters: Alessandro Solimando, Stamatis Zampetakis
Duration: 30 minutes
Abstract:
Wrong results, high memory usage (OutOfMemoryError, GC pauses, etc.), unresponsive servers, and infinite planning time are some common issues that may arise in production systems when using Calcite, and query processors in general.
In this talk, we will demonstrate how to exploit Calcite's built-in loggers (notably https://issues.apache.org/jira/browse/CALCITE-4704 and https://issues.apache.org/jira/browse/CALCITE-4991) to debug such issues, using use cases from Apache Hive.
Part II: Open discussion (30m)
Password for the Zoom event (if requested): 429730
