Data Streaming & Lakehouse Night: Databricks, StreamNative, and RisingWave

Details

Event Description
Are you tired of the hype around GenAI? Ready to dive into the latest trends in data infrastructure? Join us for an in-person event to connect with data infrastructure experts and gain insights into data streaming and lakehouse technologies from industry leaders at Databricks, StreamNative, and RisingWave.

Event Speakers
Jason Reid, Co-founder of Tabular (acquired by Databricks)
Sijie Guo, Co-founder of StreamNative
Yingjun Wu, Founder of RisingWave

Talks/Abstracts
Sijie Guo - Ursa: Kafka-compatible data streaming on Lakehouse
Abstract: Ursa is a Kafka-compatible data streaming engine built on top of a lakehouse, enabling users to store their topics and associated schemas directly in lakehouse tables. Ursa utilizes the innovations that StreamNative has developed to evolve Pulsar's storage layer from a disk-based shared storage layer to an object storage-based tiered storage system and to integrate with the lakehouse ecosystem. The Ursa engine simplifies the integration between data streams and lakehouse tables, drastically reducing the complexity of using bespoke integrations. In this talk, we will dive deeper into the details of the Ursa engine and how it leverages the lakehouse as a storage backend.

Yingjun Wu - The Streaming Lakehouse Era: Is Kafka the New Data Lake?
Abstract: Apache Kafka plays a pivotal role in the technology stack of numerous data-driven corporations. Widely perceived as a “repository for recent data,” many organizations use Kafka to hold recently ingested data for durations ranging from 7 days to a month before transferring it to data lakes. However, there is increasing evidence suggesting that data persists in Kafka for longer periods, indicating that Kafka itself is evolving into a new form of data lake. In this talk, I will discuss whether Kafka can be considered the new data lake and how we can build a streaming lakehouse using open-source technologies like Kafka, RisingWave, and Iceberg.

Jason Reid - The (Open) Interface is Everything
Abstract: SQL may be the universal language of data, but the emergence of a number of prominent open-source standards over the past 15 years has helped revolutionize the way that our society interacts with data. Apache Arrow, Apache Iceberg, Apache Kafka, Apache Parquet, and Apache Spark are just some of the projects that have fueled this transition. In this talk, we explore the power of open, standard interfaces by recounting the steps we have taken to this point, in an effort to cast light on what lies ahead on this journey.

Bios
Sijie Guo is one of the original creators of Apache Pulsar and the Co-founder and CEO of StreamNative. His journey with data streaming began at Yahoo!, and he also led a messaging infrastructure team at Twitter. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative.

Yingjun Wu is the founder of RisingWave Labs, a database company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.

Jason Reid is one of the co-founders and head of product at Tabular (acquired by Databricks). Previously, he was a Director of Data Engineering at Netflix, where he worked from 2013 to 2021. He has been working with cloud-based data infrastructure for over 15 years.

Real-Time Data + AI San Francisco
Plug and Play Tech Center
440 N Wolfe Rd · Sunnyvale, CA