[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink

![[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink](https://secure.meetupstatic.com/photos/event/9/0/b/e/highres_516517054.webp?w=750)
Details
- Venue: 700 E Middlefield Rd, Mountain View, Building 4, 1st Floor, Together
- Zoom: https://linkedin.zoom.us/j/93953206183
5:30 - 6:00: Networking [in-person only + catered food]
6:00 - 6:05: Welcome
6:05 - 6:40: Unified Streaming Framework - A Declarative Real-Time Streaming Framework for Flink & Spark
Basar Onat & Chen Yang, DoorDash
At DoorDash, we use several different streaming event-processing frameworks. This split slows end users down, because they have to learn the nuances of each framework. To fix this, we unified the frameworks under one architecture with a declarative framework that works with both Flink and Spark and picks the right engine based on user requirements and SLAs.
- Basar Onat is an accomplished software engineer specializing in real-time streaming platforms. At DoorDash, he works as a Software Engineer on the Real-Time Platform team, focusing on the Realtime Metrics Platform. Before that, he was a Software Engineer at Meta, where he handled large-scale data workloads with the Presto team. Earlier, he played a vital role in bootstrapping the Realtime Platform at Striim.
- Chen Yang is a software engineer on the Real-Time Streaming Platform team at DoorDash. He focuses on building the stream processing platform and related real-time products that power DoorDash applications.
6:40 - 7:15: Streaming Queries without Compromise
Mihai Budiu & Leonid Ryzhyk, Feldera
Modern databases excel at data analysis when data changes infrequently. For rapidly changing data, however, many custom systems have been built under the banner of streaming systems. To offer near-real-time answers, these streaming systems compromise on the expressivity of the computations they can perform. We argue that this compromise is unnecessary. Based on a new theoretical foundation, we built Feldera, a streaming query engine that can execute any traditional database computation in streaming mode. The core of our system is an algorithm that converts an arbitrary query on data tables into a query that computes on change streams. We describe the core ideas behind this technology and give a demo of the system.
- Mihai Budiu is chief scientist at Feldera. He has a Ph.D. in computer science from Carnegie Mellon University. He was previously employed at VMware Research, Barefoot Networks, and Microsoft Research. Mihai has worked on reconfigurable hardware, computer architecture, compilers, security, distributed systems, big data platforms, large-scale machine learning, programmable networks and P4, data visualization, and databases; four of his papers have received “test of time” awards. He has also received two technology transfer awards.
- Leonid Ryzhyk (CTO / Co-Founder) holds a Ph.D. in computer science from the University of New South Wales. Before co-founding Feldera, he worked as a researcher at NICTA, the University of Toronto, Carnegie Mellon University, Samsung Research America, and most recently VMware Research. Leonid has published dozens of research papers on operating systems, programming languages, formal verification, software-defined networks, and databases.
7:15 - 7:50: Ibis: The Portable Python Dataframe API
Phillip Cloud & Chloe He, Voltron Data
Ibis is a lightweight Python library that helps you rapidly develop analytics, from development to production, using a dataframe API. You can pull a sample of a dataset from the production system, work with it locally, and then swap out the connection information to run that same code in production. Ibis also fits well as part of a larger pipelining system: think Kedro, dbt, and SQLMesh. Recently, Ibis gained streaming backends that we're excited to present. We'll give a technical overview, show it in action, and, time permitting, talk about what the future looks like for Ibis.
- Phillip Cloud is a principal software engineer at Voltron Data. He works primarily on Ibis, a portable dataframe API for Python. He has done different things over the years, from jazz drumming to cognitive neuroscience, and naturally he wound up working on software libraries! You can find him on GitHub and the Ibis Zulip instance.
- Chloe He has a background in data science and started working on streaming systems as a Founding Engineer at Claypot AI, a startup tackling challenges in real-time machine learning. She led the infrastructure development of an open-source real-time feature engineering platform and worked on translating and optimizing streaming workloads that served low-latency use cases. She brought her streaming expertise to Voltron Data, where she leads the development of Ibis streaming.