[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink

Details
- Venue: Diversify (Meeting Room) -- 800 E Middlefield Rd, Mountain View, CA 94043
- Zoom: https://linkedin.zoom.us/j/94765243439
5:30 - 6:00: Networking [in-person only + catered food]
6:00 - 6:05: Welcome
6:05 - 6:40: Flink on Darwin: An interactive SQL editor to improve the Flink onboarding experience
Liang Wu, LinkedIn
Darwin, LinkedIn's hosted notebook platform, offers interactive experiences for offline applications like Trino and Spark. To bring the power of real-time streaming to our users, we've integrated Darwin with Flink SQL. Previously, users with limited Flink experience often faced a steep learning curve during onboarding. Flink on Darwin addresses these challenges by simplifying the process, providing intuitive data visualizations, and shortening the feedback loop. In this talk, we will delve into the implementation of Flink on Darwin and demonstrate how it streamlines user onboarding, making the journey of crafting streaming applications more accessible and efficient.
- Liang is a Staff Software Engineer at LinkedIn, specializing in stream processing. After graduating from Columbia University, he began his career at Bloomberg, where he built a strong foundation in stream application development. In 2022, he joined LinkedIn to focus on Flink SQL and on improving the stream processing user experience.
6:40 - 7:15: Do Virtual Threads improve Kafka Consumer throughput?
Eric Sun, GitHub
Learn whether migrating your Kafka consumer threads to virtual threads can improve the throughput of I/O-heavy consumer workloads. Take a deep dive into the networking internals of the Kafka consumer client and the JDK's networking support for virtual threads. Finally, learn how to identify where different types of thread pinning may cause problems in arbitrary third-party libraries.
- Eric brings 10+ years of experience on platform teams to the Data Pipelines team at GitHub, which oversees the messaging platforms that connect GitHub’s applications and data stores. He enjoys working on performance problems and specializes in incident analysis and remediation.
7:15 - 7:50: RisingWave: Everything You Wanted to Know but Were Afraid to Ask About Stream Processing
Yingjun Wu, RisingWave
Stream processing systems seem magical: they deliver much fresher results than batch processing, promise the highest levels of consistency, and leverage S3 to reduce state storage costs. But is it too good to be true? In the world of data systems, there’s no such thing as a free lunch. Every benefit comes with trade-offs. In this talk, I will discuss the lesser-known aspects of RisingWave and stream processing, including core issues such as cost, consistency, backfilling, and resource isolation. I’ll share real-world examples of how these issues manifest and the “bloody facts” of how they can bite even the most experienced practitioners.
- Yingjun is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before founding the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. He has been working in the field of stream processing and database systems for over a decade.

[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink