[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink

![[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink](https://secure.meetupstatic.com/photos/event/9/0/b/e/highres_516517054.webp?w=750)
Details
- Venue: Together (Meeting Room) -- 700 E Middlefield Rd, Mountain View, Building 4, 1st Floor
- Zoom: https://linkedin.zoom.us/j/99142403394
5:30 - 6:00: Networking [in-person only + catered food]
6:00 - 6:05: Welcome
6:05 - 6:40: Live Dataframes: Let's Move Beyond the Limits of Stream Processing
Pete Goddard, Deephaven Data Labs
Modern stream processing remains constrained by its lack of composability—the ability to seamlessly build and evolve systems like snapping together LEGO bricks. As a result, stream processors are often limited to ETL-like roles, pipelining transformed data into lakes. True analytics, application logic, and AI workloads continue to rely on batch processing. Live Dataframes—incrementally updating, column-oriented, structured tables—offer a new paradigm to bring real-time analytics, application support, and AI fully into streaming architectures. In this talk, Pete will explore why Live Dataframes provide a more versatile, user-friendly, and fully composable abstraction, enabling next-generation real-time processing positioned to match the scale and impact of Apache Kafka.
- Pete is the founder and CEO of Deephaven Data Labs, a spinout from Walleye Capital, the hedge fund he also founded. With 25 years of experience running businesses powered by real-time data, Pete is passionate about advancing the state of real-time data processing. He and the Deephaven team believe that while Apache Kafka has revolutionized data transport, the next step is a similarly transformative leap in deriving insight and driving action from real-time data.
6:40 - 7:15: Northguard: Scalable Log Storage at LinkedIn
Onur Karaman, LinkedIn
Log storage systems are a key building block for the infrastructure of some of the largest companies in the world. Existing solutions have struggled to keep up with the rapid growth in the number, volume, and complexity of pubsub use cases, and we believe this trend will continue as more use cases that depend on the pubsub pattern emerge. Scalability bottlenecks and operability pain points can start to show as usage grows to thousands of use cases and tens of trillions of records per day. Northguard is a log storage system developed at LinkedIn with a focus on scalability and operability. To achieve high scalability, Northguard shards its data and metadata, keeps minimal global state, and adopts a decentralized group membership protocol. Northguard's operability leans on log striping to distribute load across the cluster evenly by design.
- Onur is a Sr Staff Engineer at LinkedIn with an interest in distributed systems. He's the tech lead of Northguard, a log storage system with a focus on scalability and operability. Prior to Northguard, Onur was a committer to Apache Kafka, where he focused on Kafka's scalability. He redesigned the cluster's controller, made the controller use ZooKeeper's async APIs, and worked on the group coordinator and consumer group management protocol.
7:15 - 7:50: Virtualizing LinkedIn's Pub/Sub via Northguard-Xinfra
Wesley Wu and Ke Hu, LinkedIn
Northguard-Xinfra, LinkedIn's virtualization system for Pub/Sub, offers a unified Pub/Sub experience for customers. It is compatible with multiple log storage systems, including Kafka, and provides transparent switching between Pub/Sub systems, native federation, and easier client management. Northguard-Xinfra achieves these goals by supporting a set of Pub/Sub-agnostic APIs and by using a dedicated metadata layer to provide a virtualization layer over heterogeneous Pub/Sub systems.
- Wesley is a Sr Staff Engineer at LinkedIn. He joined LinkedIn's Kafka team (now the Streams IO team) in 2018. He previously worked on Kafka broker scaling and LinkedIn's Kafka client ecosystem. He currently focuses on the virtualized Pub/Sub system (a.k.a. Northguard-Xinfra).
- Ke is a Staff Engineer at LinkedIn. He joined LinkedIn's Kafka team (now the Streams IO team) in 2018. He previously worked on streamlining Kafka clients at LinkedIn and on the Kafka ecosystem. He currently focuses on the virtualized metadata/control layer in Northguard-Xinfra.

