Stream Processing with Apache Kafka & Apache Samza



Welcome to the upcoming Stream Processing Meetup hosted by LinkedIn in Sunnyvale. This meetup focuses on Apache Kafka, Apache Samza, and related streaming technologies.

Location: Unify Conference Room, LinkedIn Corporate HQ in Sunnyvale. We will be on the 1st floor of 950 W Maude Ave, Sunnyvale, CA 94085.


6 PM: Doors open

6-6:35 PM: Networking & Welcome

6:35-7:10 PM: Apache Pulsar - The next generation messaging system (Karthik Ramasamy, Co-Founder at Streamlio)

This talk introduces Apache Pulsar, a durable, distributed messaging system underpinned by Apache BookKeeper, a streaming storage system. It was originally developed at Yahoo, open sourced in November 2016, and is incubating at Apache. Apache Pulsar introduces a segment-centric architecture that provides durability, separation of storage and serving, and low publish latency. It incorporates several enterprise-grade features: multi-tenancy, geo-replication, support for different delivery semantics, and a unified messaging model for queuing and streaming. In this talk, Karthik will discuss the Apache Pulsar architecture and how it reduces the complexity of development and operations.

7:15-7:50 PM: Conquering the Lambda architecture in LinkedIn metrics platform with Apache Calcite and Apache Samza (Khai Tran, Staff Software Engineer, LinkedIn)

Metrics play an important role in data-driven companies like LinkedIn, where we leverage them extensively for reporting, experimentation, and in-product applications. We built an offline platform to help people define and produce metrics driven by their transformation code, mostly in Pig or Hive, and metadata-rich configurations. Many of our users would like to look at these metrics in real time. To support this, we recently built an extension to the platform that auto-generates a Samza real-time flow from existing offline transformation code with a single command. Combined with the existing offline platform, this delivers a Lambda architecture without maintaining multiple code bases.

In this talk, we will describe how we use Apache Calcite to translate our offline logic, which serves as the single source of truth, into both Samza code and configuration for real-time execution.

7:55-8:30 PM: Building Venice with Apache Kafka & Samza (Gaojie Liu, Senior Software Engineer, LinkedIn)

Over the last two years at LinkedIn, we have been working on a distributed key-value store called Venice, which specializes in serving the datasets computed in Hadoop and Samza.
Venice "Hybrid Stores" can ingest data from both Hadoop and Samza and internally combine it, thus offering first-class support for lambda architectures.
In this talk, we will share how we built Venice by leveraging Kafka and how it empowers new Samza use cases at LinkedIn.

Please RSVP *only* if you plan to attend in person. Our facility can host 200 guests.

You can park in the uncovered parking that is along the building or in the parking garage located next to the building.


You will need to sign a standard NDA when you enter the lobby.

Food & Drink:

Food & drink will be provided.

Can’t join us in person?:

Live Stream is available here:

Want to talk at a future meetup?:

Please contact us via the “Contact” button in