Past Meetup

Stream Processing with Apache Kafka & Apache Samza

304 people went


Welcome to the February 2016 Stream Processing Meetup hosted by LinkedIn in Sunnyvale.

This meetup focuses on Apache Kafka, Apache Samza, and related streaming technologies.

This will be LinkedIn's second Stream Processing meetup at our new corporate HQ in Sunnyvale. We have a beautiful facility on the top floor of the building, full of comfortable couches and chairs.

6 PM: Doors open

6-6:35 PM: Networking & Welcome

6:35-7:10 PM: SSD Benchmarks for Apache Kafka (Mingmin Chen, Uber)

At Uber, we operate 20+ Kafka clusters on commodity hardware with spinning disks. We used to run into disk I/O saturation from time to time. In addition, some of our Kafka clusters are dedicated to business-critical use cases that run with acks=all and require a very low-latency SLA. In this talk we present our work on benchmarking SSD-based Kafka clusters and the impact of SSDs on end-to-end producer latency, partition scalability, failure recovery, and so on. We will also discuss how this helps power our zero-data-loss cluster used for financial pipelines.

7:15-7:50 PM: Asynchronous Processing and Multithreading in Apache Samza (Xinyu Liu, LinkedIn)

With the Apache Samza 0.11 release, Samza becomes the first stream processing framework to support both asynchronous processing and parallel processing models. This is unique among current open source stream processors because not only can Samza run traditional synchronous processing in parallel on multiple threads, but it also provides first-class support for asynchronous processing. Users can now perform non-blocking I/O directly for remote data access. This new model also introduces out-of-order processing to maximize parallelism while still guaranteeing certain semantics. In this talk we will discuss the Samza asynchronous API and model, explore the details of the asynchronous event loop and its semantics, and finally study the performance enhancements using benchmark jobs.

7:55-8:30 PM: Batching to Streaming Analytics at Optimizely (Vignesh Sukumar, Mike Davis, Hao Xia; Optimizely)

At Optimizely, we are building a cutting-edge experimentation platform that ingests billions of click-stream events a day from millions of visitors for analysis. In this talk, we will highlight our transition to stream processing to provide real-time metrics on top of this event stream. We will also explain how Samza fits our needs and walk through a production-level use case of sessionization and aggregation.

Please RSVP *only* if you plan to attend in person. Our facility can host 200 guests.

Parking & Entrance:
You can park in the uncovered lot along 580 Mary Ave or in the parking garage located behind the building. There is also street parking available for overflow.

You need to enter 580 Mary from the rear of the building (opposite Maude Ave).

You will need to sign a standard NDA when you enter the lobby of 580 Mary.

Food & Drink:
Food & drink will be provided.

Can’t join us live?
We will be live-streaming this event as well as posting recordings of the presentations.

Want to talk at a future meetup?
Please contact us via the “Contact” button in