
Stream Processing with Apache Kafka & Apache Samza

Hosted By
Hristo D. and Samarth S.

Details

Welcome:

Welcome to the upcoming Stream Processing Meetup hosted by LinkedIn in Sunnyvale. This meetup focuses on Apache Kafka, Apache Samza, and related streaming technologies.

Location: Unify Conference Room, LinkedIn Corporate HQ in Sunnyvale. We will be on the 1st floor of 950 W Maude Ave, Sunnyvale, CA 94085.

Agenda:
5:30 PM: Doors open

5:30-6:00 PM: Networking

6:00-6:30 PM: Azure Stream Analytics
Sasha Alperovich & Sid Ramadoss, Microsoft

Azure Stream Analytics (ASA) is a fully managed, near real-time data processing service on Azure. In this talk we will highlight what distinguishes ASA and demo, using the NYC taxi scenario, how Azure customers can use it to gain insights in near real time. We will then dive deeper into how the service is built, covering resiliency, dataflow, and other technical aspects of the ASA runtime. We will also discuss how ASA’s design choices compare with those of other streaming technologies, namely Spark Structured Streaming and Flink.

6:30-7:00 PM: Stream Processing in Python with Samza and Beam
Hai Lu, LinkedIn

Apache Samza is the streaming engine used at LinkedIn, where it processes around 2 trillion messages daily. A while back we announced Samza's integration with Apache Beam, which led to our Samza Beam API. We have now upgraded our APIs to support stream processing in Python. This work has made stream processing more accessible and enabled many interesting use cases, particularly in machine learning. The Python API builds on our Samza runner for Apache Beam. In this talk, we will quickly review our work on the Samza runner and then explain how we extended it to support portability in Beam (Python specifically). In addition to technical and architectural details, we will also talk about how the Python API bridges the Python and Java ecosystems at LinkedIn, along with different use cases.

7:00-7:30 PM: Apache Kafka at LinkedIn: How LinkedIn customizes Kafka to work at the trillion scale
Jon Lee & Wesley Wu, LinkedIn

At LinkedIn, we operate thousands of brokers to handle trillions of messages per day. Running at such a large scale constantly raises scalability and operability challenges for the Kafka ecosystem. While we try to keep our internal releases as close as possible to upstream, we maintain a version of Kafka that includes patches addressing our production and feature requirements. In this presentation we will cover the Kafka release that LinkedIn runs in production, the workflow we follow to develop new patches, how we upstream our changes, some of the patches we maintain in our branch, and how we generate releases.

7:30-8:30 PM: Additional networking and Q&A

RSVP:
Please RSVP only if you plan to attend in person. Our facility can host 250 guests.

Parking:
You can park in the uncovered parking that is along the building or in the parking garage located next to the building.

NDA:
You will need to sign a standard NDA when you enter the lobby.

Food & Drink:
Food & drink will be provided.

Can’t join us in person?
Join us online - https://www.bluejeans.com/9199243124

Want to talk at a future meetup?
Please contact us via the “Contact” button on meetup.com.

Stream Processing with Apache Kafka, Samza, and Flink