Welcome to the Tuesday, August 23rd Stream Processing Meetup hosted at LinkedIn in Mountain View.
This meetup focuses on Apache Kafka, Apache Samza, and related streaming technologies.
6 PM: Doors open
6:30-7:05 PM: Consumer Group Internals: Rebalancing, Rebalancing, Rebalancing, Rebalancing, Jason Gustafson & Onur Karaman
Getting data out of Kafka means working with consumer groups. In 0.9, the Kafka team introduced a new coordination protocol built on top of Kafka itself and a new consumer client which leverages it. But how does it work and how does it scale? In this talk, you will find out from two of its main developers.
Bio: Jason Gustafson is a software engineer at Confluent Inc. who has spent the last year working on Kafka internals and the Confluent Stream Data Platform. Onur Karaman is a software engineer on the Kafka team at LinkedIn. Before LinkedIn, Onur studied computer science at UIUC.
7:05-7:40 PM: Nearline Topic Tagging of News Articles on Samza, Eric Huang
At LinkedIn, to provide meaningful and fresh content to our users at scale, we automatically tag news articles with the topics that they are about. We do this at global scale, within minutes of each article entering the LinkedIn ecosystem, using topic models for concepts from "3M" to "Zoology" that exceed the size of the typical Samza container. In this talk, I will present a distributed architecture for our nearline topic tagger built on Samza, offline-to-online model delivery, the overarching machine learning workflow, and interesting problems and solutions we have encountered along the way.
Bio: Eric Huang is an analytics engineer at LinkedIn, helping to build and scale LinkedIn's big data analytics and personalization platforms, enabling its products to support hundreds of millions of users worldwide. Prior to this, Eric was a scientist at Palo Alto Research Center (PARC) researching graph algorithms, automated planning, and automated data integration. Eric received his Ph.D. in Computer Science from UCLA.
7:40-8:20 PM: How to convert a legacy Hadoop Map/Reduce ETL system to Samza Streaming, Louis Calisi
In this presentation, Louis Calisi will describe how Tripadvisor converted its legacy Hadoop Map/Reduce jobs to Samza Streaming. This system feeds thousands of tables and downstream reports. The conversion had to guarantee no data loss and full backward compatibility.
Bio: Louis is a Principal Software Engineer at Tripadvisor. He helps lead the architecture and development of the core ETL and reporting systems.
RSVP: Please RSVP *only* if you plan to attend in person. Our facility can host 200 guests.
Parking: Anywhere that you see an open spot!
NDA: You will need to sign a standard NDA when you enter the lobby of 2025.
Food & Drink: Food & drink will be provided.
Can’t join us in Mountain View?: We will be live-streaming this event as well as posting recordings of the presentations. We will post the live-stream URL in this group within one hour of the event.
Want to talk at a future meetup? Please contact us via the “Contact” button on meetup.com.