Details

Welcome to the Stream Processing Meetup hosted by LinkedIn! This meetup focuses on Apache Kafka, Samza, Flink, Beam, and related streaming technologies.
Location: https://linkedin.zoom.us/j/92661619808?pwd=RVdya003QjZJeHgvU04vQksxdTBOUT09

6:00 - 6:05 PM: Welcome & Introductions
6:05 - 6:40 PM: Autoscaling Flink at Netflix
Timothy Farkas, Netflix
Abstract: The Keystone Data Pipeline manages several thousand Flink pipelines with variable workloads. These pipelines are simple routers that consume from Kafka and write to one of three sinks. To reduce our operational overhead, we implemented autoscaling for our routers. Autoscaling has reduced our resource usage by 25% to 45% (varying by region and time) and has reduced our on-call burden. This talk will take an in-depth look at the mathematics, algorithms, and infrastructure details for implementing autoscaling of simple pipelines at scale. It will also discuss future work on autoscaling complex pipelines.

  • Bio: Timothy Farkas is an Apache Apex and Apache Drill committer. He has been working on big data and stream processing at large companies and startups for the past 10 years.

6:40 - 7:15 PM: XStream: Stream Processing Platform at Facebook
Aniket Mokashi, Meta
Abstract: Real-time data processing powers a variety of use cases at Facebook. To support them, we have built a stream processing ecosystem over the years with Scribe, Puma, Stylus, Laser, Swift, and others. With XStream, we are consolidating this ecosystem under one umbrella and augmenting the capabilities of these systems with advanced computation libraries such as Velox, built at Facebook. In this talk, we will cover the evolution of stream processing systems at Facebook. We will also introduce XStream, a novel platform we've built at Facebook, and discuss its features, such as joins, pluggable shuffle, an interpretive engine, and vectorized execution, and how it consolidates the previous real-time data processing ecosystem.

  • Bio: Aniket Mokashi is an Engineering Manager on the Stream Processing team at Facebook. Throughout his career, he has contributed to the development of large-scale data processing frameworks and platforms. Prior to Facebook, he worked on data platform teams at YouTube, Twitter, and Netflix. He is also a committer and PMC member of the Apache Parquet and Apache Pig projects. Aniket holds a Master's degree in Information Networking from Carnegie Mellon University.

7:15 - 7:50 PM: Power Machine Learning Feature Engineering with Managed Beam Platform at LinkedIn
Yanan Hao, LinkedIn and David Shao, LinkedIn
Abstract: At LinkedIn, machine learning models are applied in almost all key products, such as Job Recommendation, Search, Feed, and Ads. ML models are powered by thousands of features about entities like companies, job postings, and LinkedIn members. Preparing and managing features has been one of the most time-consuming parts of operating our ML applications at scale. There is also growing demand for fresh "real-time" feature data, which is expected to have significant business impact by boosting ML models' relevance performance.
Apache Beam has grown to be a key framework for building stream processing applications at LinkedIn. The LinkedIn Stream Processing infrastructure team has embarked on a new journey to offer a fully managed platform based on Beam, which aims to minimize end users' operational overhead by automating platform upgrades, health monitoring, and remediation. Fedex-Realtime, the Nearline Feature Generation framework, will run on the Managed Beam platform to scale the development and operation of real-time features at LinkedIn.

  • Bio: Yanan Hao is a Staff Engineer on the LinkedIn Stream Processing Infrastructure team. She has been working on the Workflow API for authoring Managed Beam applications. She also has three years of experience on the LinkedIn Machine Learning infrastructure team, building a model productionalization platform.
  • Bio: David Shao is an engineer from the LinkedIn Machine Learning infrastructure team. He has been working on building the Fedex Realtime framework in collaboration with the stream processing infrastructure team.

We are hiring!
Staff SWE

SRE:

