Stream Processing with Apache Kafka, Samza, and Flink
Details
Welcome to the Stream Processing Meetup hosted by LinkedIn! This meetup focuses on Apache Kafka, Samza, Flink, Beam, and related streaming technologies.
Location: https://linkedin.zoom.us/j/92296342207?pwd=b05RTzNaUGVuUnlLcHVKeTIzQmpDdz09
6:00 - 6:05 PM: Welcome & Introductions
6:05 - 6:40 PM: Cruise Control at AWS MSK Serverless
Mohit Paliwal, Amazon Web Services (AWS)
Abstract: Serverless is a newly offered cluster type for AWS Managed Streaming for Apache Kafka (MSK). It makes it easy for customers to run their Kafka workloads without having to manage or scale cluster capacity. Because MSK Serverless abstracts the servers away from users, the AWS MSK Serverless team is responsible for patching, managing, and balancing the clusters, and for mitigating imbalance in a timely manner. To balance the clusters, AWS MSK Serverless uses LinkedIn Cruise Control behind the scenes. The AWS MSK Serverless and LinkedIn Cruise Control teams collaborated to add some AWS-specific customizations to Cruise Control. In this talk, I will discuss the AWS MSK Serverless use case, the customizations that AWS MSK made to Cruise Control, and the open source collaboration between LinkedIn and AWS. I will also explain how Cruise Control users with specific needs can leverage our changes for their own Kafka clusters, and I will outline future work.
- Bio: Mohit Paliwal is a Software Engineer at AWS Managed Streaming for Apache Kafka. He has built AWS MSK Serverless and AWS Glue Schema Registry from the ground up. He has also built real-time streaming platforms for intelligent routing for Amazon CloudFront.
6:40 - 7:15 PM: Apache Beam - A fully language portable and scalable batch and streaming data processing
Chamikara Jayalath, Google
Abstract: Apache Beam introduces a unified programming model for batch and streaming data processing. Beam provides SDKs for various programming languages, such as Java, Python, and Go. Beam pipelines are executed by a runner, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. Beam provides a portability framework that allows a given runner to execute transforms defined in any given SDK. This lets Beam runners maintain one implementation that supports all current and future SDKs. Additionally, this allows Beam runners to use transforms from different SDK languages in the same pipeline. In this talk, we'll look into Apache Beam’s portability framework and its benefits.
- Bio: Chamikara Jayalath is a Senior Software Engineer at Google and a PMC member for the Apache Beam project. Chamikara has been contributing to Apache Beam since its inception and has primarily contributed to various Beam I/O connector frameworks and Apache Beam’s multi-language pipelines framework. Prior to Google, Chamikara completed his PhD at Purdue University, focusing on large-scale distributed data processing.
7:15 - 7:50 PM: Effective detecting and preventing abuse on LinkedIn with Beam streaming processing
Rui Han, Allen Xin, Riken Shah, Zhisheng Zhou, LinkedIn
Abstract: Data is essential to our anti-abuse defense mechanisms, such as machine learning models and heuristic-based anti-abuse systems. Prior to adopting Samza Beam, many important anti-abuse detection mechanisms ran offline. Due to the adversarial nature of the problem, defenses generated from offline datasets need to be updated frequently in order to detect and prevent abuse activities that change rapidly and evade detection, and in some cases offline processing is not sufficient to handle the attacks. The paradigm shift from offline to nearline streaming data processing opens a new space and empowers us to defeat sophisticated abusers quickly and accurately.
- Bio: Allen Xin - Senior Software Engineer who has been working on the Anti-Abuse Infrastructure team, Trust Engineering, since Feb. 2020. He has been focusing on the anti-scraping and anti-automation area since he joined LinkedIn.
- Bio: Rui Han - Senior Software Engineer on the LinkedIn anti-abuse team. He has been working in the anti-abuse domain since 2019. His work focuses on research and development of anti-abuse technologies in anti-automation and anti-spamming.
- Bio: Riken Shah - Software Engineer on the LinkedIn anti-abuse team. He has worked on the nearline infrastructure platform based on Samza Beam.
- Bio: Zhisheng Zhou - Senior Software Engineer on the LinkedIn anti-abuse team. He has worked on the nearline infrastructure platform based on Samza Beam.
