The Apache Beam meetup is for everyone interested in authoring big data processing pipelines (both batch and streaming) in a portable way, across different runners and with SDKs in different languages.
We invite you to join us for the 2nd Beam meetup in Paris.
We will have 3 speakers: an introduction to building big data pipelines with Apache Beam and Talend Pipeline Designer, a deep dive into portability, and a talk on running Python and Go pipelines on Spark using Beam. We hope to welcome you this time at the Xebia offices (https://xebia.fr)!
18:30 - Registration, pizza and drinks.
19:00 - 1st talk: Introducing Talend Pipeline Designer: Building Big Data pipelines with Apache Beam.
19:30 - 2nd talk: Making the Beam vision real.
19:50 - 3rd talk: Apache Beam: Running Big Data Pipelines in Python and Go with Spark.
Speaker: Abbass Marouni
Abstract: Talend Pipeline Designer builds on Apache Beam's unified abstraction for big data processing to provide a visual language for end users to easily construct and explore their big data pipelines for both batch and streaming workloads. In this talk, we discuss the challenges and benefits of building our product on the foundation of Apache Beam and conclude with a live demo.
Speaker: Ismaël Mejía
Abstract: Beam achieves execution-system and language portability through two concepts: (1) runners, which translate the Beam model so it can be executed on existing systems like Apache Spark and Apache Flink, and (2) the portability framework, an architecture of gRPC services that represents the Beam model in a language-agnostic way and coordinates the execution of pipelines in language-specific environments (e.g. via containers).
Speaker: Kyle Weaver
Abstract: The most recent evolution of the Beam Spark runner supports pipelines written in different languages by relying on Beam's portability framework. We will show how we accomplished this and how you can now execute Beam Python and Go pipelines on Spark. This enables use cases like TensorFlow Extended (TFX), an end-to-end platform for data validation, data transformation, and ML model analysis that integrates with Beam and can now be run at scale in the open with Spark. Finally, we will discuss ongoing work and some future plans for the Spark portable runner.
Who should attend
Everyone interested in Data Engineering, Data Science, and Machine Learning who wants to learn about one of the newer, exciting Apache projects focused on batch and stream processing of data. We try to cover business value as well as deeper technical detail.
Thanks to Xebia (https://xebia.fr) for providing the space.