Apache Beam meetup 3: Beam portability + visual pipeline development + ML w Beam


We invite you to join us for the 3rd Beam meetup in Stockholm to get an update on what is going on in the land of streaming analytics, learn how Beam is used by your peers, and have your questions answered.

This time we hope to welcome you at the Google offices (https://cloud.google.com)!

18:30 - Registrations, pizza and drinks.

19:00 - 1st talk: Beam in-depth: Beam portability and cross-language pipelines by Robert Bradshaw.

19:45 - 2nd talk: Visual Beam Pipeline Development with Kettle by Matt Casters.

20:30 - 3rd talk: End-to-End ML pipelines with Beam, Flink, TensorFlow, and Hopsworks by Theofilos Kakantousis.

21:00 - 4th talk: An update on Python 3 support in Apache Beam by Valentyn Tymofieiev.

21:15 - Networking.


1st talk
Robert Bradshaw (https://www.linkedin.com/in/robert-bradshaw-1b48a07), a senior software engineer at Google and one of the original authors of the FlumeJava and Dataflow papers (https://ai.google/research/people/RobertBradshaw), will go deeper into the technical details of Beam. Specifically, he will talk about Beam portability and cross-language pipelines.

2nd talk
Matt Casters will demonstrate how Kettle integrates with Apache Beam to let you build pipelines visually and execute them on the various Beam runners without writing any code.
After an introduction to the Kettle project, you will get an overview of the supported functionality and best practices, with a number of demos on Dataflow, Spark and Flink.

3rd talk
Theofilos Kakantousis, COO at LogicalClocks, will talk about ML with Beam.
Apache Beam is a key technology for building scalable End-to-End ML pipelines, as it is the data preparation and model analysis engine for TensorFlow Extended (TFX), a framework for horizontally scalable Machine Learning (ML) pipelines based on TensorFlow.
In this talk, we present TFX on Hopsworks, a fully open-source platform for running TFX pipelines in any cloud or on-premises. Hopsworks is a project-based multi-tenant platform for both data parallel programming and horizontally scalable machine learning pipelines. Hopsworks supports Apache Flink as a runner for Beam jobs, and supports TFX pipelines through its Airflow support.
We will demonstrate how to build an ML pipeline with TFX, Beam’s Python API and the Flink Runner using Jupyter notebooks, explain how security is transparently enabled with short-lived TLS certificates, and go through all the pipeline steps: Data Validation, Transformation, Model Training with TensorFlow, Model Analysis, and Model Serving and Monitoring with Kubernetes.

4th talk
Valentyn Tymofieiev is a software engineer at Google Cloud Platform and a contributor to Apache Beam. Most recently, Valentyn has been coordinating the ongoing work to offer Python 3 support in Beam and will give an update on the current status of this effort.

Who should attend
Everyone interested in Data Engineering, Data Science and Machine Learning who wants to learn about one of the newer and more exciting Apache projects, focused on batch and stream processing of data. We try to cover both the business value and the deeper technical details.

Thanks to Google (https://cloud.google.com) for providing the space. Thanks to EQT (https://www.eqtpartners.com) for sponsoring the meetup.