Inaugural Apache Beam meetup Warsaw: QA in Beam + Beam use-case + more!


Details
We would like to invite you to the first Apache Beam meetup in Warsaw.
The meetup will take place on November 14th. We hope to welcome you at the Polidea offices this time (https://www.polidea.com)!
Stay tuned for speakers and topics! (We'll have a SQL deep dive and a customer use case.)
=======
Agenda
18:00 - Registration
18:30 - Kick-off
18:35 - 1st talk: Beam intro & use-cases
19:00 - 2nd talk: Quality assurance in Beam—Measure Your Pipeline!
19:20 - 3rd talk: Beam for ML with TensorFlow Extended
19:45 - Pizza, drinks and networking
=====
Talks
1st talk
In this session, Matthias Baetens (https://twitter.com/matthiasbaetens) will introduce Apache Beam and cover some of the core concepts of the project.
We will cover several use cases and take a deep dive into one from Sky, explaining why and how Sky moved from an on-premises, batch-focused, Hadoop-based analytical system to a streaming-oriented analytics pipeline hosted in a cloud environment and built on Apache Beam.
We will explain why capturing all user interaction data is important, how it fits into Sky’s renewed strategy of data-driven decision making, and what impact it has had on the business.
The architecture of the system will also be covered, including the design decisions made along the way and the lessons learned from the implementation and deployment process. We will end the session with a short demo and a code example!
2nd talk
In the second talk, Łukasz Gajowy will talk about QA in Beam. Łukasz is an engineer interested in distributed processing and open-source software development. He got so deeply into both topics that he recently became an Apache Beam committer. Other than that, Łukasz works at Polidea, holds an MSc in Information Technology from Warsaw University of Technology, has 7 years of professional experience (mostly in JVM areas) and enjoys jogging in his free time.
Apache Beam, being a unified model, supports multiple runners and SDKs. You want it to work like a charm on multiple file systems, and you want to be sure that any IO can handle big loads. Other than that, you need your jobs to be efficient (and you probably want to know "all the numbers"). How do you ensure you meet all your goals? You need proper testing! However, due to the complex nature of the problem itself, automated testing can sometimes be tricky too!
Luckily, we're working on a whole bunch of tools in Beam's codebase to ease your pain. In this presentation, we will show you how you can write various integration and performance tests in Beam using an auto-setup infrastructure (DBs, filesystems & runners), and how to display the collected metrics. We will also show you how we currently do this in Beam, in both the IO Integration Tests and the Load Tests of core Beam transforms.
Use these tools to increase the quality of your Beam pipelines!
3rd talk
For this talk, Vojin will speak about TensorFlow Extended and the role of Apache Beam in it.
TensorFlow Extended (TFX) is an end-to-end ML pipeline for TensorFlow.
Many TFX components rely on Apache Beam for data processing.
The talk is an overview of both systems and their synergy.
Vojin is a software engineer in Cloud Dataflow, leading security and privacy efforts on the product.
Google is always innovating on Cloud Dataflow's security model and world-scale infrastructure to help keep your organization secure and compliant. In his free time he is a xylophonist in a boy band focused on promoting the values of the post-toddler era.
==================
Who should attend
Everyone interested in data engineering, data science and machine learning who wants to learn about one of the newer, more exciting Apache projects focused on batch & stream processing of data. We try to cover both the business value and the deeper technical details.
=========
Sponsors
Thanks to Polidea for providing the space, food & drinks at this meetup.
