19:00 Doors open & Get-Together
19:30-20:00 Emiliano Tomaselli, Olgierd Grodzki: Streaming Data Platform with Kafka and Kubernetes
20:00-20:25 Andreas Pawlik: Data Pipelining and Deep Learning Pipelines with Hadoop
20:35-21:20 Alexey Kuntsevich: Probabilistic Data Structures and their use in AdTech
Emiliano Tomaselli, Olgierd Grodzki: Streaming Data Platform with Kafka and Kubernetes
In this session, Emiliano Tomaselli and Olgierd Grodzki from Data Reply give a presentation on Kafka and Kubernetes. Kafka has become the de facto standard for building streaming architectures. Many organizations want to run Kafka "as a service", on premises or in the cloud, and use it to enable their developers to create apps, data pipelines and more.
To make the platform deployment scalable, fault-tolerant and cloud-native, we decided to take advantage of one of the most popular open-source systems for orchestrating containerized applications: Kubernetes.
We will show you the use cases we developed on the customer side with those platforms, some challenges we faced (e.g. security, ACLs and more) and how we tried to solve them by developing custom tools and applications.
To conclude, we will also present one of the solutions we adopted to bring automation into the Kubernetes ecosystem with CI/CD pipelines.
Andreas Pawlik: Data Pipelining and Deep Learning Pipelines with Hadoop
Deep Learning benefits from training and validation on large amounts of data. I will outline the challenges of running Deep Learning workflows on large data sets, discuss how Hadoop/Mesos can help address these challenges, and explore the benefits and limitations of the approach.