Data Engineering Meetup #5: Open Source Infrastructure


Details
This is the fifth instalment of our series of Data Engineering Meetups.
For this edition, we have two very exciting talks, as well as a panel with Data Artisans, Oracle, and Starburst, on a much-talked-about subject: should you run open source infrastructure yourself, use a cloud provider’s managed service, or a vendor’s managed service? Interested in data engineering? Join us, and share your story!
Schedule:
18:00 - 18:30 Doors Open: drinks and discussions
18:30 - 19:00 Michal Gancarski, Zalando: Data Infrastructure as Code - building core data services in a small team
19:00 - 19:30 Wojciech Biela and Karol Sobczak, Starburst: Join Reordering in Presto's CBO
19:30 - 20:00 Break: food, drinks, and discussions
20:00 - 21:00 Panel discussion: Infrastructure in the Cloud: DIY, Vendor, or Cloud Provider
21:00 - 21:45 Networking
21:45 Event End
For more details on topics and speakers, please read below.
Title: Data Infrastructure as Code - building core data services in a small team
Speaker: Michal Gancarski, Data Engineer, Zalando
Abstract:
When a relatively small team is trusted with creating some of the core data services for a company employing over fifteen thousand people, the task may seem overwhelming, though it is still achievable.
This talk will focus on tools, patterns and, more importantly, organizational approaches that allow a single team to develop and maintain multiple data services at a continuously growing scale.
Title: Join Reordering in Presto's CBO
Speakers: Wojciech Biela, Co-founder and Director of Product Development, Starburst; Karol Sobczak, Senior Software Engineer, Starburst
Abstract:
Presto is an open source distributed SQL engine that lets users interactively query various data sources, including Hadoop HDFS, object stores such as S3 and Azure Blobs, NoSQL stores like Cassandra, relational databases (MySQL, Postgres, SQL Server, etc.), and even Kafka streams.
Over the past year, we have been gradually contributing a long-planned addition to the engine’s capabilities that the community has been asking for: the Cost-Based Optimizer for Presto, including a framework for modeling and calculating data statistics.
In this session we will briefly walk through the Presto fundamentals and then introduce you to our Cost-Based Optimizer’s concepts and its architecture. We will also share the motivating use cases behind this feature paired with the fantastic performance improvements that it brought to the users of Presto. The session will conclude by discussing possible future improvements in this area.
Panel: Infrastructure in the Cloud: DIY, Vendor, or Cloud Provider
Moderator: Kshitij Kumar, VP Data, Zalando
Participants: Stephan Ewen, CTO, Ververica (formerly Data Artisans), Alibaba Group, and Flink PMC; Torsten Boettjer, Product Manager IaaS, Oracle; Wojciech Biela, Co-founder and Director of Product Development, Starburst; Perry Krol, Head of System Engineering Central EMEA, Confluent.
***
Our Data Engineering Meetup is an event by engineers, for engineers. We aim for short, practical talks about experiences with data engineering at scale. We don’t do sales pitches, and instead focus on sharing experiences and lessons learned the hard way.
If you would like to talk about your experience at our next Meetup, get in touch. We’d love to hear from you!
--------------------------------------
Zalando Code of Conduct:
Zalando is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, nationality, cultural background, religion or lack thereof. We do not tolerate harassment of attendees in any form. Offensive and sexual language and imagery are not welcome at our events. Participants violating these rules may be asked to leave at the discretion of the event organisers.
