Big Data Meetup - BudapestData edition


Join us for a data-filled evening and meet the speakers, attendees, and sponsors of the Budapest Data Forum conference. This meetup is free; no conference ticket is needed to attend!

The Data Career Night and Data Job Fair meetup on Thursday is here:

Speakers and talks:
1) Felipe Hoffa, Developer Advocate, Google:
Protecting sensitive data in huge datasets: Cloud tools you can use

2) Robin Moffatt, Developer Advocate, Confluent:
Integrating Databases and Kafka: The How and The Why

3) Wojciech Biela, Product Development Director, Starburst:
Presto: SQL-on-Anything. Under the hood of the new Cost-Based Optimizer

18:00 Doors Open, refreshments
18:30 Talks begin
20:30 Talks end
21:00 Meetup finishes

Talk details:

1) Protecting sensitive data in huge datasets: Cloud tools you can use
Before releasing a public dataset, practitioners need to strike a balance between utility and the protection of individuals. In this talk we'll move from theory to real life while handling massive public datasets. We'll showcase newly available tools that help with PII detection, and bring concepts like k-anonymity and l-diversity to a practical realm.

Related research: "Considerations for Sensitive Data within Machine Learning Datasets"
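
For readers new to the two terms in the abstract above, here is a minimal sketch of how k-anonymity and l-diversity can be measured on a small table. It is not taken from the talk and does not use any of the cloud tools the talk will cover; the function names, column names, and sample rows are all hypothetical.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    group_sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        group_sizes[key] += 1
    return min(group_sizes.values(), default=0)


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found within any such group."""
    group_values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        group_values[key].add(row[sensitive])
    return min((len(v) for v in group_values.values()), default=0)


# Hypothetical sample data: "zip" and "age_band" are treated as quasi-identifiers,
# "diagnosis" as the sensitive attribute.
rows = [
    {"zip": "1011", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "1011", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "1052", "age_band": "40-49", "diagnosis": "flu"},
    {"zip": "1052", "age_band": "40-49", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["zip", "age_band"])
l = l_diversity(rows, ["zip", "age_band"], "diagnosis")
print(k, l)
```

In this sample every combination of quasi-identifiers covers two people, yet the second group shares a single diagnosis, which is the kind of gap l-diversity is meant to expose.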

2) No More Silos: Integrating Databases and Apache Kafka

Companies new and old are all recognising the importance of a low-latency, scalable, fault-tolerant data backbone, in the form of the Apache Kafka® streaming platform. With Kafka, developers can integrate multiple sources and systems, which enables low-latency analytics, event-driven architectures and the population of multiple downstream systems.

In this talk we'll look at one of the most common integration requirements: connecting databases to Kafka. We'll consider the concept that all data is a stream of events, including the data residing within a database. We'll look at why we'd want to stream data from a database, including driving applications in Kafka from events upstream. We'll discuss the different methods for connecting databases to Kafka, and the pros and cons of each. Techniques including Change Data Capture (CDC) and Kafka Connect will be covered, as well as an exploration of the power of KSQL for performing transformations such as joins on the inbound data.

Attendees of this talk will learn:

- That all data is event streams; databases are just a materialised view of a stream of events.
- The best ways to integrate databases with Kafka.
- Anti-patterns of which to be aware.
- The power of KSQL for transforming streams of data in Kafka.
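
The first point in the list above can be illustrated without Kafka at all. The sketch below replays a list of change events into a dictionary that plays the role of a table; the event shape (`op`, `key`, `value`) is a hypothetical simplification and is not the format emitted by Kafka Connect or any particular CDC tool.

```python
def materialise(events):
    """Replay change events in order and return the resulting table state."""
    table = {}
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Inserts and updates both set the latest value for the key.
            table[key] = event["value"]
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table


# Hypothetical change log for a small "customers" table.
events = [
    {"op": "insert", "key": 1, "value": {"name": "Ada", "city": "Budapest"}},
    {"op": "insert", "key": 2, "value": {"name": "Bob", "city": "Vienna"}},
    {"op": "update", "key": 1, "value": {"name": "Ada", "city": "London"}},
    {"op": "delete", "key": 2, "value": None},
]

table = materialise(events)
print(table)
```

Replaying only a prefix of the events gives the table as it stood at that point in time, which is information the final table state alone no longer holds.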

3) Presto: SQL-on-Anything. Under the hood of the new Cost-Based Optimizer

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. This talk will be delivered by Wojciech Biela and Grzegorz Kokosiński from Starburst, the enterprise Presto company and the largest contributor to Presto outside of Facebook. During this presentation we will go through Presto fundamentals and gently introduce you to our latest addition to Presto: the Cost-Based Optimizer (CBO). We will go over the performance impact it has and talk about the CBO’s inner workings.

This is an English-speaking event. Venue and catering are provided by the Budapest Data Forum conference.