Join us for the inaugural meetup for database-makers, on October 31st. More information to come, but this is our tentative program:
Agenda:
16:00 - 16:10 Doors open
16:10 - 16:15 Introduction
16:15 - 17:00 Talk 1 (including Q&A): Rohit Nayak (Vitess) - Inside Vitess: Engineering Resilience at Scale
17:00 - 17:45 Talk 2: Teresa Lopes (Adyen) - Operational hazards of managing PostgreSQL DBs over 100TB
17:45 - 18:30 Food
18:30 - 19:15 Talk 3: Fabian Groffen (Oracle) - Adventures and Thoughts while implementing BigData formats from scratch
19:15 - 20:00 Talk 4: Peter Boncz (CWI) - Dutch Data Systems - a CWI perspective
20:00 - 21:00 Networking
Teresa Lopes (Adyen) - Operational hazards of managing PostgreSQL DBs over 100TB
How do you back up (and restore) a 100+ TB database? Well, maybe you don't. In this talk I will share the peculiarities I encountered when managing huge PostgreSQL databases, covering topics like backups, high availability challenges, and how to keep vacuum under control.
When you read blog articles, best practices, and "how to" guides, things seem straightforward. But once you start pushing PostgreSQL to its limits, you end up needing to question the most fundamental assumptions about how PostgreSQL works.
Over the last few years, my team has been exploring the boundaries of what PostgreSQL can do, and today I will share our findings with you (at least the ones I can!).
Teresa has over 8 years of experience with databases, starting as an Oracle DBA before discovering her passion for PostgreSQL—drawn in by its extensibility and vibrant community. Now part of the PGDay Lowlands organizing team, she brings a unique background in civil engineering and a love for hiking, geology, and cooking.
Rohit Nayak (Vitess) - Inside Vitess: Engineering Resilience at Scale
Vitess, a database clustering system built on MySQL, powers some of the largest internet-scale platforms, including GitHub and Slack. This talk explores the architectural, design, and implementation choices and tradeoffs that distributed databases like Vitess must make to deliver high availability at massive scale under unpredictable failure conditions—and contrasts Vitess’s approach with other popular distributed databases.
Rohit Nayak is a software engineer with over three decades of experience in all aspects of software development, especially with startups and in product development. He has been part of the core Vitess team at PlanetScale for the last six years and a Vitess Maintainer for five.
Fabian Groffen (Oracle) - Adventures and Thoughts while implementing BigData formats from scratch
As interest in doing analytics on (historical) data grew, systems like Hadoop and HDFS were developed.
Large pools of cheap disks provided lots of storage with reasonable performance. To improve performance further, text-based formats like CSV were replaced with binary, so-called BigData, formats.
Today, these formats still stand, no longer on Hadoop but in Data Lakes on Object Storage. They have become de facto standards through administration tools such as Apache Iceberg, which are increasingly common. Most database systems support reading BigData from such systems, so that it can be processed as if it were stored in internal tables.
In this talk, Fabian will briefly explain how the BigData formats Parquet, ORC and Avro work and what the key differences between them are. He does this from a database point of view, having had to implement the formats from scratch rather than grabbing an existing library off the shelf. Unavoidably, some remarks, frustrations and opinions about the formats will come up.
Fabian has been writing software for a long time. During his time at CWI he focussed on the C language, and he has stuck with it ever since. As an open source contributor, he can be found in a very diverse set of projects. Most notably, he has been a Gentoo Linux developer since 2004 and still enjoys the compiling that comes along with it, as well as reading a good book while waiting for it to finish.
Peter Boncz (CWI) - Dutch Data Systems - a CWI perspective
The database architectures group at CWI was created back in 1985. When I joined the group in 1994 and started a PhD, however, there was very little commercial activity in the area, and therefore few people were working on data systems engineering in Amsterdam. A lot has changed since then.
Drawing on the long history of the CWI database architectures group, this talk will discuss some of the highlights of Dutch Data Systems over the years: ideas, technologies, companies and of course the people involved. The talk will discuss the ecosystem that has emerged and close with some thoughts on where we could be heading.