The Part Of PostgreSQL I Hate The Most: MVCC and How To Optimize It
Details
Postgres is awesome. It is currently the fourth most popular database (source: DB-Engines), and its popularity continues to grow as strong as ever due to several factors: its rich set of features, mixing object storage and relational capabilities; its extensibility; and its status as an open source project. But as much as we love PostgreSQL, there are certain aspects about it that are not that great. So instead of talking about the awesomeness of everyone's favorite elephant-themed DBMS, I want to discuss the one part about it that sucks: how PostgreSQL implements multi-versioning. Our research at Carnegie Mellon University and experience with optimizing PostgreSQL databases for our customers have shown that their design and implementation of MVCC can cause severe performance issues for some workloads, requiring effective database administration to achieve good performance.
In this talk, I will discuss the following: (1) the implementation of MVCC in Postgres and the issues caused by its design (e.g., write amplification, table bloat, autovacuum), (2) how to optimize Postgres to address those issues with domain knowledge and machine learning (e.g., heuristic-based and ML-based autovacuum tuning, identifying long-running transactions, etc.), (3) the internals of autovacuum, its potential challenges, and effective methods for tuning autovacuum (4) real-world examples of optimizations from our customers.
Pizza is provided, and we also have a user group lending library of books & papers related to PostgreSQL and Databases.
For directions (including where to park), look at this photo album:
Poster for the Spring 2024 Series of meetups:
