
Lies, damned lies, and statistics


42 people went

Znany Lekarz Sp. z o.o.

Kolejowa 5 · Warsaw

How to find us

Directions map: https://goo.gl/CPQzWS. It should help you find us without any trouble. The building marked with a rectangle is the ZnanyLekarz headquarters.


Details

Many thanks to ZnanyLekarz https://www.znanylekarz.pl for (once again) providing a great meeting space in the centre of Warsaw. Thanks also to EnterpriseDB https://www.enterprisedb.com for the delicious pizza :)

== Talks ==
=== 18:00 Lies, damned lies, and statistics ===
A journey into the PostgreSQL statistics subsystem

Before executing an SQL query, Postgres needs to decide on an execution plan. While there are multiple steps involved in generating a plan, this talk will focus on the main source of inputs for the query planner machinery, namely the statistics subsystem.

Detecting problems with statistics is often a crucial step towards finding the reasons for bad plans and slow queries, so it's important to understand exactly what gets tracked and how it's getting used.
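One practical way to spot bad statistics is to compare the planner's row estimates with the rows actually produced. The sketch below walks the tree returned by `EXPLAIN (ANALYZE, FORMAT JSON)`, which reports "Plan Rows" and "Actual Rows" for each node, and flags nodes where the two differ by more than a chosen factor. The function names, the 10× default threshold and the sample plan are illustrative assumptions, not something taken from the talk.

```python
from __future__ import annotations

from typing import Any, Iterator


def walk_plan(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every node of an EXPLAIN (FORMAT JSON) plan tree, depth-first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def misestimates(plan: dict[str, Any], factor: float = 10.0) -> list[tuple[str, float, float]]:
    """Return (node type, estimated rows, actual rows) for every node whose
    estimate is off by more than `factor` in either direction.

    "Plan Rows" and "Actual Rows" are both per-loop figures, so they can be
    compared directly; max(..., 1.0) keeps empty results from dividing by zero.
    """
    flagged = []
    for node in walk_plan(plan):
        if "Actual Rows" not in node:  # only present with EXPLAIN ANALYZE
            continue
        est = float(node["Plan Rows"])
        act = float(node["Actual Rows"])
        ratio = max(est, 1.0) / max(act, 1.0)
        if ratio > factor or ratio < 1.0 / factor:
            flagged.append((node["Node Type"], est, act))
    return flagged


# Hand-written sample in the shape of EXPLAIN output (not a real plan).
sample = {
    "Node Type": "Nested Loop", "Plan Rows": 5, "Actual Rows": 4800,
    "Plans": [
        {"Node Type": "Seq Scan", "Plan Rows": 100, "Actual Rows": 100},
        {"Node Type": "Index Scan", "Plan Rows": 1, "Actual Rows": 48},
    ],
}
print(misestimates(sample))  # [('Nested Loop', 5.0, 4800.0), ('Index Scan', 1.0, 48.0)]
```

Against a real server, the input would be the "Plan" object inside the first element of the JSON array that EXPLAIN returns.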

As the universal rule of Garbage In, Garbage Out teaches us, without some degree of knowledge about the shape of queried data, it is impossible to get a reasonable execution plan. PostgreSQL employs a number of ways to maintain up-to-date statistics about table sizes and the distribution of data inside them, which in turn inform the query planner.
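As an illustration of how distribution statistics feed the planner, here is a simplified estimate for `col = value` based on the column's most-common-values (MCV) list, loosely following the formula in the PostgreSQL documentation's row estimation examples: a value on the MCV list gets its recorded frequency, and every other value gets an equal share of what remains. The function and parameter names are hypothetical; the inputs correspond to the `pg_stats` columns `most_common_vals`, `most_common_freqs`, `null_frac` and `n_distinct`, plus the table's row count (`pg_class.reltuples`). The final clamp is an assumption of this sketch.

```python
def eq_selectivity(value, mcv_vals, mcv_freqs, null_frac, n_distinct, reltuples):
    """Estimated fraction of rows satisfying `col = value` (simplified)."""
    if value in mcv_vals:
        # A most common value: use the frequency ANALYZE recorded for it.
        return mcv_freqs[mcv_vals.index(value)]
    # A negative pg_stats.n_distinct means "-(distinct values / rows)".
    distinct = -n_distinct * reltuples if n_distinct < 0 else n_distinct
    other_distinct = max(distinct - len(mcv_vals), 1.0)
    # Spread the remaining non-NULL, non-MCV fraction evenly over the other values.
    selec = (1.0 - sum(mcv_freqs) - null_frac) / other_distinct
    # Assumed cross-check: a non-MCV value should not look more common than the rarest MCV.
    if mcv_freqs:
        selec = min(selec, min(mcv_freqs))
    return max(selec, 0.0)


# Hypothetical statistics for a 10,000-row table.
mcv_vals = ["a", "b", "c"]
mcv_freqs = [0.3, 0.2, 0.1]
stats = dict(mcv_vals=mcv_vals, mcv_freqs=mcv_freqs,
             null_frac=0.0, n_distinct=103, reltuples=10_000)

for v in ["a", "zzz"]:
    sel = eq_selectivity(v, **stats)
    print(v, sel, round(sel * 10_000))  # value, selectivity, estimated rows
```

Multiplying the selectivity by the row count gives the row estimate the planner works with.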

In this talk we'll explore what statistical information Postgres tracks, how it is calculated, where the server stores it, how the operator can query it, and finally how to tweak the whole system.
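The stored statistics can be read back from the `pg_stats` view, for example with `SELECT histogram_bounds FROM pg_stats WHERE tablename = 't' AND attname = 'c'` (table and column names hypothetical). Below is a minimal sketch of how equal-frequency histogram bounds like these can turn `col < const` into a selectivity by linear interpolation inside the matching bucket. It assumes a numeric column with no MCV list and no NULLs; the function name and the bounds are made up.

```python
from bisect import bisect_left


def lt_selectivity(bounds: list[float], const: float) -> float:
    """Estimated fraction of rows with `col < const`, from equal-frequency
    histogram bounds such as pg_stats.histogram_bounds.

    Simplifying assumptions: numeric column, no MCV list, no NULLs.
    """
    nbuckets = len(bounds) - 1
    if const <= bounds[0]:
        return 0.0
    if const > bounds[-1]:
        return 1.0
    # Bucket i satisfies bounds[i] < const <= bounds[i + 1], so hi > lo below.
    i = bisect_left(bounds, const) - 1
    lo, hi = bounds[i], bounds[i + 1]
    # Every bucket holds 1/nbuckets of the rows: count the full buckets below
    # `const`, plus a linearly interpolated share of the bucket containing it.
    return (i + (const - lo) / (hi - lo)) / nbuckets


bounds = [0, 10, 20, 30, 40]  # hypothetical: 4 buckets, 25% of rows each
print(lt_selectivity(bounds, 25))  # 0.625
```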

Maintaining up-to-date statistics needs to balance performance overhead with information quality. For a large database, you can't just read everything and calculate some ratios, so the system employs a number of clever algorithms, which we will examine.
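To make the sampling idea concrete: ANALYZE works from a random sample rather than the whole table, sized at 300 rows per unit of statistics target (30,000 rows with the default `default_statistics_target` of 100). The sketch below uses classic reservoir sampling (Algorithm R) as a simpler stand-in for the block- and row-level sampling PostgreSQL actually performs, then estimates the number of distinct values with the Haas-Stokes "Duj1" formula n·d / (n - f1 + f1·n/N), where n is the sample size, N the table size, d the distinct values in the sample and f1 the values seen exactly once. To my knowledge this is the estimator PostgreSQL's ANALYZE uses, but treat the details as an approximation; the synthetic table and function names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable


def reservoir_sample(rows: Iterable, k: int, rng: random.Random) -> list:
    """One-pass uniform random sample of up to k rows (Algorithm R)."""
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Keep row i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def estimate_n_distinct(sample: list, total_rows: int) -> float:
    """Haas-Stokes Duj1 estimator: n*d / (n - f1 + f1*n/N)."""
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)                                 # distinct values in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return n * d / ((n - f1) + f1 * n / total_rows)


STATISTICS_TARGET = 100                # PostgreSQL's default_statistics_target
SAMPLE_ROWS = 300 * STATISTICS_TARGET  # 30,000 rows

# Hypothetical table: 200,000 rows holding 5,000 distinct values.
table = [i % 5000 for i in range(200_000)]
sample = reservoir_sample(table, SAMPLE_ROWS, random.Random(42))
print(len(sample), round(estimate_n_distinct(sample, len(table))))
```

The estimator scales up the values seen only once: if every sampled value is unique, it predicts that every row in the table is distinct; if the sample covers the whole table, it returns the exact count.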

Another concern is that some data types need statistics tailored to the way they are typically queried. A column holding an array is less likely to be queried for exact matches, but it is often queried using a "contains" operator. We'll cover special cases such as arrays, full text search vectors and ranges, and talk in more depth about how statistics for them are gathered.
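For arrays, PostgreSQL keeps per-element statistics (the `most_common_elems` and `most_common_elem_freqs` columns of `pg_stats`) rather than relying only on whole-value MCVs. Here is a heavily simplified sketch of estimating `col @> ARRAY[...]` from such element frequencies, assuming elements occur independently. The fallback frequency for elements missing from the list (half of the rarest listed element's frequency), the default used when no statistics exist, and all names are assumptions of this sketch, not documented PostgreSQL behaviour.

```python
from math import prod
from typing import Iterable, Mapping, Optional


def contains_selectivity(query_elems: Iterable, elem_freqs: Mapping,
                         fallback: Optional[float] = None) -> float:
    """Rough selectivity of `col @> query_elems` for an array column.

    elem_freqs maps element -> fraction of rows whose array contains it
    (think most_common_elems / most_common_elem_freqs). Elements are assumed
    to occur independently, so the per-element frequencies are multiplied.
    """
    if fallback is None:
        # Assumption: an element missing from the statistics is taken to be
        # half as common as the rarest listed element (0.01 if none are listed).
        fallback = min(elem_freqs.values(), default=0.01) / 2
    # Duplicates in the query array do not change the result of @>.
    return prod(elem_freqs.get(e, fallback) for e in set(query_elems))


tag_freqs = {"postgres": 0.4, "python": 0.2, "java": 0.1}  # hypothetical tags column
print(contains_selectivity(["postgres", "python"], tag_freqs))  # about 0.08
print(contains_selectivity(["postgres", "rust"], tag_freqs))    # about 0.02
```

An empty query array yields 1.0, which matches the fact that every non-NULL array contains the empty array.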

Finally, we'll look at some of the weaknesses of the statistics subsystem and areas where it could still be improved.
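A frequently cited example of such a weakness: per-column statistics say nothing about correlations between columns, so a condition on two columns is estimated by multiplying their selectivities as if the columns were independent. The made-up data below, where each hypothetical city has exactly one postal code, shows how far that assumption can drift from reality.

```python
def selectivity(rows, pred) -> float:
    """Exact fraction of rows satisfying pred."""
    return sum(1 for r in rows if pred(r)) / len(rows)


# Hypothetical table: 10 cities, each with exactly one postal code, 100 rows per city.
rows = [(f"city{c}", f"zip{c}") for c in range(10) for _ in range(100)]

sel_city = selectivity(rows, lambda r: r[0] == "city3")  # 0.1
sel_zip = selectivity(rows, lambda r: r[1] == "zip3")    # 0.1
independent = sel_city * sel_zip                         # what a per-column model predicts
actual = selectivity(rows, lambda r: r[0] == "city3" and r[1] == "zip3")

print(f"assumed {independent:.3f}, actual {actual:.3f}")  # assumed 0.010, actual 0.100
```

Here the estimate is 10 times too low; with k cities laid out this way, the independence assumption predicts 1/k² while the true fraction is 1/k.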

Speaker:
Jan Urbański is a PostgreSQL enthusiast and hacker.

His main areas of interest are PL/Python and the query optimiser.

Jan has helped write some of the new features of the PL/PythonU procedural language and has explored estimating query progress at runtime. He contributes to the psycopg2 driver and maintains txpostgres, an asynchronous Postgres driver for Twisted.

Before that, he worked on alternative approaches to the genetic query optimiser (GEQO) for his Master's Thesis and text search selectivity estimation during Google Summer of Code. He's also gotten some code into GStreamer, an open source multimedia framework, and contributed a Twisted connector for the popular Python AMQP library, Pika.

He's currently Lead Engineer at New Relic, where he divides his attention between Apache Cassandra, PostgreSQL and writing distributed systems in Java.

=== 19:00 Talk no. 2 ===
Waiting for confirmation from the speaker.

The talks will be in Polish.
Yes, we're planning to stream them, hopefully in somewhat better quality than last time. We're testing :) The slides will also be available for download, but if you can, I'd encourage you to come in person :)

Parking information:
Unfortunately, we can't offer any parking within the office complex. At that time of day, though, you'll easily find a spot on Kolejowa and Sławińska streets.

After the talks, we're planning to get together at a nearby bar :)