Error rates across one of Facebook’s sites were spiking. The problem first surfaced through an automated alert, triggered by an in-memory time-series database called Gorilla, a few minutes after it started. One set of engineers mitigated the immediate issue. A second group set out to find the root cause. They fired up Facebook’s time-series correlation engine, built on top of Gorilla, and searched for metrics that correlated with the errors. The search showed that copying a release binary to Facebook’s web servers (a routine event) caused an anomalous drop in memory used across the site.
Open source version: https://github.com/facebookincubator/beringei
Paper summary for the lazy: https://www.google.com/amp/s/blog.acolyer.org/2016/05/03/gorilla-a-fast-scalable-in-memory-time-series-database/amp/
Link to the paper: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
Link to slides: https://goo.gl/1dXMxt
Food (veg and gluten-free included) and drinks provided!