Past Meetup

Stacksgiving - The Time (Series Data) of Your Life

96 people went

Details

"Stacksgiving" will be a night focused on working with time series data. Queue some Green Day, because I hope you have the time (series data) of your life! #dadjoke

We'll have some Thanksgiving-themed eats available. The cranberry sauce will be flowing and the delicious side dishes plentiful.

Presentation #1: Time Series on a Time Crunch (~25 mins)

Fiona Condon, Search Engineer @ GIPHY

Fiona Condon is an engineer on GIPHY's Search and Discovery team, working to help you find the best GIFs. Before GIPHY, she worked on search ranking at Etsy, helping you find the best gifts. She co-hosts a weekly online radio show out of a shipping container in Bushwick.

Talk Abstract:

Designing new infrastructure at scale is a challenge; doing it on a tight schedule is plain hard. Architecting to avoid operational surprises and building for the right kind of flexibility require a combination of technical pragmatism and effective human communication.

Using GIPHY’s user analytics launch as a case study, this talk will cover some best principles for engineering low-risk time series indexes in Elasticsearch for uncertain load, and detail how we planned for foolproof backfills to adapt to changing requirements. I’ll also share some learnings from our effective short-term cross-team collaboration.

Presentation #2: TimescaleDB: Re-Imagining PostgreSQL for Time-Series Data (~35 mins)

Mat Arye, Software Developer @ TimescaleDB

Mat has been working on data infrastructure in both academia and industry. As one of TimescaleDB's core architects he works on performance, scalability, and query power. Previously, he attended Stuyvesant, The Cooper Union, and Princeton.

Talk Abstract:

Today everything is instrumented, generating more and more time-series data streams that need to be monitored and analyzed. When it comes to storing this data, many developers often start with some well-trusted system like PostgreSQL, but as their data hits a certain scale, give up its query power and ecosystem by migrating to some NoSQL or other "modern" time-series architecture. They face the traditional trade-off: query power or scale.

This perceived trade-off isn't necessary. We leverage the nature of time-series workloads -- inserting new data about recent events and rarely making updates -- to scale PostgreSQL for time-series data. This is achieved by automatically partitioning data. However, the user does not need to worry about this partitioning and can use all-of-SQL (e.g., secondary indexes, rich query predicates and group bys, aggregations, windowing functions, CTEs, JOINs).
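To make the partitioning idea concrete, here is a minimal Python sketch of routing rows into fixed-width time chunks and reading back only the chunks that overlap a query range. It is an illustration under stated assumptions, not TimescaleDB's actual implementation: the names `PartitionedTable`, `chunk_start`, and `CHUNK_INTERVAL`, and the one-day chunk width, are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed chunk width, chosen only for illustration.
CHUNK_INTERVAL = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts):
    """Return the start of the fixed-width time interval that contains ts."""
    n = (ts - EPOCH) // CHUNK_INTERVAL  # whole intervals elapsed since the epoch
    return EPOCH + n * CHUNK_INTERVAL


class PartitionedTable:
    """Hypothetical table that hides time-based partitioning from the caller."""

    def __init__(self):
        # Maps each chunk's start time to the rows that fall inside that chunk.
        self.chunks = defaultdict(list)

    def insert(self, ts, value):
        # The caller never names a partition; it is derived from the timestamp.
        self.chunks[chunk_start(ts)].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, skipping non-overlapping chunks."""
        rows = []
        for cstart, chunk in self.chunks.items():
            if cstart < end and cstart + CHUNK_INTERVAL > start:
                rows.extend(r for r in chunk if start <= r[0] < end)
        return sorted(rows, key=lambda r: r[0])
```

Because new data is mostly about recent events, inserts in a scheme like this land in the newest chunk or two, and a time-bounded query can ignore older chunks entirely.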

I’ll present performance benchmarks that show TimescaleDB scales much better than PostgreSQL for time-series workloads involving billions of rows, even on a single node. TimescaleDB is a PostgreSQL extension (Apache 2 license).