Log-Based Architecture for Distributed Systems & Running Postgres at Scale


Details
Schedule:
6:00 - Doors & Food
6:30 - Talk 1
7:15 - Talk 2
8:00 - Wrap & Chat
Talk 1: Polylog: A Log-Based Architecture for Distributed Systems
Speaker:
Morrigan Jones, Principal Engineer @ JW Player, working on data streams and architecture;
Eric Weaver, Software Engineer @ JW Player, focused on productionizing machine learning algorithms
Abstract:
JW Player is the world's largest network-independent video platform, representing over 5 percent of global internet video.
The talk will focus on a log-based architecture ("The Polylog") we've developed to handle change data capture, so that new services and databases can easily be built on top of other services' full datasets. The tools we'll cover include Debezium for database change capture, Kafka for storing the logs, and the Denormalizer, an in-house tool we built to perform left joins on streams.
Potential use cases of the Polylog include:
- Using logs as a primary datastore
- Syncing upstream databases with document stores like Elasticsearch
- Database migrations for breaking up monoliths
- Denormalizing records across topics in a streaming fashion
- Monitoring data changes
- Disaster recovery and fault tolerance
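To make the "log as a primary datastore" and "left joins on streams" ideas concrete, here is a minimal sketch. It assumes simplified Debezium-style change events (an `op` field of `c`/`u`/`d`/`r` plus `before` and `after` row images) and replays them into an in-memory table. That table is then used for a stream-table left join. The names `apply_change` and `LeftJoinDenormalizer` are hypothetical illustrations, not JW Player's actual Denormalizer or any Debezium/Kafka API.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change(table: Dict[Any, Row], event: Row, key: str) -> None:
    """Replay one simplified Debezium-style change event into a keyed table."""
    op = event["op"]
    if op in ("c", "u", "r"):  # create, update, or snapshot read: upsert 'after'
        row = event["after"]
        table[row[key]] = row
    elif op == "d":  # delete: 'after' is null, so take the key from 'before'
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


class LeftJoinDenormalizer:
    """Hypothetical stream-table left join.

    Right-side change events maintain a materialized table; each left-side
    record is emitted enriched with the matching right row, or None when
    there is no match (left-join semantics: the left record is never dropped).
    """

    def __init__(self, fk: str, right_key: str, field: str) -> None:
        self.fk = fk                # foreign-key column on left records
        self.right_key = right_key  # primary-key column on right rows
        self.field = field          # name of the embedded field in the output
        self.right: Dict[Any, Row] = {}

    def on_right(self, event: Row) -> None:
        apply_change(self.right, event, self.right_key)

    def on_left(self, row: Row) -> Row:
        match: Optional[Row] = self.right.get(row[self.fk])
        out = dict(row)
        out[self.field] = match
        return out


if __name__ == "__main__":
    users = LeftJoinDenormalizer(fk="user_id", right_key="id", field="user")
    users.on_right({"op": "c", "before": None, "after": {"id": 1, "name": "ana"}})
    print(users.on_left({"video_id": 10, "user_id": 1}))
    print(users.on_left({"video_id": 11, "user_id": 2}))  # no match: user is None
```

This sketch joins each left record only against the right-side state at the moment it arrives. A production denormalizer would also need to re-emit previously joined records when the right side changes, and to persist its state (for example, by rebuilding it from a compacted Kafka topic).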
Talk 2: Cloud Architecture Patterns: Running Postgres at Scale (when RDS won’t do what you need)
Speaker:
Corey Huinker, Database Consultant @ Paribus Co. & President @ Corlogic
Abstract:
One of the most vexing problems in the ad-tech world is measuring viewability. The amount of data involved can be overwhelming, as can the strategies for presenting that data to customers.
This talk covers the strategies employed by a prominent ad-tech company to collect, anonymize, and categorize that data, and to generate real-time reports for customers that could span months while including up-to-the-minute data. It also covers the strategies used to store that data, either in PostgreSQL itself or accessed via foreign data wrappers to S3, Vertica, and custom in-memory datastores.
This talk highlights the flexibility of PostgreSQL inside an AWS EC2 environment for solving a wide range of data problems, including receiving 50 billion events per day, loading 500 million wide rows (over 100 columns) of summary data into multiple data systems, and unifying results from relational and non-relational systems.