Real-time Data Warehouse + Beam In Production -- Lessons from Sentry
Details
"Building a flexible, realtime data warehouse at Sentry with Beam + Dataflow"
Presented by: Syd Ryan
Syd will describe two hard problems they've solved at Sentry with streaming Beam pipelines. The first solution combines Postgres change data capture and SQL views to produce a table that appears to update in real time within BigQuery. The second aggregates thousands of events per second and backfills historical data effectively with Beam's unified batch/streaming interfaces.
"Beam in Production: Lessons learned and best practices"
Presented by: Mike Clarke
Mike will describe gotchas and early struggles Sentry hit moving streaming data pipelines off our laptops and into production. He'll cover some unexpected Beam defaults, detecting schema errors, comparing performance between the Python and Java SDKs, and proactively identifying when production pipelines break due to unexpected data.
Speakers
Syd Ryan is a data engineer at Sentry, an open-source error monitoring tool that helps developers ship better software, faster. Most recently they have been replacing batch ETL jobs with streaming data pipelines - because fast data is better than slow data.
Mike Clarke is an engineering manager at Sentry (we're hiring!). Mike's passion is bringing Sentry's monitoring solutions to data engineers & data scientists. Connect with Mike and leverage Sentry on your next project.
Schedule
6:00pm arrive, eat, drink, be merry!
6:30pm talks begin.
Networking, etc. to follow.
Food and Beverage
Updated 2/18/2020
Homeroom will be catering the event tomorrow, with options ranging from classic mac & cheese to buffalo chicken mac & cheese, plus vegan and gluten-free alternatives. We will also have beer, wine, and various sparkling waters & sodas. Bring an appetite for comfort food!
