Lessons learned while building a petabyte-scale data infrastructure


Details
Julianna Gobolos-Szabo, Zoltan Toth (Prezi Data team): Lessons learned while building a petabyte-scale data infrastructure
Back in 2011 at Prezi, we started off with a single SQL query that worked on a few megabytes of data and produced somewhat accurate numbers that satisfied basic business needs. That query was our BI platform. Today we run a data infrastructure of around 70 high-performance servers that crunch hundreds of gigabytes of data and feed hundreds of reports every day. Along the way we used standard Unix tools and statistical software, then on-premises Hadoop clusters, NoSQL databases, and third-party BI tools. Learning from our mistakes, we rebuilt our data infrastructure and ETL systems many times. We’ll share the successes and misses we encountered throughout this journey, with a special focus on our current experience with managed solutions such as Elastic MapReduce, Amazon’s hosted Hadoop service, and Redshift, Amazon’s hosted data warehouse.
■ ■ ■
Schedule
6:15pm - Doors open
7:00pm - Doors close, presentation starts
8:30pm - Lights off
Drinks and snacks are provided by Prezi.
