Jun 24, 2014 · 6:00 PM
Big data processing with Apache Hadoop, Spark, Storm, and friends is all the rage right now. But getting started with one of these systems requires an enormous amount of infrastructure, and there is an overwhelming number of decisions to be made. Oftentimes you don't even know what kinds of questions you can or should be answering with your data.
As a first step, Joe will describe the types of problems that people typically solve with a data pipeline—things like A/B testing and data warehousing. Then, drawing from his personal experience of building data tools at Foursquare and a from-scratch data pipeline at a new startup, he'll highlight the key questions to ask and best practices you should implement to encourage success.
Joe Crobak is a software engineer building data infrastructure at Project Florida. His technical interests include distributed systems and all things Hadoop. Before Project Florida, he worked on data infrastructure for billions of check-ins and hundreds of terabytes of log data at Foursquare. His professional hobbies include the hadoopweekly.com newsletter and occasional open-source contributions.