Joe Crobak from Foursquare will briefly explain how a workflow engine fits into a standard Hadoop-based analytics stack and give an architectural overview of Azkaban, Luigi, and Oozie. He will also cover features, tools, and best practices that can help you build a Hadoop workflow system from scratch or improve an existing one.
About the talk:
Building a reliable pipeline of data ingress, batch computation, and data egress with Hadoop can be a major challenge. Most folks start out with cron to manage workflows, but soon discover that cron doesn't scale past a handful of jobs. There are a number of open-source workflow engines with support for Hadoop, including Azkaban (from LinkedIn), Luigi (from Spotify), and Apache Oozie. Having deployed all three of these systems in production, Joe will talk about which features and qualities are important for a workflow system.
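To make the cron limitation concrete, here is a minimal sketch of the core idea a workflow engine adds: jobs declare their dependencies, and the engine derives a safe run order and knows what to hold back when an upstream job fails. This is not how Azkaban, Luigi, or Oozie are implemented; the task names and both helper functions are hypothetical, and the sketch uses only Python's standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_logs": set(),
    "ingest_db_snapshot": set(),
    "join_sessions": {"ingest_logs", "ingest_db_snapshot"},
    "daily_report": {"join_sessions"},
    "export_to_warehouse": {"daily_report"},
}


def run_order(pipeline):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def runnable_after_failure(pipeline, failed):
    """Return the tasks that can still run when the `failed` tasks did not complete."""
    blocked = set(failed)
    order = run_order(pipeline)
    # Walking in dependency order lets a failure propagate to everything downstream.
    for task in order:
        if pipeline[task] & blocked:
            blocked.add(task)
    return [task for task in order if task not in blocked]


if __name__ == "__main__":
    print("Run order:", run_order(PIPELINE))
    print("Still runnable if ingest_logs fails:",
          runnable_after_failure(PIPELINE, {"ingest_logs"}))
```

With cron, the same pipeline would be a set of fixed start times chosen by guesswork, and a late or failed ingest would silently feed stale data to the downstream jobs.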
About the speaker:
Joe Crobak worked on Hadoop and analytics infrastructure at Foursquare, where he built internal tools and APIs used by dozens of engineers and analysts on a daily basis.