This event will feature two talks: we will first announce the Spark 0.8 release, followed by a use case talk from Bizo. We'd like to thank Tagged for hosting the event.
Spark 0.8 release
Spark 0.8 is the biggest Spark release yet, as well as our first under Apache. With 67 developers and 20 companies contributing, this release adds a slew of new features. To make debugging and productionizing Spark jobs easier, we have a new monitoring UI and metrics infrastructure. To expand Spark's out-of-the-box capabilities, 0.8 adds MLlib, a standard library of high-quality machine learning algorithms. For Python users, PySpark has been greatly expanded to bring it near feature-parity with Scala, and now supports IPython and Windows. And for deployability, Spark 0.8 includes much-improved support for YARN, new EC2 scripts, and simpler packaging. This talk will give a tour of these and other new features.
Spark at Bizo
Bizo lets marketers target display campaigns to specific business demographic audiences, e.g. people in finance or medicine; basically, we help marketers get in front of the "right people". Part of the tooling we provide to customers is a reporting platform that has all kinds of fun, shiny funnel charts stuffed with pretty metrics (seriously, marketers love funnels even more than developers love free pizza). Recently we had to build some new reports that allow users to compare the behavior of their website visitors based on whether or not those visitors have been exposed to one of our display ads. This was a perfect opportunity for us to test out Spark in production, as it involved processing a fairly large amount of log data from multiple sources on a nightly basis.
This talk will walk through how we're using Spark in production today on Amazon's EMR service. I'll cover how we've set up our installation and deployment and how we structure our Spark jobs for easy unit testing, and I'll talk about how we put together a successful Spark hackday to get other engineering sub-teams at Bizo excited about using Spark. Finally, I'll cover some common pitfalls and caveats we've encountered, especially with regard to translating some of our older Hive jobs to Spark, and how we go about debugging failed Spark jobs.
Doors open at 6:30, with talks starting at 7.