Cluster Computing with Cascading and Mesos


Details
We are very excited to have Paco Nathan (http://liber118.com/pxn/) come and talk to us about Cascading (http://www.cascading.org/) and Mesos (http://mesos.apache.org/). Both are very interesting technologies in the Hadoop ecosystem. Come and learn how to get more out of your Hadoop cluster!
Special thanks to Datalogix (http://www.datalogix.com/) for hosting the meetup. Datalogix is doing some very interesting things with Big Data. Interested? They're hiring. Check out their open positions at http://www.datalogix.com/careers/work-with-us/engineering/ .
Agenda
6:00 – 6:30 - Socialize over food and drink
6:30 – 6:45 - Announcements, Upcoming Events
6:45 – 8:30 - Cluster Computing with Cascading and Mesos by Paco Nathan
8:30 – ??? - Continued socializing
About the Speaker
Paco Nathan (http://liber118.com/pxn/) is a "player/coach" who has led innovative Data teams over the past decade, building large-scale apps. He is a recognized expert in Hadoop, R, cloud computing, distributed systems, machine learning, predictive analytics, and NLP. Paco is the Chief Scientist for Mesosphere in San Francisco, a committer on the Cascading open source project, and the author of the O'Reilly book "Enterprise Data Workflows with Cascading". He received his BS in Math Sciences and MS in Computer Science from Stanford, and has 25+ years of experience in the tech industry, ranging from Bell Labs to early-stage start-ups.
About the Presentation
Cascading is an open source workflow abstraction atop Hadoop and other Big Data frameworks, with a 5+ year history of large-scale Enterprise deployments. For example, half of Twitter's total compute uses this API, along with other large use cases at eBay, Etsy, Airbnb, LinkedIn, Apple, Climate, Nokia, Factual, Telefonica, etc. Cascading leverages some aspects of functional programming so that developers can create large-scale data pipelines that are robust and easier to operationalize. There are popular DSLs in Scala (Scalding) and Clojure (Cascalog), plus recent support for ANSI SQL (Lingual) and PMML (Pattern), so that, for example, workloads can be migrated from Teradata and SAS onto Hadoop.
This talk will describe the technology and some of the large use cases for Cascading, show sample apps in Scalding and Cascalog, and go into examples of ANSI SQL with Lingual and PMML with Pattern. We'll also cover Mesos, a cluster scheduler akin to Google's "Borg"; Mesos is used at scale by Twitter, Airbnb, Box, etc.
