Cluster Computing with Cascading and Mesos

We are very excited to have Paco Nathan come and talk to us about Cascading and Mesos.  Both are very interesting technologies in the Hadoop ecosystem. Come and learn how to get more out of your Hadoop cluster!


Special thanks to Datalogix for hosting the meetup. Datalogix is doing some very interesting things with Big Data.  Interested?  They're hiring.  Check out their open positions at

6:00 – 6:30 - Socialize over food and drink
6:30 – 6:45 - Announcements, Upcoming Events
6:45 – 8:30 - Cluster Computing with Cascading and Mesos by Paco Nathan
8:30 – ??? - Continued socializing

About the Speaker

Paco Nathan is a "player/coach" who has led innovative Data teams over the past decade, building large-scale apps. He is a recognized expert in Hadoop, R, cloud computing, distributed systems, machine learning, predictive analytics, and NLP. Paco is the Chief Scientist for Mesosphere in San Francisco, a committer on the Cascading open source project, and the author of the O'Reilly book "Enterprise Data Workflows with Cascading". He received his BS in Math Sciences and MS in Computer Science from Stanford, and has 25+ years of experience in the tech industry, ranging from Bell Labs to early-stage start-ups.

About the Presentation

Cascading is an open source workflow abstraction atop Hadoop and other Big Data frameworks, with a 5+ year history of large-scale Enterprise deployments. For example, half of Twitter's total compute uses this API, along with other large use cases at eBay, Etsy, Airbnb, LinkedIn, Apple, Climate, Nokia, Factual, Telefonica, etc. Cascading leverages some aspects of functional programming so that developers can create large-scale data pipelines which are robust and easier to operationalize. There are popular DSLs in Scala (Scalding) and Clojure (Cascalog), plus more recent DSLs for ANSI SQL (Lingual) and PMML (Pattern), so that, for example, workloads can be migrated from Teradata and SAS onto Hadoop.
This talk will describe the technology and some of the large use cases for Cascading, show sample apps in Scalding and Cascalog, and walk through examples of ANSI SQL with Lingual and PMML with Pattern. We'll also cover material about Mesos, a cluster scheduler akin to Google's "Borg" which is used at scale by Twitter, Airbnb, Box, etc.


  • Paco N.

    As a follow-up, there was more Mesos-related news today with the launch of a new service called "Elastic Mesos" atop AWS, to make it super easy to launch a Mesos cluster, then run Hadoop, Spark, Marathon, Chronos, etc.:

    Also, an article in TechCrunch about this:

    November 12, 2013

  • GaryM

    Paco, thank you for coming to the meetup and giving a great presentation. It was very interesting and enlightening. Cheers

    September 26, 2013

  • Paco N.

    Thank you very much for the opportunity to present at Boulder/Denver BigData! Wonderful questions and dialogue. I'm super impressed by the tech community here in the Front Range.

    The "Intro to Data Science" workshop tomorrow is at Omni Interlocken, starting at 8:30am and we'll meet for drinks at the Tap Room there in the evening.

    Slides for tonight's talk are at

    and in general I have a newsletter sign-up and event calendar at

    Looking forward to coming back to Boulder soon!

    September 25, 2013

    • raju b.

      Thanks for the great presentation and intro. Looking forward to meeting again tomorrow.

      September 26, 2013

  • Joe M.

    Excellent presentation! Interesting topic I hope to explore more in the near future. Only criticism is that slides were eye charts.

    Are they available for download?

    September 25, 2013

  • Michael M.

    Loved the presentation, but got there late. Are slides available?

    September 25, 2013

  • Dan Y.

    Great presentation Paco, thank you.

    September 25, 2013

  • Kevin M.

    Hoping to get there tonight, but busy getting ready for travel immediately after the Data Science Introduction tomorrow, which I'm really excited about attending.

    September 25, 2013

  • Paco N.

    Looking fwd to the meetup! More details:

    Apache Mesos and Cascading provide part of the foundation for how Twitter, Airbnb, and others follow Google's success with Data Center Computing.

    Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. In contrast to VMs, isolation is based on features of the modern kernel: Linux control groups (cgroups) and Solaris zones. If you've read about "Borg" or "Omega", this is similar. Along with Chronos (distributed "cron") and Marathon (distributed "upstart"), Mesos provides building blocks for complex data workflows that mix batch/dependency graphs with low-latency, highly available, long-running services. Other recent integrations include Play and Docker.

    To put this in perspective, Mesos runs in production on about half of Twitter's overall compute -- so does Cascading. That's huge. Let's discuss recent work, review some demos, and chat about the emerging Berkeley stack.

    September 24, 2013

    • Michael M.

      Still a little wonky for me, but it did work eventually.

      September 23, 2013

    • Devon C.

      39° 53' 48.53", -105° 6' 37.16" :P

      September 23, 2013

  • George T.

    Looking forward to listening to Paco Nathan and joining the group.

    August 18, 2013

  • Dan Y.

    Really looking forward to this meet up, it's going to be great!

    July 25, 2013
