Building Enterprise Apps for Big Data with Cascading

For our October Meetup, we're thrilled to have Paco Nathan talking about his experiences working with and deploying enterprise-scale predictive systems. Cascading, the open-source application framework that Paco specializes in, is an abstraction layer on top of Hadoop, so we're delighted to be partnering with Hadoop DC this month!

Notes: We're back at newBrandAnalytics for this event! And we'll be continuing our experiment with informal pre-event themed networking -- please come early to meet and chat with people interested in, or perhaps immersed in, startup businesses with an analytics or data science focus, and continue the conversation afterwards at Data Drinks.

Agenda:

  • 6:30pm -- Networking and Refreshments (Discussion theme: Startups)
  • 7:00pm -- Introduction
  • 7:15pm -- Paco's presentation and Q&A
  • 8:30pm -- Post-presentation conversations
  • 8:45pm -- Adjourn for Data Drinks
    • Happy Hour Prices
    • & Our own floor
    • @ Science Club DC (19th btwn L&M)

Abstract:

Cascading is an open-source project that provides an abstraction layer on top of Hadoop and other compute frameworks for Big Data apps. The API provides workflow orchestration for defining complex apps, and is particularly well-suited for Enterprise IT. Large deployments run at Twitter, Etsy, Climate Corp, Trulia, Airbnb, and many other firms, built either on the Java API or on DSLs in Scala (Scalding), Clojure (Cascalog), and other JVM-based languages.

This talk will review some of the speaker's experiences leading Data teams for large-scale deployments of predictive analytics, and how the lessons learned have shaped the trade-offs and best practices which we use in Cascading. We will discuss use cases and architectural patterns for large MapReduce workflows where robustness and predictability are high priorities. We will also review a sample recommender application (on GitHub) based on government Open Data.

Bio:

Paco Nathan is a Data Scientist at Concurrent in SF and a committer on the Cascading.org open source project. He has expertise in Hadoop, R, AWS, machine learning, predictive analytics, and NLP -- with 25+ years in the tech industry overall, in a range of Enterprise and Consumer Internet firms. For the past 10 years Paco has led innovative Data teams, deploying Big Data apps based on Cascading, Hadoop, HBase, Hive, Lucene, Redis, and related technologies.


  • Madhu

    Great presentation!

    October 19, 2012

  • Geoff M.

    Great event! Paco Nathan gave a great presentation, very informative.

    October 18, 2012

  • Paco N.

    Thank you very much for the opportunity to present at Data Science DC and Hadoop DC. Wonderful discussions! I really appreciated getting to meet many people involved in Data here in the DC area. Just posted the slide deck for tonight's talk on SlideShare: http://www.slideshare.net/pacoi...

    October 18, 2012

  • nahumg

    I hate to ask, but could somebody tell me the difference between data science and data engineering? I am confused...

    October 17, 2012
