
Enterprise Data Workflows with Cascading and Windows Azure HDInsight

This will be a combined meetup with SF Bay Area Azure.

Please sign up on their meetup page.



Cascading is an open source API for building Enterprise data workflows at scale, integrating Apache Hadoop with other data frameworks. Its pattern language for data pipelines provides a foundation for popular DSLs built on functional programming languages, such as Cascalog and Scalding.

The HDInsight team at Microsoft has been working with Cascading and Scalding apps on Azure. Also, the recent release of Hortonworks HDP for Windows now provides Hadoop on Windows Made Easy. Concurrent, the team behind Cascading, has partnered with both Microsoft and Hortonworks to help bring a powerful abstraction layer for Enterprise data workflows to the convenience and versatility of HDInsight and HDP.

Recent work has also added ANSI SQL and PMML as additional languages atop Cascading. People with backgrounds in SQL data warehouses or in analytics frameworks such as R, Weka, SAS, SPSS, etc., can now build large-scale apps to run on Hadoop just as well as developers working in Java, Clojure, Scala, etc.

A typical Enterprise workflow crosses multiple departments and frameworks -- perhaps SQL for ETL, perhaps J2EE for business logic and data prep, perhaps SAS for predictive models. Cascading allows those departments to integrate their workflow components into one app, one JAR file. This talk will show (1) using R and SQL on a laptop to define a complex app, then (2) using Cascading to integrate those components into a single JAR file that runs on a Hadoop cluster in parallel at scale.



Our Sponsors

  • Concurrent, Inc.

    Good food, great beer, and interesting people at meetups. Always.

  • O'Reilly Media

    Buy ebooks 50% off, print 40% off - when you purchase on

  • The Climate Corporation

    HQ conveniently located in SOMA, plus food & drink, Cascalog experts.

  • BlueKai

    Awesome venue, food & drinks, excellent use cases for Big Data

  • Twitter

    Meeting space and great vibes, plus Scalding, Cascalog, PyCascading...
