This will be a combined meetup with SF Bay Area Azure.
Please sign up on their meetup page.
Cascading is an open-source API for building Enterprise data workflows at scale, integrating Apache Hadoop with other data frameworks. Its pattern language for data pipelines provides a foundation for popular DSLs based on functional programming languages, such as Cascalog and Scalding.
The HDInsight team at Microsoft has been working with Cascading and Scalding apps on Azure. In addition, the recent release of Hortonworks HDP for Windows now provides "Hadoop on Windows Made Easy." Concurrent, the team behind Cascading, has partnered with both Microsoft and Hortonworks to bring a powerful abstraction layer for Enterprise data workflows to HDInsight and HDP, combining it with the convenience and versatility of those platforms.
Recent work has also added ANSI SQL and PMML as additional languages atop Cascading. Now people with backgrounds in SQL data warehouses or in analytics frameworks such as R, Weka, SAS, and SPSS can build large-scale apps that run on Hadoop just as well as developers working in Java, Clojure, Scala, and similar languages.
A typical Enterprise workflow crosses multiple departments and frameworks: perhaps SQL for ETL, J2EE for business logic and data prep, and SAS for predictive models. Cascading allows those departments to integrate their workflow components into one app and one JAR file. This talk will show (1) using R and SQL on a laptop to define a complex app, then (2) using Cascading to integrate those components into a single JAR file that runs in parallel at scale on a Hadoop cluster.