Scalding: A better way to write MapReduce jobs

Hosted by Ellen Friedman and 4 others

Details

MapReduce is verbose! Even a simple word count task requires five classes and well over 100 lines of code. MapReduce programmers realize very quickly that much of the code they write is dedicated to very basic things, such as preparing the sort keys and handling the comparisons for the sort. Cascading provides a much higher-level API on top of MapReduce. It also provides an abstraction that insulates the code from the underlying fabric and from the data source type and protocol, an integrated orchestration layer for building sophisticated sequences of jobs, and rich out-of-the-box functionality.

Cascading is, in effect, a domain-specific language (DSL) for Hadoop that encapsulates map, reduce, partitioning, sorting, and analytical operations in a concise form. The DSL is written in a fluent style, which makes the code much easier to write and to understand.

Scalding is an extension to Cascading that enables application development with Scala.

Twitter developed Scalding as an abstraction on top of Cascading, which is itself an abstraction layer on top of Apache Hadoop MapReduce.
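
To give a feel for the style, here is a minimal sketch of word count written against plain Scala collections. It is not the Scalding API and does not run on Hadoop; it only illustrates the flatMap-then-group shape that a Scala-based pipeline DSL lets you express in a few lines. The names `WordCountSketch`, `tokenize`, and `wordCount` are made up for this example.

```scala
object WordCountSketch {
  // Split a line into lowercase words, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // "Map" side: turn every line into words.
  // "Reduce" side: group identical words and count each group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or not to be", "that is the question"))
    // Print the counts sorted by word, one tab-separated pair per line.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

In a real Scalding job the input would come from a source on the cluster rather than an in-memory `Seq`, and the grouping would be planned as MapReduce steps, but the pipeline reads much the same.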

With it, you can write fault-tolerant data processing flows that run on a Hadoop cluster.

Want to learn Scalding? Please join us at DFW Data Science.

Looking forward to seeing you all again.

Warm Regards,

Alvin.

DFW Data Science
Improving
5445 Legacy Dr, Plano, TX, Suite 100 · Frisco, TX