Past Meetup

Spark Project / Scalding

16 people went

Details

Agenda:

18.30 - 18.40: Opening

18.40 - 18.50: Introduction and Group Update

18.50 - 19.30: Introduction to Spark (Anwar Rizal)

19.35 - 20.15: Scalding (Mario Pastorelli)

Update

The introduction to Spark will no longer include the Incanter presentation. We apologize for the inconvenience.

Detail

After two very interesting presentations on Cascalog and Twitter Storm some time ago, we have decided to continue the theme with two Scala-based tools: Spark and Scalding.

Spark (Anwar Rizal)

Spark is a cluster computing framework designed to make big data analytics fast. It is closer to Hadoop than to Storm: like Hadoop, it focuses mainly on offline (batch) processing, although the recent Spark Streaming work also addresses online processing. Unlike Hadoop, however, Spark introduces two interesting concepts. First, computation in Spark is not limited to MapReduce. Because it uses familiar Scala collection operations such as flatMap, map and filter, it is both easy to use and more expressive. Second, Spark can cache the result of a computation in memory, so that later operations can reuse it instead of reading it back from disk. This in-memory cache improves performance compared with Hadoop's disk-based approach.
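
To give a feel for the collection-style programming model described above, here is a minimal sketch in plain Scala, with no Spark dependency: a word count built from flatMap, filter and map, the same kind of operators Spark exposes on its distributed datasets. The object name WordCountSketch and the sample input are hypothetical. In Spark the pipeline would run over a cluster-wide dataset rather than a local Seq, and the grouping step would typically be expressed with reduceByKey.

```scala
// Hypothetical plain-Scala sketch (no Spark dependency) of the
// collection-style operators that Spark's API mirrors.
object WordCountSketch {
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").toSeq) // split each line into words
      .filter(_.nonEmpty)                         // drop empty tokens (e.g. from leading spaces)
      .map(word => (word, 1))                     // pair each word with a count of 1
      .groupBy(_._1)                              // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per word

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark and Scalding", "Scalding builds on Cascading", "spark")
    // Print each word and its count, sorted alphabetically by word
    wordCounts(lines).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

The chain reads like ordinary Scala collection code, which is the point made in the description: the same style carries over to Spark, where keeping an intermediate dataset in memory for reuse is requested explicitly with cache().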

Anwar Rizal will lead the session with an introduction to Spark and some examples. If the Internet connection is good enough, why not a live demo? As a bonus, he might also show how Incanter can be used to further process the data from the session.

Link to the Spark Project: http://www.spark-project.org/

Scalding (Mario Pastorelli)

Scalding is a Scala-based query language for Hadoop that makes MapReduce jobs easier to write. It is built on top of Cascading, a layer over Hadoop for defining flows of MapReduce tasks.

If you attended the Cascalog presentation a couple of months ago, you can think of Scalding as the Scala counterpart of Cascalog: Cascalog is written in Clojure, and both are built on Cascading.

Mario Pastorelli (Eurecom), who works on the Bigfoot project, in which Eurecom is one of the main partners, will present Scalding. His presentation will be example-based, which should help us grasp Scalding's concepts quickly.

Link to Scalding: https://github.com/twitter/scalding