Spark is a programming model for performing large-scale data analysis in parallel without focusing on the details of distributed computing: the same program you write for one computer will also work across many. Spark builds on the MapReduce model by providing an interactive environment and a more general set of functions for manipulating data efficiently in memory. The result is a highly scalable way to explore large data sets interactively.
Spark has three APIs: Java, Scala, and Python. In this tutorial, I'll introduce the Spark framework and then illustrate its programming model with several IPython Notebook examples.
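To give a feel for the programming model before the tutorial, here is a rough, cluster-free sketch of the classic word count in plain Python. The list comprehensions and the `reduce_by_key` helper are my own stand-ins for Spark's `flatMap`, `map`, and `reduceByKey` operations; in real Spark these would run on a distributed RDD, but the shape of the computation is the same.

```python
# Plain-Python analogue of Spark's word-count pattern.
# The sample data and helper names here are illustrative, not Spark APIs.
lines = ["to be or not to be", "to see or not to see"]

# "flatMap": split each line into individual words
words = [w for line in lines for w in line.split()]

# "map": pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# "reduceByKey": sum the counts for each distinct word
def reduce_by_key(pairs):
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

counts = reduce_by_key(pairs)
print(counts["to"])  # 4
```

In Spark, each step would be a transformation on an RDD, so the same chain scales from a laptop to a cluster without changing the program's structure.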
We will be meeting in the TLC, room 215. If you are driving, the best place to park is the Euclid AutoPark; please see the map below.
Hope to see you there.