Distributed Computing with Spark: Reza Zadeh, ICME program at Stanford

This is a past event

242 people went

Fireside Room at eBay Town Hall

2161 North First Street · San Jose, CA

How to find us

*** Notice the Location - North Campus! ***

Details

As computer clusters scale up, data flow models such as MapReduce have emerged as a way to run fault-tolerant computations on commodity hardware. Unfortunately, MapReduce is limited in efficiency for many numerical algorithms. We show how new data flow engines, such as Apache Spark, enable much faster iterative and numerical computations, while keeping the scalability and fault-tolerance properties of MapReduce. In this tutorial, we will begin with an overview of data flow computing models and the commodity cluster environment in comparison with traditional HPC and message-passing environments. We will then introduce Spark and show how common numerical and machine learning algorithms have been implemented on it. We will cover both algorithmic ideas and a practical introduction to programming with Spark.

Speaker Bio:

Reza Zadeh is a Consulting Professor of Computational Mathematics at Stanford and a Technical Advisor at Databricks. He focuses on Discrete Applied Mathematics, Machine Learning Theory and Applications, and Large-Scale Distributed Computing. More information is available on his website: http://stanford.edu/~rezab/
