
Using Existing Math Libraries with Spark

Hosted by Dean Wampler

Details

Brian Spector from the Numerical Algorithms Group (NAG) will discuss using existing math libraries on Spark. Brian is a Technical Consultant at NAG, where he has begun to successfully apply the NAG Library's 1,600 mathematical routines to Big Data applications. He will share the pitfalls and successes he has encountered while using a numerical library in a distributed computing environment. Most existing mathematical algorithms require all relevant data to be in memory at run time for efficiency. Spark instead distributes data across the nodes of a cluster, so we must rethink our algorithms for Big Data applications. As an example, we will review the simple linear regression problem and find that it is not so simple to run on hundreds of GBs of data. We'll touch on efficient algorithms for Big Data applications and on how performance scales as you increase the number of worker nodes. Other topics covered include starting a Spark EC2 instance and the steps required to use existing libraries on Spark.
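
To make the linear regression example concrete, here is a minimal sketch of one common way to fit a line when the data does not fit in memory on a single machine: each chunk of data is reduced to a handful of running sums, and only those sums are combined. This is an illustration under stated assumptions, not the method presented in the talk and not the NAG Library API. The function names (`partial_sums`, `combine`, `fit_line`) and the sample data are hypothetical, and plain Python lists stand in for Spark partitions.

```python
from functools import reduce

def partial_sums(chunk):
    """Reduce one chunk of (x, y) pairs to five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Merge the sums from two chunks; the order of merging does not matter."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(chunks):
    """Fit y = slope * x + intercept using only the per-chunk sums."""
    n, sx, sy, sxx, sxy = reduce(combine, map(partial_sums, chunks))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical data lying on y = 2x + 1, split into three chunks
# that stand in for partitions held on different worker nodes.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(chunks)
print(slope, intercept)
```

On Spark, the same pattern would likely map onto a per-partition step followed by a reduce, so that only five numbers per partition travel over the network. Note that accumulating raw sums of squares can lose precision on large or poorly scaled data, which is one reason a tested numerical routine may be preferable to a hand-rolled formula like this one.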

Chicago Spark Users
Orbitz
500 W Madison St · Chicago, IL