Using Existing Math Libraries with Spark


Details
Brian Spector from the Numerical Algorithms Group (NAG) will discuss using existing math libraries on Spark. Brian is a Technical Consultant at NAG, where he has begun implementing the NAG Library’s 1,600 mathematical routines for Big Data applications. He will share the pitfalls and successes he has encountered while using a numerical library in a distributed computing environment. Most of today’s mathematical algorithms require all relevant data to be in memory at run time for efficiency. Because Spark instead distributes data across worker nodes, we must rethink our algorithms for Big Data applications. As an example, we will review the simple linear regression problem and find that it is not so simple to run on hundreds of GBs of data. We’ll touch on efficient algorithms for Big Data applications and on the importance of scaling as you increase the number of worker nodes. Other topics include starting a Spark EC2 instance and the steps required to use existing libraries on Spark.
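To make the linear regression example concrete, here is a minimal sketch of one common way to fit a line when no single machine holds all the data. It is not taken from the talk and is not NAG's method: each partition is reduced to five running sums, the sums are merged, and the closed-form least-squares solution is computed once from the merged totals. The function names (partition_stats, merge_stats, fit_line) and the in-memory list of partitions are illustrative assumptions; on a real cluster the same two steps would map naturally onto Spark's RDD.mapPartitions and RDD.reduce.

```python
from functools import reduce


def partition_stats(rows):
    """Reduce one partition of (x, y) pairs to five sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def merge_stats(a, b):
    """Combine the statistics of two partitions (associative and commutative)."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Closed-form least-squares slope and intercept from merged statistics."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data: three "partitions" standing in for blocks held by workers.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]

total = reduce(merge_stats, map(partition_stats, partitions))
slope, intercept = fit_line(total)
print(slope, intercept)
```

Only five numbers per partition travel over the network, regardless of how many rows each worker holds. The trade-off is numerical: accumulating raw sums of squares can lose precision on large or poorly scaled data, which is one reason a naive port of an in-memory routine may not be enough.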
