Lessons learned in building a Spark distribution


Details
Apache Spark is a next-generation tool that makes ground-breaking improvements over MapReduce on Hadoop. It lets developers be more productive and use their resources more efficiently. It is less well known, however, that the authors of Mesos introduced Spark in their paper as a framework "to validate their hypothesis".
We built a distribution of Spark at Typesafe, exploiting the synergy with Mesos. We will show how the combination is ready for multi-tenant, heterogeneous cluster environments. We'll see how Spark and Mesos fit into a larger picture, alongside YARN or containers. We'll also look at the enterprise use cases that drove some of our choices. Finally, we'll mention the bumps in the road and the improvements we would like to see to make the Spark and Mesos combination even more versatile and powerful.
---
About the Speaker
François Garillot joined Typesafe in 2012 after an early stint in research, where he spoke frequently at international conferences. He is now working in Typesafe's Spark team, leveraging his Scala knowledge to improve Spark's support for scalable machine learning and data science applications.
Based in Lausanne, he speaks at Swiss conferences and Scala user groups in Lyon and Paris. He recently spoke at Strata Hadoop Barcelona on how to make your next big data hackathon successful.
