We're excited to host Sean Owen! Sean is Director of Data Science at Cloudera, based in London. Before Cloudera, he founded Myrrix Ltd, a company commercializing large-scale real-time recommender systems on Apache Hadoop. He has been a primary committer and VP for Apache Mahout, and is a co-author of Mahout in Action. Previously, Sean was a senior engineer at Google.
Apache Spark as Cross-Over Hit for Data Science
Spark is getting a lot of buzz lately. It's now a top-level Apache project and is integrated with Hadoop. Its programming model is attractively simple compared to MapReduce, and for the iterative computations often found in machine learning, it can be much faster.
It has the potential to be a 'cross-over' platform for both exploratory and operational data scientists, and offers elements familiar to Java, Hadoop, R, and Python developers.
This talk will briefly introduce Spark and the features that make it attractive to different data science "camps".