Large-Scale Analytics with Apache Spark

Name: Large-Scale Analytics with Apache Spark
Start: 2014-09-22T19:00:00-05:00
End: 2014-09-22T22:00:00-05:00
Location: OWS-150 (Owens Science Hall), University of St. Thomas

Hosted by Brad R.

Twin Cities Spark and Hadoop User Group

Details

Abstract: Is Apache Spark the answer to all my big data problems? What distinguishes Spark from Hadoop? Do I have to become a Scala expert in order to use Spark? Can I do large-scale machine learning with Spark?
In this talk, I will answer these and other questions based on our experience at Thomson Reuters R&D with Spark, a cluster computing framework that generalizes the MapReduce model. After a short introduction to the underlying technology, I will show how Spark can help with your data analysis tasks. I will discuss several recent extensions, including GraphX, Spark SQL, and in particular MLlib, the Spark library for machine learning (ML). The talk will conclude with a comparison between Spark's ML capabilities and those of other frameworks (e.g., Mahout, H2O).

Speaker: Frank Schilder, from Thomson Reuters, obtained his Ph.D. in Cognitive Science from the University of Edinburgh, Scotland. His research interests include discourse analysis, summarization, and information extraction. His summarization work has been implemented as the snippet generator for WestlawNext search results, and he is currently involved in various large-scale machine learning projects. Frank has successfully participated in several research competitions on automatic summarization systems, such as the Text Analysis Conference (TAC) carried out by the National Institute of Standards and Technology (NIST). Before joining Thomson Reuters, he was an assistant professor in the Department of Informatics at the University of Hamburg, Germany.

Parking: There are two ways to pay for parking in the adjacent Anderson ramp. You can either enter and exit with a credit card, or take a ticket and use the pay kiosk on the northeast corner of the ramp to get an exit ticket.

Food: Pizza and drinks provided by Cloudera, starting at 6:30 PM; first come, first served.

Map: http://bit.ly/RCtaTI
