Spark at Thomson Reuters and Project Tungsten


Details
Tonight we will have two talks: the first will discuss Spark at Thomson Reuters, and the second, from Databricks, will cover Project Tungsten. Detailed abstracts are below. We will be filming the talks and posting the video to the Apache Spark YouTube page.
Agenda:
6:30: Mingling
7:00-7:05: Intros
7:05-8:15: Technical Talks
8:15: Mingling
Adam Baron, Director of Big Data Quantitative Research
StarMine, a Thomson Reuters brand, began using Hadoop in 2011 and built a home-grown quantitative finance research environment heavily leveraging MapReduce, Hive and Mahout. In 2014, they started using Spark with a strong reliance on Spark SQL for data manipulation and Spark MLlib for machine learning. StarMine has also dabbled in Sparkling Water for algorithms which are not yet available in Spark MLlib, such as Deep Learning. Adam will speak about the steps involved in going from raw text to a predictive quantitative finance model. He will highlight the technologies involved, share some Spark examples and give insight into how quants approach Big Data.
Josh Rosen, Spark Committer and Software Engineer at Databricks
Project Tungsten focuses on substantially improving the memory and CPU efficiency of Spark applications, to push performance closer to the limits of modern hardware. In this talk, Josh will give an update on the Project Tungsten improvements included in Spark 1.5.0 and dive into some of the technical challenges the team is solving.
