Lighting fires: Sparks and more
Speaker: Simon Elliston Ball
Abstract: Spark comes with many useful pieces. The ML libraries are great and all, but what can you do beyond Spark? When you need more than you get with the core, how can you best extend Spark? In this talk, I’m going to introduce a range of libraries across geospatial and computer vision that make great use of Spark to scale, or which never even thought about being Spark components but get along like a house on fire. We’ll have some live computer vision, some serious add-on data science, and hopefully a few more tools for your data science toolbox.
Bio: Simon is Principal Solutions Engineer at Hortonworks, where he helps customers solve problems with Spark, Hadoop, NiFi and all the animals in the zoo. He used to be Head of Big Data at Red Gate Software, where he researched and built user tools for Big Data platforms. Before riding the elephant, he worked in the data-intensive worlds of hedge funds and financial trading, ERP and e-commerce, as well as designing and running nationwide networks and websites. These days his head is in Big Data and visualisation.
Identifying Returning Users
Speaker: Dani Sola
Abstract: Identifying returning users across devices is an important requirement for providing an optimal experience to our customers. In this talk we'll see how Spark enables us to do it both in near real-time using Spark Streaming and in batch mode using GraphX.
Bio: Dani is Data Architect and Development Lead at Simply Business, where he is helping to build a data platform on AWS that uses Spark extensively: PySpark, Spark Streaming and GraphX, along with experiments in MLlib. Before that, he worked as a big data engineer at Hotels.com and Trovit, processing and analyzing terabytes of data using Hadoop, Hive, Giraph and many other tools.