Join the Brussels Data Science Community, Spark Summit Europe (https://spark-summit.org/eu-2016/) attendees, and Spark ML and machine learning experts Nick Pentreath and Jean-François Puget for a talk on a Spark-based end-to-end machine learning system. A round of Apache Spark™ and machine learning lightning talks will follow.
Here is the agenda:
18:50 Introduction & update on Data4Good Hackathon (www.denguehack.org)
— Philippe Van Impe, Founder, European Data Innovation Hub & Brussels Data Science Community.
— Berni Schiefer, IBM Fellow
19:00 Creating an end-to-end Recommender System with Spark ML
— Nick Pentreath, Principal Engineer at the IBM Spark Technology Center, Apache Spark PMC member, and author of Machine Learning with Spark.
— Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM Analytics
There are many resources available for building basic recommendation models using Spark. But how does a practitioner go from the basics to creating an end-to-end machine learning system, including deployment and management of models for real-time serving? In this session, we will demonstrate how to build such a system based on Spark ML and Elasticsearch. In particular, we will focus on how to go from data ingestion to model training to a real-time predictive system.
19:45 Lightning Talks
10-minute Spark and machine learning talks, including new projects from Belgium.
1. DeepLearning4J and Spark : Successes and Challenges
We'll embark on a tour of the DeepLearning4J architecture, intermingled with applications, going over the main blocks of this deep learning solution for the JVM: GPU acceleration, a custom n-dimensional array library, a parallelized data-loading Swiss Army tool, deep learning and reinforcement learning libraries, and an easy-access interface. Along the way, we'll point out the few strategic points where parallelizing computation across machines helps tremendously, and we'll give some insight into where Spark helps and where it doesn't.
Presenter: François Garillot - Skymind
2. Telco data stream simulation, processing and visualization
Koen will discuss the development of a prototype for processing data coming from cell towers, carried out for a telco operator in the Middle East. The added difficulty was that the customer could not provide real data. In the end, he developed a data generator in Scala/Akka, a data processor with Spark Streaming, and a visualization front-end with Node.js.
Presenter: Koen Dejonghe - Eurocontrol
3. Hyperparameter Optimization - when scikit-learn meets PySpark
Spark is useful not only when you have big data problems. Even with a relatively small data set, you might still have a big computational problem. One such problem is the search for optimal hyperparameters for ML algorithms.
Normally, a data scientist has a laptop with 4 cores (8 threads), which means a grid search will take some time. With Spark, however, the grid search can be carried out on a cluster with a higher degree of parallelism.
Presenter: Sven Hafeneger - IBM
4. A data scientist, a BI expert and a big data engineer walk into a bar: how 3 different worlds come together with Spark
Because of its general-purpose nature, Spark is being used by a wide variety of data professionals, each with their own background. The data warehouse / data lake of a large organisation is a spot where those 3 worlds collide. We've experienced the good, the bad and the ugly of those encounters firsthand. In this lightning talk, we share what the groups can learn from each other, how they can collaborate, and which are the recipes for disaster.
Presenter: Kris Peeters - Data Minded
5. Writing Spark applications, the easy way : how to focus on your data pipelines and forget about the rest
Even though Spark offers intuitive and high-level APIs, writing production-ready Spark data pipelines involves non-trivial challenges for data scientists without an expert background in software development and devops matters. In this short talk, I'll present how we tackled these issues at Real Impact Analytics by developing an intuitive framework for writing dataflows that offers convenient data exploration and testing facilities while hiding devops-related complexity.
Presenter: Pierre Borckmans - Real Impact Analytics
6. A very brief introduction to extending Spark ML for custom models: Talk + Demo
Spark ML pipelines, inspired by scikit-learn, have the potential to make our machine learning tasks much easier. This talk looks at how to extend Spark ML with your own custom model types when the built-in options don't meet your needs.
Presenter: Holden Karau - Spark Technology Center, IBM
20:45 Wine, beer, sandwiches, chocolate, & conversation!