Züri ML #30: Large Scale Data in Practice


Machine Learning at Scale: Working at the Interface between System and Algorithm.

Celestine Dünner, IBM

Abstract: In machine learning, the hardware system and the learning algorithm are often designed independently of each other. As a consequence, the algorithm may not use the available resources efficiently in practice, which can drastically degrade its performance. In this talk I will focus on the challenge of designing algorithms that run efficiently on large-scale machine learning systems. I will present results of a recent performance study of Spark and MPI, illustrating that the optimal parameters of a distributed learning algorithm depend strongly on the characteristics of the system as well as on the software framework it is implemented in.

Reference: https://arxiv.org/pdf/1612.01437v1.pdf
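The abstract's central claim is that the best parameter setting depends on the system. A deliberately simplified cost model can illustrate this. The sketch below is an assumption for illustration only, not the method or the measurements from the talk or the referenced paper. It supposes a hypothetical algorithm that performs `h` local steps per worker between synchronizations. It also assumes that the number of rounds needed to reach a target accuracy follows the made-up form `a + b/h` and that each round costs `h * t_local + t_comm`. All names and constants (`rounds_needed`, `a`, `b`, `t_local`, `t_comm`) are hypothetical.

```python
"""Toy cost model: how the best amount of local work per round
depends on communication overhead. All constants are hypothetical."""


def rounds_needed(h: int, a: float = 10.0, b: float = 1000.0) -> float:
    # Hypothetical convergence model: more local steps per round
    # reduce the number of communication rounds, with diminishing returns.
    return a + b / h


def time_to_accuracy(h: int, t_local: float, t_comm: float) -> float:
    # Each round runs h local steps, then one synchronization of cost t_comm.
    return rounds_needed(h) * (h * t_local + t_comm)


def best_local_steps(t_local: float, t_comm: float, h_max: int = 10_000) -> int:
    # Brute-force search for the h that minimizes total time to accuracy.
    return min(range(1, h_max + 1),
               key=lambda h: time_to_accuracy(h, t_local, t_comm))


if __name__ == "__main__":
    # Same per-step compute cost, two hypothetical communication overheads.
    print(best_local_steps(t_local=1e-3, t_comm=0.01))  # → 32 (low overhead)
    print(best_local_steps(t_local=1e-3, t_comm=1.0))   # → 316 (high overhead)
```

In this toy model the optimum is roughly `sqrt(b * t_comm / (a * t_local))`. Raising the per-round communication overhead from 0.01 to 1.0 therefore shifts the best choice from 32 to 316 local steps. A framework with higher synchronization cost favors more local work per round, which is the kind of system-dependent tuning the talk addresses. Real convergence behaviour is more complex than this model.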

Applying Machine Learning on Healthcare Data

Diego Saldana Miranda, Novartis

Abstract: The diversity of data sources that need to be analyzed in healthcare and outcomes research keeps growing. Analyzing such diverse sources is a non-trivial task that requires new approaches to data preparation, modelling, and reporting of results. The size of some of these datasets also means that specialized infrastructure able to cope with them is required. In this presentation, we will introduce the audience to the diversity of datasets increasingly found in healthcare, covering randomized clinical trials, health insurance claims, and smart device data. We will discuss how their usage differs from one another, which machine learning methods can be applied to them today, and what the perspectives for the future are.