
Details

********************************************************

Please note: all talks will be held in English.
Please note: all talks will be highly technical and mainly relevant for data scientists.
********************************************************
SCHEDULE
18:00-18:30 - Warm up: food, drinks and networking
18:30-18:45 - First talk: Using target encoding to improve out-of-bag predictions
18:45-19:15 - Second talk: Productionizing H2O models with Apache Spark
19:15-19:30 - Short break
19:30-20:15 - Third talk: “Water you talking about”
20:15-20:30 - Closing: more mingling, coffee & sweets

TALKS

USING TARGET ENCODING IN H2O TO IMPROVE OUT-OF-BAG PREDICTIONS

Jo-fai (Joe) Chow | Data Science Evangelist & Community Manager at H2O.ai
https://www.linkedin.com/in/jofaichow/

Target encoding is a feature engineering technique that practitioners commonly use to improve prediction accuracy. It replaces each value of a categorical feature with the mean of the target variable over all rows that share that value. Yet target encoding is a double-edged sword: because a row's own target value can leak into its encoded feature, applying it without care can lead to overfitting and do more harm than good. To avoid overfitting, we (H2O.ai) have implemented several target encoding strategies in our open source machine learning platform, H2O-3. In this talk, Joe will quickly go through the basics of target encoding and then illustrate its usage with an example.
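To make the leakage problem concrete, here is a minimal, library-independent sketch of one common safeguard: out-of-fold (k-fold) target encoding with additive smoothing toward the global mean. It is not H2O-3's implementation; the function name `kfold_target_encode`, the round-robin fold assignment and the `weight` smoothing parameter are illustrative assumptions. Each row is encoded only from target statistics computed on the other folds, so its own label never contributes to its feature value.

```python
from collections import defaultdict


def kfold_target_encode(cats, y, n_folds=5, weight=10.0):
    """Hypothetical out-of-fold target encoder with additive smoothing.

    Each row is encoded using target statistics from the *other* folds only,
    so a row's own target value never leaks into its encoded feature.
    """
    n = len(cats)
    if n_folds < 2 or n < n_folds:
        raise ValueError("need at least 2 folds and at least n_folds rows")
    if weight <= 0:
        raise ValueError("weight must be positive so unseen categories fall back to the prior")
    # Assumption: deterministic round-robin folds; in practice folds are usually random.
    folds = [i % n_folds for i in range(n)]
    encoded = [0.0] * n
    for k in range(n_folds):
        # Accumulate per-category sums and counts from rows outside fold k.
        sums, counts = defaultdict(float), defaultdict(int)
        total, n_train = 0.0, 0
        for c, t, f in zip(cats, y, folds):
            if f != k:
                sums[c] += t
                counts[c] += 1
                total += t
                n_train += 1
        prior = total / n_train  # global target mean of the training folds
        for i in range(n):
            if folds[i] == k:
                c = cats[i]
                # Blend the category mean with the prior; rare or unseen
                # categories are pulled toward the prior.
                encoded[i] = (sums[c] + weight * prior) / (counts[c] + weight)
    return encoded


cats = ["a", "a", "b", "b", "c", "c"]
y = [1, 0, 1, 1, 0, 0]
print(kfold_target_encode(cats, y, n_folds=2, weight=1.0))
```

In this toy run the first row (category `a`, target 1) is encoded as 1/6, because the only other `a` row available in the other fold has target 0. A naive in-sample mean would instead have given it 0.5, partly built from its own label.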

PRODUCTIONIZING H2O MODELS WITH APACHE SPARK

Jakub Hava | Senior Software Engineer at H2O.ai
https://www.linkedin.com/in/havaj/

Spark pipelines are a powerful concept for productionizing machine learning workflows. Their API lets you combine data processing with machine learning algorithms and opens opportunities for integrating various machine learning libraries. However, to benefit from the power of pipelines, users need the freedom to choose and experiment with any machine learning algorithm or library. That is why we developed Sparkling Water, which embeds H2O's library of advanced machine learning algorithms in the Spark ecosystem and exposes them through the pipeline API.

In this talk we will explain the architecture of Sparkling Water, focusing on its integration with Spark pipelines and on MOJOs (Model Object, Optimized), H2O's portable format for exported models. We’ll demonstrate how to build pipelines that integrate H2O machine learning models and how to deploy them using Scala or Python.
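The pipeline idea described in this abstract boils down to chaining stages that each learn state in `fit` and apply it in `transform`. The toy, pure-Python sketch below illustrates only that pattern; the classes `Standardize`, `MidpointClassifier` and `Pipeline` are hypothetical and are neither Spark's nor Sparkling Water's API. (In Spark ML, for instance, `Pipeline.fit` returns a separate `PipelineModel`, whereas this toy fits its stages in place.)

```python
class Standardize:
    """Hypothetical feature-processing stage: centres and scales one numeric column."""

    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        vals = [r[self.col] for r in rows]
        self.mean = sum(vals) / len(vals)
        var = sum((v - self.mean) ** 2 for v in vals) / len(vals)
        self.std = var ** 0.5 or 1.0  # avoid dividing by zero on a constant column
        return self

    def transform(self, rows):
        return [{**r, self.col: (r[self.col] - self.mean) / self.std} for r in rows]


class MidpointClassifier:
    """Hypothetical model stage: threshold halfway between the two class means of one feature."""

    def __init__(self, col, label="label"):
        self.col, self.label = col, label

    def fit(self, rows):
        pos = [r[self.col] for r in rows if r[self.label] == 1]
        neg = [r[self.col] for r in rows if r[self.label] == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def transform(self, rows):
        # Adds a prediction column; the label is not needed at scoring time.
        return [{**r, "prediction": int(r[self.col] > self.threshold)} for r in rows]


class Pipeline:
    """Chains stages: each stage is fitted on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [{"x": 1.0, "label": 0}, {"x": 2.0, "label": 0},
         {"x": 3.0, "label": 1}, {"x": 4.0, "label": 1}]
pipeline = Pipeline([Standardize("x"), MidpointClassifier("x")]).fit(train)
scored = pipeline.transform([{"x": 0.0}, {"x": 10.0}])
print([r["prediction"] for r in scored])  # [0, 1]
```

The design point is that scoring new rows reuses the mean and standard deviation learned during `fit`, so data processing and model travel together as one fitted unit instead of being re-implemented separately in production.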

"WATER YOU TALKING ABOUT"

Dennis Bohle & Ivana Rebic | Principal Data Scientist & Data Scientist Product at Booking.com
https://www.linkedin.com/in/dennis-bohle-0a0774a5/
https://www.linkedin.com/in/ivana-rebic-7794a059/

Booking.com has been working successfully with H2O.ai for several years, and we would like to share some of the insights we’ve gained from this collaboration. Topics we will cover include:

  • How to avoid overfitting on test sets
  • Why feature importance for tree models is wrong and how to fix it
  • Why scalable production pipelines for feature transformation are important and how best to build them
  • How to implement target encoding efficiently
  • A few open research questions at Booking.com, which we will share at the end of the talk
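On the feature importance point: split-based importance in tree models (gain or split counts) is computed on training data and is known to favour features with many distinct split points. One common model-agnostic alternative is permutation importance, computed on held-out data: shuffle one feature column and measure how much a metric drops. The abstract does not say which fix the speakers propose, so the sketch below is a swapped-in illustration rather than their method; the functions `accuracy` and `permutation_importance` and the toy data are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == t for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, metric=accuracy, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = metric(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    # Stands in for a trained tree ensemble; it only looks at feature 0.
    return int(row[0] > 0.5)


# Toy held-out data: feature 0 determines the label, feature 1 is noise.
X = [[i / 20, (i * 7) % 20 / 20] for i in range(20)]
y = [int(row[0] > 0.5) for row in X]
print(permutation_importance(model, X, y))
```

Feature 1 gets an importance of exactly zero because the model never uses it, no matter how many distinct values it has, while shuffling feature 0 costs accuracy.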
