
CatBoost: Distributed Training, Uncertainty Estimation and Other News

Hosted by Szilard P.

Details

Following our last two meetups with core developers of XGBoost and LightGBM, respectively, it is now CatBoost's turn, with the head of the CatBoost dev team speaking! As last time, we'll fit into a 1-hour slot: a 35-minute talk plus 20 minutes of Q&A (10:00-10:55am Pacific Time).

The Zoom link will be posted in the comments below at 9:55am. Due to our Zoom plan's 100-attendee limit, only the first 100 people will be able to join the call.

CatBoost: Distributed Training, Uncertainty Estimation and Other News
by Stanislav Kirillov

CatBoost is a popular open-source library for training gradient boosting models, with built-in support for categorical, text, and embedding features.
In this talk, we will discuss major updates and recall the main features of CatBoost, including:

  • CatBoost for Spark release
  • Object embeddings and text features support
  • Uncertainty estimation
  • GPU training support
  • Dataset prequantization support
  • Fast inference (both CPU and GPU)

We will show a brief demo of CatBoost PySpark training and present plans for CatBoost development.

Speaker Bio:
Stanislav Kirillov is the head of the CatBoost development team at Yandex. He works on machine learning tools and the infrastructure that supports them. Stanislav is a big fan of distributed training and low-level software optimizations.

Real Data Science USA (formerly LA Data Science)