CatBoost: Distributed Training, Uncertainty Estimation and Other News


After our last two meetups with core developers of XGBoost and LightGBM, respectively, it is now CatBoost's turn, with the head of the CatBoost dev team speaking! As last time, we'll fit into a 1-hour slot: a 35-minute talk followed by 20 minutes of Q&A (10:00-10:55am Pacific Time).
The Zoom link will be posted in the comments below at 9:55am. Because our Zoom account has a 100-attendee limit, only the first 100 people will be able to join the call.
CatBoost: Distributed Training, Uncertainty Estimation and Other News
by Stanislav Kirillov
CatBoost is a popular open-source library for training gradient boosting models, with built-in support for categorical, text, and embedding features.
In this talk, we will discuss major updates and recall the main features of CatBoost, including:
- CatBoost for Spark release
- Support for object embeddings and text features
- Uncertainty estimation
- GPU training support
- Dataset prequantization support
- Fast inference (both CPU and GPU)
We will show a brief demo of CatBoost PySpark training and present plans for CatBoost development.
Speaker Bio:
Stanislav Kirillov is the head of the CatBoost development team at Yandex. He develops machine learning tools and supports and develops the infrastructure for them. Stanislav is a big fan of distributed training and low-level software optimizations.
