DataTalks #9: Mastering Catboost and XGBoost in Production

Hosted by: Shay Palachy Affek and inbar n.

Details

Our 9th meetup is hosted by PerimeterX and will explore two topics related to gradient boosted decision trees.

Time: January 22nd, 18:00
Language: Hebrew (both lectures)
Location: PerimeterX offices, Arania Osvaldo 29, Hajaj Towers, southern entrance, floor 22.

• 18:00 - 18:30 - Gathering, snacks & mingling
• 18:30 - 19:20 - First talk:
Tal Peretz - Mastering The New Generation of Gradient Boosting
• 19:30 - 20:20 - Second talk:
Alex Gorodetsky - XGBoost in Production (@PerimeterX)

Mastering The New Generation of Gradient Boosting - Tal Peretz

Gradient Boosted Decision Trees are the hottest ML models for tabular data.
These models have already taken over Kaggle and are now taking over the industry.
In this talk, we are going to explore and compare XGBoost, LightGBM, and the cool kid on the block: CatBoost.

Bio: Tal Peretz is a Data Scientist, Software Engineer, and a Continuous Learner. You may know him as DataHack 2018 1st prize winner (with his brother). Previously, he founded and led the Israeli Air Force Data Science team. Nowadays he is leveraging ML to fight fraud at simplex.com. Tal also writes for KDnuggets, Towards Data Science and HackerNoon. You can reach him at talperetz.com.

Boosting Trees in Production (@PerimeterX) - Alex Gorodetsky

Boosted trees are one of the most useful and common techniques for predictive modeling, and many of our models use them.

When developing a new model, we go through many iterations of research and production, so we had to find a way to effectively propagate changes from the research pipeline to the production pipeline. Generally, we can classify these changes into one of two types: feature-engineering changes (i.e. feature pre-processing logic) and model topology changes (e.g. tree depth or number of trees). Our goal was to achieve a simple architecture that would allow us to propagate these changes without the need to write additional production-side code.
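To make the goal above concrete, here is a minimal sketch of one way such an architecture could look. It is not the solution presented in the talk. It assumes the research side exports a declarative spec and the production side only interprets it. The spec layout, the field names (`requests_per_minute`, `user_agent`), and the `TRANSFORMS` registry are all hypothetical.

```python
import json
import math

# Hypothetical registry of named transforms, shared by research and production.
TRANSFORMS = {
    "identity": lambda x: x,
    "log1p": math.log1p,
    "length": len,
}


def build_features(spec, record):
    """Apply the transforms named in the spec to one raw record."""
    return [
        TRANSFORMS[feature["transform"]](record[feature["source"]])
        for feature in spec["features"]
    ]


# Research exports this spec; production reads it and contains no
# model-specific logic. All names and values here are illustrative assumptions.
SPEC_JSON = """
{
  "features": [
    {"source": "requests_per_minute", "transform": "log1p"},
    {"source": "user_agent", "transform": "length"}
  ],
  "model": {"max_depth": 6, "n_estimators": 200}
}
"""

spec = json.loads(SPEC_JSON)
record = {"requests_per_minute": 0.0, "user_agent": "curl/7.64"}

features = build_features(spec, record)  # feature-engineering changes live in the spec
model_params = spec["model"]             # topology changes live in the spec too
```

In this sketch, changing which features are used or changing the tree depth only requires shipping a new spec. Adding a brand-new kind of transform would still need a registry entry, which is one of the trade-offs such a design has to weigh.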

This presentation will introduce the main solutions and approaches in use by the community today, together with their advantages and disadvantages. We will also present the actual solution implemented in our production flow, together with some important best practices learnt the hard way.

Bio: Alex Gorodetsky leads the Data Science team at PerimeterX. His responsibilities at PerimeterX include focusing a team of eager and talented data scientists on solving real-world problems and improving our bot detection solution, and making sure all the dependencies along the data science pipeline are met while moving research results into production. Prior to PerimeterX, Alex held various engineering positions at both Intel and Israel PMO, focusing mainly on communication protocols, software engineering and system architecture.

DataHack - Data Science, Machine Learning & Statistics