- Gradient Boosting Regression: From Kaggle to business applications
Y-DATA Meetup #6
Hosted by JoyTunes
Talks are in English
-------
Intro:
Most data science meetups tend to focus on neural networks and their latest advances. The Y-DATA meetup series is no exception to this rule. However, at least once a year we dive into gradient boosting. This family of algorithms is still very popular, handy and efficient for classification and regression tasks, both in ML competitions and in real business applications. In the upcoming meetup we will talk specifically about gradient boosting regression. In the first talk, we'll discuss custom loss functions and target transformation using an example from a Kaggle competition, including a comparison of the implementations in the most popular GBT libraries (CatBoost, LightGBM, XGBoost). In the second talk, we'll present some creative ways to use gradient boosting regression in real-world business tasks.
More info about Y-DATA is here: bit.ly/ydata-website
Previous meetup videos are here: bit.ly/youtube-ydata
-----------
Agenda:
18:00 - 18:30 Registration, Mingling, Snacks & Beer
18:30 - 19:15 Talk 1: Gradient boosting regression and MAE - Dmitry Dryomov, DS consultant and Kaggle Master
19:15 - 19:30 Break
19:30 - 20:15 Predicting user acquisition campaigns results with gradient boosting regression - Asaf Adi, Data Scientist at JoyTunes
-----------
Talk Details:
Talk #1:
Title: Gradient boosting regression and MAE (mean absolute error)
Abstract: Many models need to optimize MAE. Kaggle competitions with MAE metrics are both an important source of ideas on the topic and a way to benchmark these ideas. In this talk, we will go through median averaging, custom objectives and target transformation. We will compare their performance in a past Kaggle competition and review the implementations available in CatBoost, LightGBM, and XGBoost.
Bio: Dmitry Dryomov is a DS consultant and Kaggle Master (🥇x 3) with a highest competition rank of 98. In the past Dmitry worked for Yandex and Pontis (now a part of Amdocs). Former organizer of ML training meetup [masked]) and CDS TA meetup (in 2017) on Kaggle competitions. He holds first and second degrees in Applied Mathematics. He is also a graduate of the Yandex School of Data Analysis (class of '13).
Talk #2:
Title: Predicting user acquisition campaigns results with gradient boosting regression
Abstract: Having an early prediction of the effectiveness of User Acquisition (UA) campaigns can have a dramatic effect on optimizing acquisition budgets, cutting response times and reducing the manual analysis time of UA managers. This effect is even more important when the UA team works under aggressive growth targets and performance constraints. In this talk we will discuss the challenges we faced when implementing a Gradient Boosting Regression algorithm to predict campaign performance 90 days ahead.
Bio: Asaf Adi is a product Data Scientist at JoyTunes, responsible for all user acquisition data aspects. Asaf has experience in the performance marketing/gaming data sphere; past projects included user LTV prediction. He has a B.A. in economics and has taught himself coding, ML and DL in recent years.
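A rough illustration of why MAE gets its own treatment in Talk #1 (not taken from the talk itself): the constant prediction that minimizes MAE is the median of the targets, while the mean minimizes squared error. The target values below are made up for the example, and the helper names are hypothetical.

```python
from statistics import mean, median


def mae(y_true, pred):
    """Mean absolute error of a single constant prediction."""
    return sum(abs(y - pred) for y in y_true) / len(y_true)


def mse(y_true, pred):
    """Mean squared error of a single constant prediction."""
    return sum((y - pred) ** 2 for y in y_true) / len(y_true)


# Made-up targets with one large outlier (an assumption for illustration).
targets = [1.0, 2.0, 3.0, 4.0, 100.0]

m_mean = mean(targets)
m_median = median(targets)

# Brute-force search over a grid of constants to find the MAE minimizer.
candidates = [i * 0.5 for i in range(201)]  # 0.0, 0.5, ..., 100.0
best_constant = min(candidates, key=lambda c: mae(targets, c))

print("mean:", m_mean, "median:", m_median, "MAE-optimal constant:", best_constant)
print("MAE at median:", mae(targets, m_median), "MAE at mean:", mae(targets, m_mean))
```

The outlier pulls the mean far from most of the data, so a model fitted with a squared-error objective can score poorly on an MAE metric. This is one reason to consider MAE-oriented objectives or target transformations.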
- Creative AI: assistant or solo creator
For our fifth meeting we have prepared a special topic: creativity in AI. While the term itself is not well defined, we will try to formalize it by looking at the creative process in two completely different domains: text generation and image processing. Talks will be given in English. Thanks to Taboola for hosting us.
Agenda:
18:00 - Gathering and Mingling, Snacks & Beer
18:30 - Creative AI: serendipity and statistical learning, Ivan P. Yamshchikov, Max Planck Institute
19:30 - Unbundling creative workflows with AI, Ofir Bibi, Lightricks
Talk details:
Title: Creative AI: serendipity and statistical learning
Abstract: We discuss several applications of deep neural networks to natural language generation. Creative AI is an elusive term, but we try to operationalize it and to illustrate how AI is used for a variety of 'creative' tasks. We address examples of generative poetry and music. These applications shed light on the serendipitous nature of the creative process and can be used as one of the starting points for formally addressing 'creativity' in mathematical terms.
Bio: Dr. Ivan P. Yamshchikov is a researcher in artificial intelligence and cognition at the Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany. He popularizes artificial intelligence and its versatile industrial applications. He is also a co-founder of Creaited Labs (creaited.com), a project that aims to develop the creative potential of artificial intelligence. At Creaited Labs, together with his co-author Alexey Tikhonov, he created Neurona, a music album containing AI-generated Kurt Cobain-stylized poetry.
Title: Unbundling creative workflows with AI
Abstract: Creativity can be a complex thing, with many hurdles in every workflow. AI, while still not very creative in itself, can give a big boost to the creative process by augmenting it and accelerating the tedious tasks of creation. In this talk I will show how Lightricks was able to unbundle complex creative workflows by turning to AI. I will then focus on the task of facial 3D modelling and relighting, starting from training a state-of-the-art neural network to estimate the shape and appearance of a face in any image and going through the pipeline for creating a photorealistic relighting effect.
Bio: Ofir Bibi is the director of research at Lightricks. His research is in the fields of machine learning, statistical signal processing, computational photography and computer graphics. Previously he held leading research positions, building systems for the estimation and prediction of various signals, from weather, electricity consumption and financial markets to photos, videos and auditory signals. Currently his research is focused on ways to produce consistent results from neural networks in a resource-constrained environment.
- Under the hood: Recommendation engines in Music and Gaming - sponsored by Playtika
Playtika Herzliya offices
The Y-DATA #4 meetup focuses again on recommender systems. After great feedback on the last meetup and high interest in the field, this time we will talk about recommendations in the music (Yandex.Music) and gaming (Playtika) industries. Talks will be given in English. Big thanks to Playtika for hosting this event.
Agenda for the meetup:
18:00 - 18:30 Gathering and Mingling, Snacks & Beer
18:30 - 19:15 Feature Engineering for cold-start music recommendations (Daniil Burlakov, Yandex)
19:30 - 20:15 Matrix completion algorithms for recommendation systems (Gil Shabat, Playtika)
ABSTRACTS
********************************
"Feature Engineering for cold-start music recommendations"
Yandex Music is a top music streaming service in Russia. We build millions of personalized playlists, selecting the best tracks for our users on a daily basis from a catalog of over 50 million songs. The quality and diversity of the recommendations are key to our users' satisfaction. When entering a new country for the first time, achieving these qualities becomes a far greater challenge: working with new or rare content is a huge problem for the vast majority of recommendation systems, which rely on historical user behavior. In this talk, we will present a set of domain-specific techniques that help overcome these hurdles and enable us to launch our service successfully in new countries.
Speaker Info: Daniil Burlakov (PhD), Yandex
Team lead of Recommender Systems for Yandex Media Services (Music, Movies streaming and more). Dani graduated from Moscow State University (MSU). He holds a PhD in the qualitative theory of differential equations. Before Yandex, Dani worked in the Laboratory of Mathematical Modeling for Dynamic Systems Simulation (MSU), where he developed training simulators based on virtual reality.
"Matrix completion algorithms for recommendation systems"
Recommendation systems play a central role in finding content or products users care about. One useful family of algorithms used in recommendation systems is Collaborative Filtering (CF). Unlike content-based methods, CF does not rely on attributes of the user or the item itself; what matters is which users rated a particular item, and how. In this talk, we'll discuss the Alternating Least Squares (ALS) algorithm, a classical method for low-rank matrix completion. Alongside its advantages (mostly computational), it suffers from the fact that the underlying problem is NP-hard, due to its non-convexity and the discrete nature of the rank. As an alternative, we'll discuss other algorithms, based on the nuclear norm, that converge to a global optimum in polynomial time, with a solution that coincides with rank minimization under certain conditions.
Speaker Info: Gil Shabat (PhD), Playtika
Gil Shabat is a data science team leader in Playtika's AI research group. Gil holds B.Sc., M.Sc. and Ph.D. degrees in electrical engineering, all from Tel Aviv University. Prior to joining Playtika, he worked at ThetaRay as director of algorithm research and in a variety of other R&D positions. His research interests include scientific computing, fast randomized algorithms and machine learning.
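To make the alternating idea in the matrix completion abstract concrete, here is a minimal sketch. It is not the speaker's implementation. It assumes a rank-1 model with a small ridge penalty, so each alternating step reduces to a one-dimensional least-squares solve. The ratings, the function name `als_rank1` and its parameters are all hypothetical.

```python
def als_rank1(observed, n_users, n_items, reg=1e-6, n_iters=200):
    """Rank-1 alternating least squares.

    observed maps (user, item) -> rating for the known entries only.
    Returns user factors u and item factors v so that rating ~ u[i] * v[j].
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(n_iters):
        # Step 1: hold item factors fixed, solve a 1-D ridge problem per user.
        for i in range(n_users):
            row = [(b, r) for (a, b), r in observed.items() if a == i]
            num = sum(r * v[b] for b, r in row)
            den = reg + sum(v[b] ** 2 for b, _r in row)
            u[i] = num / den
        # Step 2: hold user factors fixed, solve a 1-D ridge problem per item.
        for j in range(n_items):
            col = [(a, r) for (a, b), r in observed.items() if b == j]
            num = sum(r * u[a] for a, r in col)
            den = reg + sum(u[a] ** 2 for a, _r in col)
            v[j] = num / den
    return u, v


# Made-up 3x3 rating matrix of exact rank 1: outer([1, 2, 3], [1, 2, 4]),
# with entries (0, 2) and (2, 0) hidden from the algorithm.
full = {(i, j): float(a * b)
        for i, a in enumerate([1, 2, 3])
        for j, b in enumerate([1, 2, 4])}
observed = {k: r for k, r in full.items() if k not in {(0, 2), (2, 0)}}

u, v = als_rank1(observed, n_users=3, n_items=3)
predicted = {k: u[k[0]] * v[k[1]] for k in [(0, 2), (2, 0)]}
print(predicted)  # should be close to the hidden values 4.0 and 3.0
```

Each step is a cheap closed-form solve, which is the computational advantage the abstract mentions. On noisy, higher-rank data the joint problem is non-convex, so a sketch like this carries no guarantee of reaching a global optimum.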
- Large-Scale Recommender Systems
The third Y-DATA meetup is fully dedicated to real-world recommender systems at large scale. Following our usual format, the first talk is given by an expert from Yandex, and for the second, guest talk we are happy to host the amazing Inbar Naor from Taboola. Talks will be given in English. Big thanks to SimilarWeb for hosting us.
Visit the Y-DATA webpage to find out about our data science program at the TAU campus: ydata.co.il
Agenda for the meetup:
18:00 - 18:30 Gathering and Mingling, Snacks & Beer
18:30 - 18:45 Opening words from Y-DATA and our host SimilarWeb
18:45 - 19:30 Large-scale recommender system for algorithm-driven content feed (Andrey Zimovnov, Yandex)
19:30 - 20:15 Lessons from building deep learning recommendation systems (Inbar Naor, Taboola)
ABSTRACTS
********************************
"Large-scale recommender system for algorithm-driven content feed"
Yandex.Zen is a personal recommendations service created by Yandex that uses machine learning to create a content feed that automatically adapts to reflect the user's active interests. Content is selected using advanced machine learning techniques, with both classical algorithms and deep learning driving our recommendations engine. Over the course of the talk, Andrey will share his experience of working with a large-scale recommender system, starting with tips and tricks for user-item matrix factorization, followed by a dive into neural content representations, and finally culminating with insights into implementing a fast nearest-neighbors search to find the best items to present to our users.
Speaker info: Andrey Zimovnov, Yandex
Andrey Zimovnov graduated from Moscow State University in 2013 with a computer science degree. Andrey is a senior data scientist at Yandex, where he has worked on various machine learning projects involving computer vision, natural language processing and recommender systems; he is currently part of the Yandex.Zen team. Andrey is also a senior lecturer at the Higher School of Economics, one of Russia's top universities, where he teaches courses on machine learning.
********************************
"Lessons from building deep learning recommendation systems"
Deep learning models have been gaining increasing attention in the recommendation systems community, replacing some of the traditional methods. The sparse nature of the problems and the different input types pose unique challenges for feature engineering and architecture planning, which must balance memorization and generalization. During the past 2 years the algorithms team at Taboola moved all of its algorithms to DL. In this talk Inbar will share the lessons the team learned in doing so. She'll talk about building NNs with multiple input types (click history, text and pictures); feature engineering in DL; capturing interactions between features; and the way modelling decisions are related to system engineering and research culture.
Speaker info: Inbar Naor, Taboola
Inbar is a Data Scientist at Taboola, where she applies deep learning techniques to content recommendations. In the past, she worked with different types of data, including DNA sequences, neurological recordings, click streams, texts and images. She has an M.Sc. in Computer Science with a focus on machine learning research, and a B.Sc. in Computer Science and Cognitive Science. In her spare time she hosts Unsupervised, a podcast about data science in Israel, and is a co-founder and manager of DataHack, a Data Science and Machine Learning Hackathon, and of the DataTalk meetup.
- Computer Vision - Smarter, Faster, Better
The second Y-DATA meetup is fully dedicated to Computer Vision. Following our usual format, the first talk is given by an expert from Yandex, and for the second, guest talk we are happy to host a principal scientist from Amazon Lab126. Talks will be given in English and require some CV background.
Agenda:
18:00 - 18:30 Registration, Mingling, Snacks & Beer.
18:30 - 19:30 "DeepHD: Video Super-Resolution in Real-Time with Generative Adversarial Networks"
Sergey Ovcharenko, Computer Vision Technologies, Yandex
In this talk Sergey will speak about boosting the resolution and improving the quality of images and videos. Super-resolution is known to be a challenging and elusive problem. Sergey will describe current approaches to super-resolution and the situations in which they fail. He will present the difficulties inherent in applying state-of-the-art solutions to real-world image and video data and how his team mitigates them. The discussion will also cover the difference in human perception of video and static images, the shortcomings of the widely used metrics for super-resolution performance, and the approach adopted by the team. In the final part of the talk Sergey will share his experience in applying neural networks to video streaming under tight performance constraints.
19:40 - 20:30 "Compressing without Forgetting - Specialized Detectors in Restricted Domains"
Michael Chertok, Amazon Lab126
In this talk Michael will describe how object detectors can be adapted to a constrained environment while improving their performance and compressing the model. Object detectors are typically trained to be general and operate well under a wide range of settings. In practice, detectors are often deployed on restricted devices with limited resources, where this generalization may be unnecessary and costly in computation and storage. Michael's team proposes a self-supervised method for domain adaptation of object detectors to such restricted settings. The approach is designed with the following goals: (a) improved accuracy in the restricted domain; (b) preventing overfitting to the new domain and forgetting of generalized detection capabilities; and (c) aggressive model compression with runtime acceleration. The result is a simple method for balancing the compression and speed of a detection model against the loss of its generalization capabilities.
- Yet another Gradient Boosting
-------
Intro.
One of the most important parts of the modelling stage of any data science project is choosing the right algorithm. Nowadays, many applied ML problems are solved using one of two algorithm families: deep learning and gradient boosting. Neither method is better than the other; rather, it depends on which one better suits the task. In our upcoming meetup, we will focus on gradient boosting. First, we will present a new open-source gradient boosting library called CatBoost. CatBoost was released by Yandex last year and has been repeatedly reported to outperform similar libraries such as XGBoost and LightGBM. We will also demonstrate a use case of gradient boosting and highlight how CatBoost outperforms other tools on the task of predicting online campaign results. This meetup is also one of the first opportunities to meet some of the Y-DATA members!
-----------
Agenda:
18:00 - 18:30 Registration, Mingling, Snacks & Beer.
18:30 - 19:15 "Crunching your data with CatBoost - the new Gradient Boosting library from Yandex"
Anna Veronika Dorogush, head of the Machine Learning Systems group at Yandex
For a number of years, gradient boosting has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others. CatBoost (http://catboost.yandex) is an open-source gradient boosting library. During this talk we will present its principles and its advantages over other publicly available gradient boosting libraries.
19:15 - 19:30 Break.
19:30 - 20:15 "Boosting Media Buyers"
Dr. Hanan Shteingart, Data Science team leader at Playtika
Performance-based media buying is the process of deciding which online campaign promises the best return on revenue. It is a hard task: there are many variables, many KPIs and many products, and there is a long delay between cost and return. In this talk we'll discuss an ML-based system in production which ranks campaigns as an assistant tool for media buyers. Comparing our performance to that of humans, we show that our predictions, measured by Spearman correlation, are equal or superior to those of human experts.
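For readers unfamiliar with the metric in the last abstract, the sketch below shows how Spearman correlation compares two rankings: it is the Pearson correlation computed on ranks, with ties given their average rank. The campaign numbers are invented for illustration and say nothing about the results reported in the talk. The function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented numbers: realized return per campaign, plus two sets of forecasts.
realized = [120, 80, 300, 45, 210]
model_forecast = [110, 95, 280, 50, 150]
human_forecast = [100, 130, 250, 90, 60]

print("model vs realized:", spearman(realized, model_forecast))
print("human vs realized:", spearman(realized, human_forecast))
```

Because only the ordering matters, a forecast can be far off in absolute value and still score well, which suits a system whose job is to rank campaigns for media buyers.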