Don't be dense! Embracing sparsity in tidymodels

Name: Don't be dense! Embracing sparsity in tidymodels
Start: 2025-07-22T12:00:00-06:00
End: 2025-07-22T13:00:00-06:00

Hosted By

Andrew R. and 2 others

Details

Our July Meetup will be a remote gathering; the Zoom link will be posted the week before the event.

Sparse data (data with a lot of 0s) appear quite often in modeling contexts. However, existing data structures such as data frames or matrices don't handle them flexibly, and users are typically forced to represent all their data as either sparse or dense (non-sparse). As a result, many modeling workflows use a suboptimal data structure; at best this slows down computation, and at worst it makes training the model computationally infeasible.

This talk will cover how we overcame these issues in tidymodels, starting with the creation of a sparse vector format for tibbles, followed by the wiring needed to make use of this new format in our packages. The best part is that most users don’t need to change anything in their code to benefit from these speed improvements.
