
Don't be dense! Embracing sparsity in tidymodels

Hosted by Andrew Redd and 2 others

Details

Our July Meetup will be a remote gathering; the Zoom link will be posted the week before the event.

Sparse data (data with a lot of 0s) appear quite often in modeling contexts. However, existing data structures such as data frames and matrices lack a flexible way to handle them, so users are typically forced to represent all of their data as either sparse or dense (non-sparse). As a result, many modeling workflows use a suboptimal data structure: at best this slows computation, and at worst it makes training the model computationally infeasible.

This talk will cover how we overcame these issues in tidymodels, starting with the creation of a sparse vector format for tibbles, followed by the wiring needed to make use of this new format in our packages. The best part is that most users don't need to change anything in their code to benefit from these speed improvements.

Salt Lake City R Users Group
FREE