Details

Our July Meetup will be a remote gathering; the Zoom link will be posted the week before the event.

Sparse data (data with a lot of 0s) appear quite often in modeling contexts. However, existing data structures such as data frames or matrices don't have a good way of flexibly handling them, and users are typically forced to represent all their data as either sparse or dense (non-sparse). As a result, many modeling workflows use a suboptimal data structure; at best this slows down computation, and at worst it makes training the model computationally infeasible.

This talk will cover how we overcame these issues in tidymodels, starting with the creation of a sparse vector format for tibbles, followed by the wiring needed to make use of this new format in our packages. The best part is that most users don't need to change anything in their code to benefit from these speed improvements.
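
As a rough illustration of why a sparse representation helps, the sketch below stores only the non-zero values of a vector together with their positions. It is written in Python purely for illustration; `SparseVector` and its methods are hypothetical names, and this is not the format used in tidymodels, only a minimal example of the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseVector:
    """Hypothetical sparse vector: keeps only entries that differ from `default`."""

    values: tuple        # the stored (non-default) values
    positions: tuple     # 0-based index of each stored value, in ascending order
    length: int          # logical length of the full vector
    default: float = 0.0

    @classmethod
    def from_dense(cls, dense, default=0.0):
        # Keep only the entries that are not equal to the default value.
        pairs = [(i, v) for i, v in enumerate(dense) if v != default]
        return cls(
            values=tuple(v for _, v in pairs),
            positions=tuple(i for i, _ in pairs),
            length=len(dense),
            default=default,
        )

    def to_dense(self):
        # Rebuild the full vector by filling in the default everywhere else.
        dense = [self.default] * self.length
        for i, v in zip(self.positions, self.values):
            dense[i] = v
        return dense

    def stored(self):
        # Number of values actually held in memory.
        return len(self.values)


# A vector of length 1000 with only three non-zero entries.
dense = [0.0] * 1000
dense[3] = 2.5
dense[250] = 1.0
dense[999] = 4.0

sparse = SparseVector.from_dense(dense)
print(sparse.stored(), "stored values for a vector of length", sparse.length)
```

The dense list holds 1000 numbers, while the sparse version holds three values and three positions; converting back with `to_dense()` recovers the original vector exactly.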

Data Science
Data Visualization
R Project for Statistical Computing
Statistical Computing
Statistical Modeling

Sponsors

R Consortium: Meetup Pro account
Gravity IT Resources: Networking events
Intermountain Healthcare: Meeting space, nominal funds for food
Neumont College of Computer Science: Meeting space, support of administration
U of U PHR SDBC: Meeting space, nominal funds for food, support of administration
