Details

File import in R could be considered a solved problem, with multiple widely used packages (data.table, readr, and others) providing fast, robust import of common formats in addition to the functions available in base R.

However, I feel there is still room for improvement in existing approaches. vroom can index and then query multi-gigabyte files, including those with categorical, text, and temporal data, in near real time, parsing at over 1 GB per second.

This is a huge boon for interactive data analysis as you can jump directly into exploratory analysis without sampling or long waits for full import.

vroom leverages the Altrep framework introduced in R 3.5 along with lazy, just-in-time parsing of the data to provide this improved latency without requiring changes to existing data manipulation code.
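As a rough illustration of what "without requiring changes to existing data manipulation code" can look like, here is a minimal sketch. It assumes the vroom package is installed, and it uses the built-in `iris` data set written to a temporary file as a stand-in for a large file. The variable names are illustrative only, and which column types are parsed lazily depends on the vroom version and its `altrep` argument.

```r
library(vroom)

# Write a small tab-delimited file to a temporary path.
# (A stand-in for a large file; `path` is an illustrative name.)
path <- tempfile(fileext = ".tsv")
vroom_write(iris, path)

# vroom() indexes the file and guesses the delimiter; values in
# Altrep-backed columns are parsed only when they are first accessed.
flowers <- vroom(path, show_col_types = FALSE)

# Ordinary subsetting code works unchanged on the result.
setosa <- flowers[flowers$Species == "setosa", c("Sepal.Length", "Species")]
nrow(setosa)

unlink(path)
```

The subsetting step touches only two columns, so under lazy parsing the remaining columns need not be fully parsed until something asks for them.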

I will thoroughly explain the techniques used in vroom to ensure good performance, describe challenges overcome in implementing it, and provide an interactive demonstration of its capabilities.

vroom is now on CRAN. Install it with `install.packages("vroom")` and learn more about the package at https://vroom.r-lib.org

Sponsors

Progressive Insurance

Provides a Pro Zoom account.

RStudio

Provides financial support, publicity.

R Consortium

Provides our meetup.com membership, financial support, and publicity.

Tech Community Coalition

Provides a bank account to manage donations
