Apache Arrow: Enabling Data Engineering in R - Ian Cook
Details
The job of a data engineer is to build, manage, and optimize systems for transforming data into forms that facilitate analysis. Despite the broad adoption of R as a language for data science, R has taken a back seat to Python and other languages in the area of data engineering. But this is beginning to change. Data engineering tasks that were previously infeasible in R are becoming straightforward thanks to recent developments in the Apache Arrow project and the R package `arrow`. Arrow provides tools for working with tabular data that emphasize performance, efficiency, standardization, and interoperability with other languages and systems in the broader data ecosystem. Using the R package `arrow`, it is now possible to implement many data engineering and ETL tasks entirely in R, avoiding the overhead of switching to another language such as Python or using a framework like Spark.
All skill levels are welcome.
Agenda:
6:30pm - 6:40pm Introductions
6:40pm - 7:20pm Topic Presentation
7:20pm - 7:30pm Closing Remarks
(Topic presentations sometimes run longer than 40 minutes.)
This meetup will be 100% virtual! Check the "Location" section of the web page for the Zoom Meeting link.
Support graciously provided by the R Consortium (https://www.r-consortium.com) and Onebridge (https://www.onebridge.tech/)

