In this talk, John Mount of Win-Vector LLC will demonstrate the CRAN package rquery and the new package rqdatatable.
rquery is a query generator for R based on Edgar F. Codd's relational algebra and on production experience using SQL and dplyr at big data scale. rquery is powerful, easy to teach, and includes error checking and basic query optimization. It is the fastest way to manipulate remote big data from R, for example data found on PostgreSQL or Spark. rqdatatable is a new package that implements the rquery grammar in memory using the data.table package. This allows the same rquery pipeline to be used both remotely, such as on Spark, and on in-memory data. rqdatatable transforms are usually nearly as fast as native data.table code and usually much faster than the equivalent base-R or dplyr code.
These packages can save substantial development and infrastructure costs.
John Mount is a principal consultant at Win-Vector LLC. He is a frequent writer, speaker, and R package contributor. He is one of the authors of the popular book "Practical Data Science with R" (Zumel, Mount; Manning 2014). John has a Ph.D. in computer science from Carnegie Mellon University.
Win-Vector LLC is a San Francisco-based data science consulting company. Win-Vector LLC supports the important R packages vtreat, wrapr, rquery, and many more. It also hosts the popular Win-Vector blog, a technical resource used by many data scientists. Win-Vector LLC supplies consulting and training, especially for projects involving R, statistics, machine learning, big data, or Spark. Past clients include Deloitte, EMC, Intuit, Genentech, Microsoft, SalesForce, and Zions Bank. Win-Vector LLC can be reached by writing to John Mount at [masked]. We are especially interested in new clients who want consulting or training on using the Win-Vector packages.
6:00 Doors open - socializing & individual questions
6:25 Announcements & Introduction
7:45 Wrap up & socializing
8:00 Out the door!