Details

What You'll Hear About

R is a reliable and versatile tool for data munging. Unfortunately, base R's data loading and transformation routines are slow on larger data sets.

To alleviate this problem, the contributors to the data.table package have rewritten many data flow tools in C, with dramatic speed gains. Moreover, data.table can often do in one line what would take base R users a page of code. These gains come at the cost of a rather demanding syntax.

During this talk, I will try to partially demystify data.table by going over a limited, basic set of data.table operations, benchmarking against base R as I go. For even larger data sets, using R for ETL becomes unwieldy, so, time permitting, I will also attempt to demonstrate a few basic uses of Pentaho's PDI toolset.

About Our Speaker

Serban Tanasa is a managing director of Sunstone Science, a new, innovative Business Intelligence and Analytics startup in the DC Metro Area.

A migrant from the world of academia and international development, he is also serving as the director of analytics and research at CSBS, where he is currently building a BI solution from the ground up.

Serban is hard at work bridging the gap between traditional Business Intelligence and the new wave of analytics-heavy methods surrounding Big Data.

Sponsors

Booz Allen: DC2 Org Sponsor
GWU: The skills you need to develop and apply modern data solutions.
Anant Corporation: Program Sponsor
ByteCubed: Tech Innovators located in Crystal City
DC Tech Live: Live Stream Sponsor