What we're about
R is a leading tool for data scientists and statisticians. With thousands of open-source packages available, R users can handle all kinds of data processing tasks (exploratory analysis, visualization, forecasting, machine learning, etc.), all within the same platform. For example, the dplyr and ggplot2 packages have greatly simplified data manipulation and visualization. What remains difficult is doing these things on very large datasets, because R users usually run these tasks on a single thread and can process only data that fits in a single machine’s memory.
To enable fast data analysis on terabytes or even petabytes of data, SparkR lets you run Spark jobs in parallel, interactively, from the R console. This talk will introduce SparkR and Spark DataFrames and show how they interact with Spark SQL. We will discuss some of SparkR's features and highlight the power of combining R and Spark through a demo. SparkR was recently merged into Spark and is part of the new Spark 1.4 release.
Date and location will be announced later.