Introduction to Apache SparkR by Databricks


Details
Topic: Introduction to Apache SparkR from Databricks
Abstract: R has evolved to become an ideal environment for exploratory data analysis. The language is highly flexible - there is an R package for almost any algorithm, and the environment comes with integrated help and visualization. SparkR adds distributed computing and the ability to handle very large data to this list. SparkR is an R package distributed within Apache Spark. It exposes Spark DataFrames, which were inspired by R data.frames, to R. With Spark DataFrames and Spark’s in-memory computing engine, R users can interactively analyze and explore terabyte-size data sets. In this meetup, Hossein will introduce SparkR and show how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: exploratory analysis of very large data. Specifically, he will show how Spark’s features and capabilities, such as caching distributed data and integrated SQL execution, complement R’s great tools, such as visualization and diverse packages, in a real-world data analysis project with big data.
Speaker bio:
Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that, he was a data scientist at Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing.
Agenda
6:00 pm -- office doors open
6:00 pm -- 6:30 pm light dinner + networking
6:30 pm -- 6:35 pm introduction
6:35 pm -- 7:40 pm main talk + Q&A
7:40 pm -- 8:00 pm networking
8:00 pm -- 8:30 pm closing
8:30 pm -- office closed
