Introduction to Apache SparkR by Databricks

Name: Introduction to Apache SparkR by Databricks
Start: 2016-08-31T18:00:00-07:00
End: 2016-08-31T21:00:00-07:00
Location: GoPro

Hosted by Chester C.

SF AI

Details

Topic: Introduction to Apache SparkR from Databricks

Abstract: R has evolved to become an ideal environment for exploratory data analysis. The language is highly flexible: there is an R package for almost any algorithm, and the environment comes with integrated help and visualization. SparkR adds distributed computing and the ability to handle very large data to these strengths. SparkR is an R package distributed with Apache Spark. It exposes Spark DataFrames, which were inspired by R data.frames, to R. With Spark DataFrames and Spark’s in-memory computing engine, R users can interactively analyze and explore terabyte-scale data sets. In this meetup, Hossein will introduce SparkR and show how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: exploratory analysis of very large data. Specifically, he will show how Spark’s features and capabilities, such as caching distributed data and integrated SQL execution, complement R’s strengths, such as visualization and its diverse packages, in a real-world data analysis project with big data.

Speaker bio:

Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that, he was a data scientist at Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing.

Agenda

6:00 pm -- office doors open
6:00 pm -- 6:30 pm light dinner + networking
6:30 pm -- 6:35 pm introduction
6:35 pm -- 7:40 pm main talk + Q&A
7:40 pm -- 8:00 pm networking
8:00 pm -- 8:30 pm closing
8:30 pm -- office closed
