
Exploratory Analysis of Large Data with R and Spark

Hosted By
Denny L.

Details

Edit: We will be in the Arnold Building (map: http://www.fredhutch.org/en/contact-us/visit-us.html). Enter the building on the side the arrow points to. Ample parking is shown on the map, and there is also parking on Valley, Minor and Aloha. We will have signs and someone at the door of the Arnold Building to direct traffic.

We have an exciting joint Seattle useR / Seattle Spark Meetup event! Hossein Falaki, Software Engineer and Data Scientist at Databricks, will be visiting the Emerald City for this awesome session.

Abstract

R has evolved to become an ideal environment for exploratory data analysis. The language is highly flexible: there is an R package for almost any algorithm, and the environment comes with integrated help and visualization. SparkR adds distributed computing and the ability to handle very large data to this list. SparkR is an R package distributed with Apache Spark. It exposes Spark DataFrames, which were inspired by R data.frames, to R. With Spark DataFrames and Spark’s in-memory computing engine, R users can interactively analyze and explore terabyte-size data sets.

In this meetup, Hossein will introduce SparkR and show how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: exploratory analysis of very large data. Specifically, he will show how Spark’s features and capabilities, such as caching distributed data and integrated SQL execution, complement R’s great tools, such as visualization and diverse packages, in a real-world data analysis project with big data.

About Hossein Falaki

Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that he was a data scientist on Siri, Apple’s personal assistant. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).

Agenda

6:00pm - Doors open

6:30pm - Session starts

7:30pm - Q&A

8:00pm

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Food + drinks provided by O'Reilly Media!


Strata + Hadoop World San Jose 2016 (http://www.oreilly.com/pub/cpc/4877) is the leading event on how big data and ubiquitous, real-time computing are shaping the course of business and society. It brings together the world’s best data scientists and business leaders to share hard-won knowledge and innovations in technology and strategy. Check out the impressive program and make plans to join Strata + Hadoop World in San Jose, March 28-31, 2016. Save 20% on most passes with discount code UGSEASPRK.

Seattle Spark+AI Meetup
Fred Hutchinson Cancer Research
1100 Fairview Ave N · Seattle, WA