SparkR: Enabling Interactive R Programs at Scale & GraphX: Unifying Graphs and Tables

Hosted By
Scott W. and Andy K.

Details

Live Stream Link: https://www.youtube.com/watch?v=MY0NkZY_tJw

This month we will be at Skydeck in Berkeley, with two talks: Shivaram Venkataraman on SparkR and Dan Crankshaw on GraphX.

Please only RSVP if you plan on attending in-person. We will be live-streaming the event and posting a video to YouTube shortly after.

Title: SparkR: Enabling Interactive R programs at Scale
Shivaram Venkataraman, UC Berkeley

Slides: http://files.meetup.com/3138542/SparkR-meetup.pdf

R is a widely used statistical programming language, but its
interactive use is typically limited to a single machine. We have
recently released a developer preview of SparkR, an open source R
package that provides a lightweight frontend to Spark and enables
running R programs at scale. This talk will introduce SparkR, discuss
some of its features, and highlight the power of combining R's
interactive console and extension packages with Spark's distributed
runtime.

GraphX: Unifying Graphs and Tables

Dan Crankshaw, UC Berkeley

Slides: http://files.meetup.com/3138542/graphx%40spark_meetup03_2014.pdf

Increasingly, data-science applications require the creation, manipulation, and analysis of large graphs ranging from social networks to language models. While existing graph systems (e.g., GraphBuilder, Titan, Pregel, and GraphLab) address specific stages (e.g., graph construction, querying, or computation), they do not address the entire analytics process. This forces users to deal with multiple systems, complex and brittle file interfaces, and inefficient data movement and duplication.

GraphX unifies graphs and tables, enabling users to express entire graph analytics pipelines within a single system. The GraphX interactive API makes it easy to build, query, and compute on large distributed graphs. Using the GraphX API, we implement a modified version of the Pregel API (in fewer than 50 lines of code) that adopts a more edge-centric view of computation to overcome many of the challenges of power-law graphs. By casting recent advances in graph systems as distributed join optimizations, GraphX achieves performance comparable to specialized systems while exposing a more flexible API. By building on recent advances in data-parallel systems, GraphX achieves fault tolerance while retaining in-memory performance, without the need for explicit checkpoint recovery.
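
For readers who have not seen a Pregel-style API, the edge-centric idea mentioned above can be sketched in a few lines of plain Python. This is only an illustration on a single machine, not the GraphX implementation or its API: the function names (pregel, vprog, send_msg, merge_msg), the activation rule, and the sample graph are all assumptions made for the example. Each round scans edges rather than vertices, lets each edge emit messages based on both endpoints, merges messages bound for the same vertex, and then updates only the vertices that received something. The sample run computes single-source shortest paths.

```python
import math

def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg,
           max_iterations=50):
    """Hypothetical single-machine sketch of an edge-centric Pregel loop.

    vertices: dict mapping vertex id -> attribute
    edges:    list of (src, dst, weight) tuples
    """
    # Superstep 0: every vertex receives the initial message.
    state = {vid: vprog(vid, attr, initial_msg) for vid, attr in vertices.items()}
    active = set(state)

    for _ in range(max_iterations):
        inbox = {}
        # Edge-centric step: iterate over edges, not vertices. An edge is
        # processed only if one of its endpoints changed in the last round
        # (a simplifying assumption of this sketch).
        for src, dst, weight in edges:
            if src not in active and dst not in active:
                continue
            for target, msg in send_msg(src, state[src], dst, state[dst], weight):
                # Combine messages addressed to the same vertex.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg

        if not inbox:
            break  # no messages were sent, so the computation has converged

        # Vertex step: only vertices that received a message are updated.
        new_state = dict(state)
        for vid, msg in inbox.items():
            new_state[vid] = vprog(vid, state[vid], msg)
        state = new_state
        active = set(inbox)

    return state

def relax(src, src_dist, dst, dst_dist, weight):
    """Send a shorter distance along the edge, if one exists."""
    if src_dist + weight < dst_dist:
        yield dst, src_dist + weight

# Sample graph (made up for this example); vertex 1 is the source.
sample_vertices = {1: 0.0, 2: math.inf, 3: math.inf, 4: math.inf}
sample_edges = [(1, 2, 1.0), (2, 3, 2.0), (1, 3, 5.0), (3, 4, 1.0)]

distances = pregel(
    sample_vertices,
    sample_edges,
    initial_msg=math.inf,
    vprog=lambda vid, dist, msg: min(dist, msg),  # keep the best distance seen
    send_msg=relax,
    merge_msg=min,
)
print(distances)  # {1: 0.0, 2: 1.0, 3: 3.0, 4: 4.0}
```

In a distributed setting the edge scan and the message merge would be expressed as joins and aggregations over partitioned tables, which is the connection to join optimization that the abstract describes; this sketch leaves that part out entirely.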

2150 Shattuck Ave, Penthouse Floor · Berkeley, CA