
SparkR: Enabling Interactive R programs at Scale & GraphX: Unifying Graphs and Tables

Live Stream Link:

This month we will be at SkyDeck in Berkeley, with a presentation from Shivaram Venkataraman on SparkR and a talk from Dan Crankshaw on GraphX.

Please RSVP only if you plan to attend in person. We will be live-streaming the event and posting a video to YouTube shortly afterward.

Title: SparkR: Enabling Interactive R programs at Scale
Shivaram Venkataraman, UC Berkeley


R is a widely used statistical programming language but its
interactive use is typically limited to a single machine. We have
recently released a developer preview of SparkR, an open source R
package that provides a light-weight frontend to Spark and enables
running R programs at scale. This talk will introduce SparkR, discuss
some of its features and highlight the power of combining R's
interactive console and extension packages with Spark's distributed

Title: GraphX: Unifying Graphs and Tables
Dan Crankshaw, UC Berkeley


Increasingly, data-science applications require the creation, manipulation, and analysis of large graphs ranging from social networks to language models. While existing graph systems (e.g., GraphBuilder, Titan, Pregel, and GraphLab) address specific stages (e.g., graph construction, querying, or computation), they do not address the entire analytics process, forcing users to deal with multiple systems, complex and brittle file interfaces, and inefficient data movement and duplication.

GraphX unifies graphs and tables, enabling users to express entire graph analytics pipelines within a single system. The GraphX interactive API makes it easy to build, query, and compute on large distributed graphs. Using the GraphX API, we implement a modified version of the Pregel API (in fewer than 50 lines of code) which adopts a more edge-centric view of computation to overcome many of the challenges of power-law graphs. By casting recent advances in graph systems as distributed join optimizations, GraphX is able to achieve performance comparable to specialized systems while exposing a more flexible API. By building on top of recent advances in data-parallel systems, GraphX is able to achieve fault tolerance while retaining in-memory performance, without the need for explicit checkpoint recovery.
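
To make the idea of an edge-centric Pregel loop concrete, here is a minimal single-machine sketch in plain Python. It is not the GraphX API or its implementation; the function name `pregel`, its parameters (`vprog`, `send_msg`, `merge_msg`), and the small example graph are assumptions chosen only for illustration. Messages are generated by scanning edges rather than by iterating over each vertex's neighbor list, and only edges whose source vertex received a message in the previous superstep are considered.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Hypothetical types for this sketch: an edge is (source id, destination id, weight).
Edge = Tuple[Hashable, Hashable, float]


def pregel(
    vertices: Dict[Hashable, float],
    edges: List[Edge],
    initial_msg: float,
    vprog: Callable[[Hashable, float, float], float],
    send_msg: Callable[[Hashable, float, Hashable, float, float],
                       Iterable[Tuple[Hashable, float]]],
    merge_msg: Callable[[float, float], float],
    max_iters: int = 20,
) -> Dict[Hashable, float]:
    # Superstep 0: every vertex processes the initial message.
    state = {vid: vprog(vid, attr, initial_msg) for vid, attr in vertices.items()}
    active = set(state)
    for _ in range(max_iters):
        inbox: Dict[Hashable, float] = {}
        # Edge-centric step: scan edges and skip those with an inactive source.
        for src, dst, weight in edges:
            if src not in active:
                continue
            for target, msg in send_msg(src, state[src], dst, state[dst], weight):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages were sent, so the computation has converged
        # Vertex step: apply the vertex program to every vertex that got a message.
        for vid, msg in inbox.items():
            state[vid] = vprog(vid, state[vid], msg)
        active = set(inbox)
    return state


# Example: single-source shortest paths from "a" on a tiny made-up graph.
INF = float("inf")
example_vertices = {"a": 0.0, "b": INF, "c": INF, "d": INF, "e": INF}
example_edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "d", 1.0)]


def relax(src, src_dist, dst, dst_dist, weight):
    # Send a message only when the edge offers a shorter path to dst.
    if src_dist + weight < dst_dist:
        yield dst, src_dist + weight


dist = pregel(
    example_vertices,
    example_edges,
    initial_msg=INF,
    vprog=lambda vid, current, msg: min(current, msg),
    send_msg=relax,
    merge_msg=min,
)
print(dist)
```

In this sketch the per-edge `send_msg` function sees both endpoint values at once, which is what lets it decide to send nothing along an edge. A distributed system would partition the edge scan and the message merge across machines; this version only shows the control flow.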


  • Burt P.

    Fantastic meetup! Dan did an awesome job on GraphX and Shivaram's preso on SparkR was perfect.
    Both projects are important and needed.

    March 26, 2014

  • Scott W.

    March 26, 2014

  • Mark

    Thanks for a great Spark event at Berkeley's SkyDeck! If you are interested in learning more about SkyDeck or Automa Systems feel free to reach out to me at [masked]

    March 25, 2014

  • Walter M.

    Really enjoyed it tonight. Thanks everybody.

    March 25, 2014

  • kripa

    Thanks for the streaming and the slides. There seems to be a bunch of background noise, FYI.

    March 25, 2014

    • Andy K.

      Thanks for letting us know. We can't fix that this time (failure in our sound system) but we will address it by the next meetup. Thanks for your patience!

      March 25, 2014

  • Andy K.

    The meetup has started and the video stream is live!

    March 25, 2014

  • Hardik

    Thanks for the slides :)

    March 25, 2014

  • Scott W.

    March 25, 2014

  • Dagny T

    This offers some tough competition for proprietary vendors' offerings in Big Data Analytics: Cloudera, HortonWorks, Pivotal/EMC, IBM. Is the Spark team currently aligned with any vendor over the others as far as offering analytics capabilities; e.g. Cloudera?

    March 12, 2014

  • Mona C.

    Good to know there will be a webcast since it is quite a drive from South Bay for me

    March 18, 2014

  • NinaWang

    Nina Wang

    March 10, 2014

  • Lilun C.


    March 8, 2014

  • Suhas

    Will there be a video webcast or slides posted after the event for those who cannot make it to the East Bay?

    March 6, 2014

    • A former member

      Did you read the event description?

      2 · March 6, 2014
