This meetup will feature the first public preview of GraphX, a new graph processing framework for Spark. GraphX implements many ideas from an existing specialized graph processing system (in particular GraphLab) to make graph computation efficient. The new graph APIs builds on Spark's existing RDD abstraction, and thus allows programmers to blend graph and tabular (RDD) views of graph data. For example, GraphX users will be able to use Spark to construct the graph from data on HDFS, and run graph computation (e.g. PageRank) directly on it in the same Spark cluster.
The proposed API enables extremely concise implementation of many graph algorithms. We provide implementations of 4 standard graph algorithms, including PageRank, Connected Components, Shortest Path all in less than 10 lines of code; we also implement the ALS algorithm for collaborative filtering in 40 lines of code. This simple API, coupled with the Scala REPL, enables users can use GraphX to interactively mine graph data in the console.
The talk will be presented by Reynold Xin and Joey Gonzalez. We would like to thank Flurry for providing the space and food.