Graph Fun with Apache Flink & Neo4j


This Meetup is a joint event with the Berlin Apache Flink Meetup. Thanks, Kostas, for joining forces!

Vasia Kalavri shows how to process graphs on Apache Flink with the Gelly framework:

In this talk, I will give an overview of Gelly, Apache Flink's graph processing API. Flink's iterative operators and other unique features make it a competitive alternative for large-scale graph processing. Graph analysis tasks can be expressed elegantly using common Flink operators, and different graph processing models, like vertex-centric and gather-sum-apply, can easily be mapped to Flink dataflows. Using Gelly, you can perform loading, transformation, filtering, graph creation and analysis within a single program.

I will also share our recent work with KTH, Stockholm, on supporting single-pass graph streaming analytics on Apache Flink. I will introduce gelly-stream, a prototype that allows computing graph statistics, aggregates and sketches, as well as more complex algorithms, like connected components, on streams of edges.
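To give a flavour of what single-pass connected components on a stream of edges can look like, here is a minimal sketch in plain Python. It is not gelly-stream code and uses no Flink API: the class `StreamingComponents` and its methods are hypothetical names, and a union-find structure is assumed here as one common way to maintain components while each edge is seen exactly once.

```python
from typing import Dict, Hashable, List, Set


class StreamingComponents:
    """Hypothetical single-pass connected components using union-find."""

    def __init__(self) -> None:
        # Maps every vertex seen so far to its parent in the union-find forest.
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, v: Hashable) -> Hashable:
        # Register a previously unseen vertex as its own component.
        self.parent.setdefault(v, v)
        # Walk up to the root, shortening the path along the way.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u: Hashable, v: Hashable) -> None:
        # Each edge is processed once and then discarded.
        root_u, root_v = self.find(u), self.find(v)
        if root_u != root_v:
            self.parent[root_u] = root_v

    def components(self) -> List[Set[Hashable]]:
        # Group all known vertices by their current root.
        groups: Dict[Hashable, Set[Hashable]] = {}
        for v in list(self.parent):
            groups.setdefault(self.find(v), set()).add(v)
        return list(groups.values())


cc = StreamingComponents()
for source, target in [(1, 2), (3, 4), (2, 3), (5, 6)]:
    cc.add_edge(source, target)
result = sorted(sorted(c) for c in cc.components())
```

The state kept between edges is only the vertex-to-parent map, so the edges themselves never need to be stored. That is the property that makes this kind of algorithm a fit for a streaming setting.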

Martin Junghanns will introduce Gradoop, a graph analytics system built on top of Apache Flink.

Gradoop is designed around the so-called Extended Property Graph Model (EPGM), which supports heterogeneous, schema-free graph data. In this model, a database consists of multiple property graphs, which we call logical graphs. The EPGM provides analytical operators for both single graphs and collections of graphs. Operators may also return single graphs or graph collections, which allows them to be composed into analytical programs. Additionally, Gradoop integrates Flink Gelly, which enables the application of arbitrary graph algorithms on an EPGM database.
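The data model described above can be pictured with a small, self-contained Python sketch. This is an illustration under stated assumptions, not Gradoop's API: the classes `Vertex`, `Edge` and `LogicalGraph` and the functions `overlap` and `select` are hypothetical names chosen for this example. It shows labelled, schema-free elements, logical graphs as subsets of one shared database, one operator that returns a single graph and one that returns a collection.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass
class Vertex:
    label: str
    # Schema-free: any vertex may carry any set of properties.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    source: int
    target: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class LogicalGraph:
    label: str
    # A logical graph references elements of the shared database by id.
    vertex_ids: FrozenSet[int]
    edge_ids: FrozenSet[int]
    properties: Dict[str, Any] = field(default_factory=dict)


def overlap(a: LogicalGraph, b: LogicalGraph, label: str = "overlap") -> LogicalGraph:
    """Binary operator (hypothetical): two graphs in, one graph out."""
    return LogicalGraph(label, a.vertex_ids & b.vertex_ids, a.edge_ids & b.edge_ids)


def select(
    collection: List[LogicalGraph],
    predicate: Callable[[LogicalGraph], bool],
) -> List[LogicalGraph]:
    """Collection operator (hypothetical): collection in, collection out."""
    return [g for g in collection if predicate(g)]


# One database with heterogeneous elements and two overlapping logical graphs.
vertices = {
    1: Vertex("Person", {"name": "Alice"}),
    2: Vertex("Person", {"name": "Bob", "city": "Berlin"}),
    3: Vertex("Forum", {"title": "Graphs"}),
}
edges = {10: Edge("knows", 1, 2), 11: Edge("memberOf", 2, 3)}

community = LogicalGraph("Community", frozenset({1, 2}), frozenset({10}))
forum = LogicalGraph("Forum", frozenset({2, 3}), frozenset({11}))

shared = overlap(community, forum)
large = select([community, forum, shared], lambda g: len(g.vertex_ids) >= 2)
```

Because `overlap` returns a `LogicalGraph` and `select` returns a list of them, the output of one operator can be fed directly into the next. This is the sense in which operators can be chained into an analytical program.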

In this presentation, I will give an overview of Gradoop, the EPGM and its operators. Furthermore, I will illustrate the usefulness of our system by demonstrating an analytical use case involving Neo4j, Flink and Gradoop.