Introduction to GraphLake


Details
Tabular databases have trended towards massively parallel processing systems coupled with file structures that combine metadata files with optimized data files. Examples include table formats such as Apache Iceberg and Delta Lake, with technologies such as Dremio and Snowflake delivering operational solutions. We venture that this approach can be applied to graph data, but that the kinds of partitioning and filtering used in graph workloads need their own data structures and operational semantics. In this presentation we describe GraphLake, a novel approach to storing graph data in immutable files and using metadata to determine which files to process at query time. We describe the structure of the data files and metadata files, and then formally define the set of semantics that collectively deliver an operational database system for both OLTP and large-scale analytic workloads. In addition to its novel storage engine, GraphLake exposes a unified graph data model for RDF* and Property Graphs.
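
To make the idea of using metadata to choose files at query time concrete, here is a minimal Python sketch of metadata-based file pruning. It is not GraphLake's actual format or API. The `FileMeta` record, its fields, the choice of edge label and subject-id range as pruning keys, and the file names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata entry describing one immutable data file."""
    path: str
    predicate: str     # edge label stored in this file (assumed partition key)
    min_subject: int   # smallest subject id found in the file
    max_subject: int   # largest subject id found in the file


def files_to_scan(manifest, predicate, subject_id):
    """Return the paths of files whose metadata cannot rule out a match."""
    return [
        m.path
        for m in manifest
        if m.predicate == predicate
        and m.min_subject <= subject_id <= m.max_subject
    ]


# Illustrative manifest: three immutable edge files and their metadata.
manifest = [
    FileMeta("edges-0001.dat", "knows", 0, 999),
    FileMeta("edges-0002.dat", "knows", 1000, 1999),
    FileMeta("edges-0003.dat", "worksAt", 0, 1999),
]

# Only one file needs to be opened to look up the "knows" edges of subject 1500.
selected = files_to_scan(manifest, "knows", 1500)
print(selected)  # ['edges-0002.dat']
```

The data files are never modified in this sketch. A query consults only the manifest to decide which files to read, which is the property the abstract attributes to the metadata layer.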
