Graph Queries on Data Lakes


Details
Join us on February 15th at 5:30pm to hear from Weimo Liu, CEO of PuppyGraph.
Abstract:
When doing graph analysis, what some users really want is a graph data
lake rather than a graph database. These users don’t want to load their
data twice and store it in two places, especially when the graph
database doesn’t separate compute from storage: they cannot turn off
the expensive machine after a query finishes, or the data will be lost.
An independent graph query engine can help these users. Their data
already sits in data lakes, such as Apache Iceberg, that the SQL query
engine Trino can query. When the users want to run a graph query, they
can start a graph query engine and have it access the data lakes
directly. After they get the results, the graph query engine can be
turned off, and the data in the data lakes is still there. Because the
data lives in only one place, there is no need to decide which copy is
the ground truth or to keep copies in sync.
