Running complex data queries in a distributed system


As the amount of data keeps growing, it is getting increasingly hard to store and retrieve it efficiently. While the first distributed databases put the entire burden of sharding on the application code, there are now smarter solutions that handle most data distribution and resilience tasks inside the database.

This raises some interesting questions, for example:

How are queries other than primary-key lookups organized and executed in a distributed system so that they run as efficiently as possible?
How do contemporary distributed databases achieve transactional semantics for non-trivial operations that affect multiple shards or servers?

This talk will give an overview of these challenges and of the solutions that some open source distributed databases have chosen to address them.

Jan Steeman, Senior Developer
After more than 30 years of playing around with 8-bit computers, assembler and scripting languages, Jan decided to move on to database engineering. Jan is now a senior C/C++ developer on the ArangoDB core team, where he has worked since version 0.1. He mostly works on performance optimization, storage engines and the querying functionality.