Splice Machine: Architecture of an Open Source RDBMS powered by HBase and Spark


Details
Happy new year to everyone!
We are glad to announce 2017's first meetup! We will welcome Daniel Gómez Ferro (https://www.linkedin.com/in/dgomezferro), Software Architect at Splice Machine (http://www.splicemachine.com). You may already have had a beer with him at past meetups!
Splice Machine is a San Francisco-based company that just open-sourced its RDBMS product (http://www.splicemachine.com/were_going_open_source/), which is based on Spark and HBase. You cannot miss it!
Title:
Architecture of an Open Source RDBMS powered by HBase and Spark
Abstract:
Splice Machine is a Java-based open-source database that combines the benefits of modern lambda architectures with the full expressiveness of ANSI SQL. Like lambda architectures, it employs separate compute engines for different workloads; some call this an HTAP (Hybrid Transactional/Analytical Processing) database. This talk describes the evolution of the architecture and implementation of Splice Machine. The system is powered by a sharded key-value store (Apache HBase) for fast short reads, writes, and short range scans, and by an in-memory cluster dataflow engine (Apache Spark) for analytics. It differs from most other clustered SQL systems, such as Impala, SparkSQL, and Hive, because it combines analytical processing with distributed Multi-Version Concurrency Control (MVCC), which provides the fine-grained concurrency required to power real-time applications. This talk will contextualize the need for Apache Spark in Splice Machine and describe the main challenges of integrating it into an existing distributed ACID database. We will highlight the novel contributions to the Spark/HBase ecosystem, such as hybrid scanners, a custom InputFormat, out-of-JVM compactions, and more. We will end with some roadmap items under development involving new row-based and column-based storage encodings.
Bio:
Daniel Gómez Ferro is a distributed systems engineer. He started his career as a Research Engineer at Yahoo! Labs, where he worked on several open source projects (HBase, ZooKeeper, S4, ...) and participated in research on distributed transactions, fault tolerance, and dependability. Currently, he is a Software Architect at Splice Machine, creating the first SQL-compliant database designed for Big Data applications.
