Skip to content

Learn about SpliceMachine

Photo of Mark Kerzner
Hosted By
Mark K.
Learn about SpliceMachine

Details

Title: Using HBase Co-Processors to Build a Distributed, Transactional RDBMS

Abstract:

Rich Reimer, VP of Product Management at Splice Machine, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation and 3) consistent secondary indexing.

Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.

In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that was traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.

HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.

The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.

Speaker: Rich Reimer, Splice Machine

Rich has over 15 years of sales management experience in high-tech companies. Before joining Splice Machine, Rich worked at Zynga as the Treasure Isle studio head, where he used petabytes of data from millions of daily users to optimize the business in real-time. Prior to Zynga, he was the COO and co-founder of a social media platform named Grouply. Before founding Grouply, Rich held executive positions at Siebel Systems, Blue Martini Software and Oracle Corporation as well as positions at General Electric and Bell Atlantic. Rich received his BSEE in Electrical Engineering from the University of Michigan, his MSEE in Electrical Engineering from Columbia University, and an MBA from Harvard Business School, where he was a Baker Scholar.

Photo of AI+Big Data group
AI+Big Data
See more events
Microsoft offices
2000 W Sam Houston Pkwy S Ste 350 · Houston, TX