Bigtable: A Distributed Storage System for Structured Data

Catalin Patulea will present Bigtable: A Distributed Storage System for Structured Data by F. Chang et al.
Bigtable was developed by Google from 2004 to 2006 for storing large amounts of semi-structured data in a wide range of applications, including web indexing, Google Earth, Google Finance, Google Analytics and others. Its design lies somewhere between traditional relational databases (RDBMS) and pure key-value stores, and it later inspired a family of storage systems such as Apache Cassandra and Amazon DynamoDB. Modern storage systems such as Google Spanner and CockroachDB also contain design elements similar to Bigtable. Bigtable is therefore of both historical and current interest.
In this talk, we will start with the basic data structures used by Bigtable: SSTables and log-structured merge (LSM) trees. We will show a highly simplified LSM-tree implemented in Python and demonstrate how it works. This structure is the basic unit of scaling in Bigtable.
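To give a flavour of the idea ahead of the talk, here is a minimal sketch of an LSM-tree in Python. It is not the speaker's implementation; the names `SSTable`, `LSMTree`, `TOMBSTONE` and `memtable_limit` are assumptions made for illustration, and SSTables are kept in memory as sorted lists rather than written to disk. Writes go to an in-memory table, full tables are flushed as immutable sorted runs, and reads check the newest data first.

```python
import bisect

TOMBSTONE = object()  # hypothetical marker recording that a key was deleted


class SSTable:
    """Immutable sorted run of (key, value) pairs; stands in for an on-disk file."""

    def __init__(self, items):
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]

    def get(self, key):
        # Binary search works because the keys are sorted.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return True, self.values[i]
        return False, None


class LSMTree:
    def __init__(self, memtable_limit=2):
        self.memtable = {}   # recent writes, mutable
        self.sstables = []   # flushed runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: the tombstone shadows older values.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            items = sorted(self.memtable.items(), key=lambda kv: kv[0])
            self.sstables.append(SSTable(items))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in reversed(self.sstables):  # newest run wins
            found, value = table.get(key)
            if found:
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest live value per key.
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(zip(table.keys, table.values))
        live = sorted(
            ((k, v) for k, v in merged.items() if v is not TOMBSTONE),
            key=lambda kv: kv[0],
        )
        self.sstables = [SSTable(live)] if live else []


db = LSMTree(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)   # memtable is full, so it is flushed as the first SSTable
db.put("a", 3)   # newer value shadows the flushed one
db.delete("b")   # second flush, containing a tombstone for "b"
print(db.get("a"), db.get("b"))  # 3 None
db.compact()
print(len(db.sstables))          # 1
```

A real system would also keep a write-ahead log for the in-memory table and compact incrementally; both are omitted here to keep the sketch short.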
Then, we will roughly follow the 2006 paper: 1) the data model and client API, 2) the underlying infrastructure on which Bigtable is built, and 3) how the database distributes work across many machines to achieve scaling. Finally, we will briefly cover more advanced topics, such as tuning Bigtable for specific use cases or for better resource efficiency.
PWL will be held at the hackerspace Foulab for this event. See this page for how to get there from the building's entrance.
