2023 RocksDB Mid-Year In-Person Meetup (at Rockset HQ) (Zoom available)


Details
[Update] We are also adding a Zoom link for those who cannot attend in person: https://rockset.zoom.us/j/87378448621
We are excited to announce the next in-person RocksDB meetup! This event will be held on June 13th at the Rockset office in San Mateo, California. Come meet other engineers in the RocksDB community. Food and swag will also be provided!
How Databricks manages Stateful Streaming pipelines at massive scale using RocksDB and Apache Spark Structured Streaming [30 min]
Stateful streaming pipelines are one of the most important and fastest-growing streaming workloads at Databricks. In this talk, we will go through how Databricks uses RocksDB as a state store provider for managing large-volume data pipelines, with the goal of optimal performance and resource usage. We will talk about our experience with various database modes, WriteBatchWithIndex, issues with pausing the database, and changelog management. We will also discuss how we are looking to adopt new RocksDB features such as the write buffer manager for memory management, compaction filters for efficient eviction, column families for multiple state instances, and much more for enhanced functionality and performance.
Speakers: Anish Shrigondekar, Karthik Ramasamy
How Rockset Isolates Streaming Ingest and Queries Using RocksDB [30 min]
RocksDB is the storage engine for the Rockset real-time database. Rockset builds a ConvergedIndex on every row and every column of your data and stores it in RocksDB. In this talk, we present a real-time analytics architecture implemented in the Rockset database that effectively isolates streaming data ingestion from query serving.
Core to the Rockset architecture is the separation of compute and storage. This allows multiple RocksDB instances to query from the same shared storage. We use cloud object storage to ensure durability and use SSD as a shared hot storage tier for low-latency reads. On the compute side, we designed our query processing engine to be completely separate from all the modules that perform data ingestion. This separation of query-compute from ingest-compute is called Compute-Compute separation, and it required fundamental changes in the implementation of RocksDB’s write-ahead log.
For fresh data to be available to multiple compute units, it is essential that the in-memory state of the ingester's RocksDB memtable be replicated to other RocksDB instances. We built a RocksDB memtable replicator that propagates changes to remote instances in single-digit milliseconds. This architecture enables compute isolation so that real-time streaming ingestion doesn't interfere with queries, while still allowing the most recent data to be queried.
Speakers: Nathan Bronson and Karen Li
How WhatsApp Built Message Storage for 2.5+B Users with RocksDB [30 min]
RocksDB is the storage engine for WhatsApp’s Messaging Infrastructure (WMI). WMI is microservices-oriented and mostly written in Erlang. We have built an Erlang SDK on top of RocksDB which can be embedded into our microservices to provide storage capabilities to applications with tunable consistency guarantees. This architecture enables application developers to plug business logic closer to storage, for example, aggregating message receipts while processing messages. Our system serves 2.5+ billion daily users and ~100+ billion messages per day.
In this talk, we present an overview of WMI, several large-scale use cases, the architecture of our system and our experience with using RocksDB.
Speakers: Josep-Angel Herrero Bajo, Henry Sun
Disaggregating RocksDB at Meta: A Production Experience [30 min]
Over the past few years, an effort was undertaken to migrate data from locally attached SSDs to cloud storage in Meta's data centers, akin to current industry trends. Meta extended RocksDB, a widely used open-source storage engine designed and built for local SSDs, to leverage disaggregated storage. RocksDB’s design, such as its data and log files’ access patterns, makes an append-only distributed file system a desirable underlying storage layer. At Meta, we built disaggregated RocksDB on top of our internal Tectonic File System (roughly analogous to HDFS).
In this talk, Meta will detail the technical challenges presented by applications running on RocksDB and disaggregated storage and discuss the cutting-edge enhancements that address them. We believe this architecture and the inherent performance enhancements enable RocksDB to adapt to a more distributed architecture. Note that this work will also be presented as an accepted paper at the 2023 ACM SIGMOD conference.
Speakers: Anand Ananthabhotla, Dhanabal Ekambaram, Sushil Patil
Memtable and write flow - introduction and innovations [10 min]
Speedb will provide an overview of the advantages and disadvantages of existing memtable solutions and outline the current write flow process.
Speaker: Bosmat Tuv El
