Distributed Log Replay Description:
After a region server fails, we first assign each failed region to another region server, with its recovering state marked in ZooKeeper. Once the region is re-opened in the new location, a SplitLogWorker replays edits from the failed region server's WALs (Write-Ahead Logs) directly to it. While a region is in the recovering state, it accepts writes but not reads (including Append and Increment, which read before they write), region splits, or merges.
The feature piggybacks on the existing distributed log splitting framework and replays WAL edits directly to another region server instead of creating recovered.edits files.
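The recovering-state rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class RecoveringRegionSketch and its methods are hypothetical names invented for this example, not HBase classes, and WAL replay itself is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a region in recovering state; hypothetical, not HBase code. */
class RecoveringRegionSketch {
    private final Map<String, String> store = new HashMap<>();
    private boolean recovering = true; // set when the region is re-assigned

    /** Plain writes are accepted in both states. */
    void put(String row, String value) {
        store.put(row, value);
    }

    /** Reads are rejected until recovery has finished. */
    String get(String row) {
        if (recovering) {
            throw new IllegalStateException("region is recovering");
        }
        return store.get(row);
    }

    /** Append reads the current value first, so it is rejected while recovering. */
    void append(String row, String suffix) {
        String current = get(row);
        store.put(row, current == null ? suffix : current + suffix);
    }

    /** Called once all WAL edits for this region have been replayed. */
    void markRecovered() {
        recovering = false;
    }

    public static void main(String[] args) {
        RecoveringRegionSketch region = new RecoveringRegionSketch();
        region.put("row1", "a"); // accepted during recovery
        try {
            region.get("row1");
        } catch (IllegalStateException e) {
            System.out.println("read rejected: " + e.getMessage());
        }
        region.markRecovered();
        region.append("row1", "b");
        System.out.println(region.get("row1"));
    }
}
```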
The advantages over the existing log splitting implementation based on recovered.edits files:
1) It eliminates the steps of writing and reading recovered.edits files. Thousands of recovered.edits files could be created and written concurrently during a region server recovery, and many small random writes can degrade overall system performance.
2) It allows writes even while a region is in the recovering state. A failed-over region takes only seconds to accept writes again.
The feature can be enabled by setting hbase.master.distributed.log.replay to true (the default is false).
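As a sketch of what that setting could look like, the small helper below prints the property as a property element, assuming it is placed in hbase-site.xml like other HBase settings. The class and method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical helper that renders the setting as an hbase-site.xml entry. */
class DistributedLogReplayConfigSketch {
    static final String KEY = "hbase.master.distributed.log.replay";

    /** Renders one property element in the usual name/value layout. */
    static String property(String name, String value) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        // Assumption: the entry goes inside the <configuration> element of hbase-site.xml.
        System.out.println(property(KEY, "true"));
    }
}
```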
Support Stripe Compaction:
Stripe compaction is a way to make compactions more manageable by having many regions.
One outcome would be that region splits become marvelously simple (if we could move files between regions, no references would be needed).
The main advantage over Level compaction (for HBase) is that the default store can still open the files and get correct results, because there are no range overlap shenanigans.
It also needs no metadata, although we may record some for convenience.
It would also appear to cause less I/O.
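The "no range overlap" property can be sketched as follows, assuming that a stripe is a non-overlapping row-key range and that every file belongs to exactly one stripe. StripeSketch and its methods are hypothetical names for this illustration, not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of stripes as non-overlapping key ranges; hypothetical, not HBase code. */
class StripeSketch {
    // Stripe start key -> files in that stripe; "" starts the first stripe.
    private final TreeMap<String, List<String>> stripes = new TreeMap<>();

    StripeSketch(String... boundaries) {
        stripes.put("", new ArrayList<>());
        for (String b : boundaries) {
            stripes.put(b, new ArrayList<>());
        }
    }

    /** Each row falls into exactly one stripe: the one with the greatest start key <= row. */
    String stripeFor(String row) {
        return stripes.floorKey(row);
    }

    /** Registers a file under the stripe that contains the given row. */
    void addFile(String rowInStripe, String file) {
        stripes.get(stripeFor(rowInStripe)).add(file);
    }

    /** A compaction (or a read) for this row only needs the files of one stripe. */
    List<String> filesFor(String row) {
        return stripes.get(stripeFor(row));
    }

    public static void main(String[] args) {
        StripeSketch store = new StripeSketch("g", "p"); // stripes: [""..g), [g..p), [p..)
        store.addFile("kiwi", "file-1");
        store.addFile("zebra", "file-2");
        System.out.println(store.stripeFor("hello")); // prints "g"
        System.out.println(store.filesFor("hello"));  // prints "[file-1]"
    }
}
```

Because the ranges never overlap, each row maps to a single stripe, which is why no extra metadata is needed in this toy model to decide which files to consult.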
About Ted Yu
Ted Yu has been a software developer for 15 years. He started contributing to HBase 3 years ago and was promoted to HBase committer / PMC member in June 2011.
Recently he has been involved in several aspects of HBase 0.96 development: rewriting the RPC engine using protobuf serialization, introducing an interface for the Write-Ahead Log so that multiple WAL implementations can be plugged in, supporting multiple WALs per region server, and developing snapshot capability for selected tables.
He currently works at Hortonworks as a member of the senior engineering staff.