HBase As A File System


Details
Please join us at 6:30pm for drinks, food, and socializing.
Our speaker is fellow AHUG member Geovanie Marquez from Wellcentive. We are thankful for his participation in our meetups and hope others will follow. We love to hear from y'all!
Bio:
Geovanie Marquez has been leveraging Java technologies to wield computer funky since 2010. He has been responsible for the big data architecture at Wellcentive from the ground up; it currently manages population health for over 30M integrated (clinical + claims) patient charts and associated metadata. Architectural migrations are his specialty and scale is his mission. In his spare time... he considers deeply what to do in his spare time.
Abstract:
HDFS is a fantastic file system that lives in distributed space and can hold incredible amounts of data. It also comes with consistency guarantees that give us warm fuzzies about the data we store there. All this is great, but it suffers from one infamous design trade-off: the "Small Files Problem". HDFS is built for large files. Because the NameNode keeps metadata for every file and block in memory, storing many small files will run into NameNode resource limits.
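To get a feel for the Small Files Problem, here is a rough back-of-the-envelope sketch (not taken from the talk). It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file or block object and a 128 MB block size. Both figures and the function name are illustrative assumptions, and real usage varies by Hadoop version and configuration.

```python
import math

# Assumption: commonly quoted rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150
# Assumption: 128 MB block size (configurable per cluster).
BLOCK_SIZE = 128 * 1024**2


def namenode_heap_bytes(num_files, file_size_bytes):
    """Estimate NameNode heap used by num_files files of equal size.

    Counts one namespace object per file plus one per block.
    """
    blocks_per_file = max(1, math.ceil(file_size_bytes / BLOCK_SIZE))
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


total = 1024**4  # 1 TiB of data in both scenarios

# Scenario 1: the data is stored as 100 KiB files.
small = namenode_heap_bytes(total // (100 * 1024), 100 * 1024)

# Scenario 2: the same data is stored as 1 GiB files.
large = namenode_heap_bytes(total // 1024**3, 1024**3)

print(f"100 KiB files: {small / 1024**3:.1f} GiB of NameNode heap")
print(f"1 GiB files:   {large / 1024**2:.1f} MiB of NameNode heap")
```

Under these assumptions, 1 TiB stored as 100 KiB files costs roughly 3 GiB of NameNode heap, while the same data stored as 1 GiB files costs about 1.3 MiB.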
At Wellcentive, we sought a file system that could handle the variety, velocity, and volume of our file storage use cases, so we naturally looked at distributed technologies. On the market, we found proprietary technologies and non-Hadoop solutions. In this talk, I'll cover the design alternatives we considered and, finally, the solution we architected, which leverages HBase as our common storage subsystem and keeps all incoming and outgoing files for our population health management solution. This solution stores files large and small, while leveraging native compression schemes and maintaining all HDFS file consistency guarantees.
