December CHUG: Kudu: New Hadoop Storage for Fast Analytics on Fast Data

Hosted By
Bbox

Details

This month's discussion, hosted by Cloudera, will cover Kudu. The presenter will be Michael Crutcher, Director of Product Management.

Kudu is a new, native storage engine for Hadoop, designed for fast analytic performance on data that is being updated. Complementing the capabilities of HDFS and HBase, Apache-licensed Kudu simplifies the architecture for building real-time analytic applications.

Harnessing the value of data in real time is an increasingly common use case for Hadoop. However, advancements were necessary to fill important gaps in the storage layer. When building applications, users often had to choose between an extremely fast analytic scan rate with no ability to handle real-time modifications (HDFS with Apache Parquet) and very fast random access at the cost of scan performance (Apache HBase). For real-time analytic applications that required fast analytic performance over updating data, we started to see complex “hybrid architectures” emerge.

Speaker's bio:

Michael Crutcher is responsible for storage products at Cloudera, including HDFS, HBase, and Kudu. He was previously a product manager at Pivotal, responsible for Greenplum Database, and a data engineer at Amazon, where he helped design and manage its data warehouse environment.

CLT AI
Packard Place
222 S Church Street · Charlotte, NC