Past Meetup

St. Patty's Day meet-up on an Introduction to Apache Kudu

Hosted by Big Data Boston

93 people went

Details

Is anyone interested in a Boston St. Patty's Day meet-up on an Introduction to Apache Kudu: Storage for Fast Analytics on Fast Data? If so, please RSVP here.

Oh, and Todd Lipcon will be presenting! Special thanks to EnerNOC for providing the meet-up space. Food & BEER will be provided; it's St. Patty's Day in Boston, after all!

Agenda:

• 5:30-6pm Food, drinks, and networking

• 6-7pm Apache Kudu presentation

• 7-7:30pm Networking

Speaker bio:

Todd Lipcon is an engineer at Cloudera, where he primarily contributes to open source distributed systems in the Apache Hadoop ecosystem. He is a committer and a PMC member on the Apache Hadoop, HBase, and Thrift projects. Prior to Cloudera, Todd worked on web infrastructure at several startups and researched novel machine-learning methods for collaborative filtering. Todd received his bachelor’s degree with honors from Brown University.

Introduction to Apache Kudu

Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily sized datasets.

Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing workloads.

This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe and demo Apache Kudu (http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/), the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.