Past Meetup

Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data with Mike Percy


125 people went


We are very excited to announce that Mike Percy from Cloudera is coming to town to talk about a very cool new data store called Kudu. Kudu bridges the performance and mutability gaps between column-oriented, file-based systems like Parquet on HDFS and key-value stores like HBase, Cassandra, etc.


6:00 – 6:30 - Socialize over food and drink
6:30 – 6:45 - Announcements, Upcoming Events
6:45 – 8:30 - Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data with Mike Percy
8:30 – ??? - Continued socializing

About the presentation

Apache Kudu (incubating) is a fast new columnar data store for the Hadoop ecosystem, designed to enable high-performance, flexible analytic pipelines. Because it is optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data such as metrics, as well as machine learning model-building and data warehousing workloads. Despite its impressive scan speed, Kudu also offers operations typically found in traditional data stores, including real-time inserts, updates, and deletes. Kudu follows a "bring your own SQL" model and can be queried by multiple SQL engines, including Apache Spark SQL, Apache Impala (incubating), and Apache Drill. This talk will discuss what Kudu is, why we decided to build it, and what makes it fast, and will walk through an example time-series use case.

About Mike Percy

Mike Percy is a Software Engineer at Cloudera and a committer on Apache Kudu (incubating). Prior to joining Cloudera, Mike worked on big data infrastructure for machine learning at Yahoo! Mike holds a BSCS from UC Santa Cruz and an MSCS from Stanford.