Apache Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data

Hosted By
Ryan B.
Details

Abstract: If you're building relational, time-series, IoT, or real-time architectures using Hadoop, you will find Apache Kudu an attractive choice. With Kudu, you'll be able to build your applications more simply and with fewer moving parts.

Hadoop has become faster and more capable, and it continues to narrow the gap with traditional database technologies. However, for developers looking for up-to-the-second analytics on fast-moving data, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing and analytical workloads.

This talk will describe Kudu, a new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark and Apache Impala. Kudu fills the gap described above, offering both fast scans and fast random access through a single API.

Bio: Ryan Bosshart is a Systems Engineer who leads the field storage specialization team.

Parking: There are two options to pay for parking in the adjacent Anderson ramp. You can either enter/exit with a credit card, or you can take a ticket and use the pay kiosk on the northeast corner of the ramp to get an exit ticket.

Food: Pizza and drinks, first come, first served, starting at 6:30 PM, provided by the University of St. Thomas, Graduate Programs in Software.

Map: http://bit.ly/RCtaTI

Twin Cities Spark and Hadoop User Group