Apache Kudu and Watson Analytics


Details
Apache Kudu, presented by Ryan Bosshart, Cloudera
Watson Analytics, presented by Jason Bennett, IBM
Apache Kudu: New Apache Hadoop Storage for Fast Analytics on Fast Data
If you're building relational, time-series, IoT, or real-time architectures using Hadoop, you will find Apache Kudu an attractive choice. With Kudu, you'll be able to build your applications more simply and with fewer moving parts.
Hadoop has become faster and more capable, and has continued to narrow the gap with traditional database technologies. However, for developers looking for up-to-the-second analytics on fast-moving data, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing and analytical workloads.
This talk will describe Kudu, a new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark and Apache Impala. Kudu fills the gap described above, offering both fast scans and fast random access through a single API.
Speaker Bio:
Ryan Bosshart is a Principal Systems Engineer at Cloudera. Ryan has spent the last 10 years building and architecting distributed systems. At Cloudera, Ryan leads the field storage specialization team, where he focuses on Apache HDFS, HBase, and Kudu. Ryan is a co-chair of the Twin Cities Spark and Hadoop User Group.
