
Intro to Apache Kudu: Fast Analytics on Fast Data

Free swag, fun-drinks, junk food and lots more!

Cloudera Director of Product Management, Michael Crutcher

A Database Month event http://www.DBMonth.com/database/apache-kudu

This session will describe Kudu, a new addition to the open source Hadoop ecosystem that complements HDFS and HBase by offering both fast scans and fast random access through a single API.

Over the past several years, the Apache Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems such as Apache Impala (incubating) and Apache Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily sized datasets.

Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but their scan rates are too slow for large-scale data warehousing workloads.
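The trade-off described above can be pictured with a toy sketch in Python. This is only an illustration under stated assumptions, not how Parquet, HBase, or Kudu are implemented: the sample records and the names `row_store`, `column_store`, `lookup_row`, `scan_total_amount`, and `update_amount_columnar` are all hypothetical. A row-oriented layout keyed by primary key makes single-record lookups cheap, while a column-oriented layout makes whole-column scans cheap but leaves keyed updates awkward.

```python
# Toy illustration only; real storage engines are far more sophisticated.
# All names and sample data here are hypothetical.
rows = [
    {"id": 1, "city": "NYC", "amount": 10.0},
    {"id": 2, "city": "SF", "amount": 25.5},
    {"id": 3, "city": "NYC", "amount": 7.25},
]

# Row-oriented layout: whole records indexed by primary key.
row_store = {r["id"]: r for r in rows}

# Column-oriented layout: one contiguous list per column.
column_store = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def lookup_row(key):
    """Random access: a single keyed lookup returns the whole record."""
    return row_store[key]


def scan_total_amount():
    """Analytic scan: reads only the one column the query needs."""
    return sum(column_store["amount"])


def update_amount_columnar(key, new_amount):
    """Keyed update on the columnar layout: needs a linear search,
    because this toy layout has no index by key."""
    idx = column_store["id"].index(key)
    column_store["amount"][idx] = new_amount
```

In this sketch, `lookup_row(2)` touches one record, whereas `scan_total_amount()` touches one column and ignores the rest. A storage engine aiming at both workloads has to make each of these paths cheap at once, which is the gap the talk addresses.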

This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals, and how Apache Kudu solves many of these challenges.


Michael Crutcher is a Director of Product Management at Cloudera. He is responsible for all product management activities related to current and future Cloudera storage products, including HDFS, HBase, Accumulo, and Kudu, among others.

Prior to Cloudera, Michael held positions at Greenplum, Amazon, and Zilliant. He has an MS in MIS from Texas A&M University.

Swag giveaway + food/drinks at 6:30pm 
Power-Networking at 6:35pm 
Presentation starts at 6:40pm

Did you know that Techie Youth is the ONE-AND-ONLY organization providing career opportunities to New York's foster kids (kids without parents) and at-risk youth? Techie Youth is a 501(c)(3) not-for-profit charity in NYC that provides free technology training to prepare at-risk youth for an IT career.

76 youth graduated from Techie Youth in Q2/Q3 of 2016. Techie Youth is a life-saving endeavor: all students are either in foster care or "aged-out", homeless, transgender/LGBQ, or in "2nd chance" juvenile-justice programs. Learn more now at https://www.TechieYouth.org

