Building a Hadoop Warehouse with Impala
Details
Dear all,
we're looking forward to our April Meetup, which will give us the chance to learn about Impala's capability for using Hadoop as a data warehouse - directly from one of it's creators!
The talk:
*Building a Hadoop Warehouse with Impala, Marcel Kornacker, Cloudera
Impala (impala.io) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.
The talk will explore:
• How Impala's architecture supports query speed over Hadoop data that not only convincingly exceeds that of Hive, but also that of a proprietary analytic DBMS over its own native columnar format
• The current state of, and roadmap for, Impala's analytic SQL functionality
• An example configuration and benchmark suite that demonstrate how Impala offers a high level of performance, functionality, and ability to handle a multi-user workload, while retaining Hadoop’s traditional strengths of flexibility and ease of scaling.
About the speaker:
Marcel Kornacker is a tech lead at Cloudera for new products development and creator of the Cloudera Impala project. Following his graduation in 2000 with a PhD in databases from UC Berkeley, he held engineering positions at several database-related start-up companies. Marcel joined Google in 2003 where he worked on several ads serving and storage infrastructure projects, then became tech lead for the distributed query engine component of Google’s F1 project.