Impala: Tuning and Best Practices


Details
Join us at BrightEdge for the next #BayAreaCUG meetup!
Agenda:
• 6 - 6:30 Networking (food and beverages will be available)
• 6:30 - 7:15 Tech talk with Q&A: Impala Tuning & Best Practices with Dimitris Tsirogiannis, Software Engineer, Impala Team
The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the Hadoop community now has an open-source codebase that helps users query data stored in HDFS, Apache HBase, and even Amazon S3 in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's operations are fast enough to run interactively on native Hadoop data rather than in long-running batch jobs.
This talk presents a number of lessons and guidelines on how to get the best performance from Impala. It discusses physical design, cluster sizing, and hardware recommendations, as well as the basics of query tuning. It also covers best practices for when Impala interacts with other components such as Hive and Sentry.
• 7:15 - 8:00 Community-Proposed Breakouts: Bring your most pressing challenges and best practices to share and discuss
