Impala: A Modern, Open-Source SQL Engine for Hadoop

The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the Hadoop community now has an open-source codebase that helps users query data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's queries are fast enough to run interactively on native Hadoop data rather than as long-running batch jobs. Now you have the freedom to discover relationships and explore what-if scenarios on Big Data datasets. By taking advantage of Hadoop's infrastructure, Impala lets you avoid traditional data warehouse obstacles like rigid schema design and the cost of expensive ETL jobs.

This talk starts out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation. It concludes with a summary of Impala's benefits when compared with Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.

About the speaker: Mark Grover is a Software Developer at Cloudera and is involved in Impala's packaging, Apache Hive, and Apache Bigtop. He is also a section author of O'Reilly's Programming Hive book. He has a degree in Computer Engineering from the University of Toronto. He has written a few guest blog posts and presented at many conferences about technologies in the Hadoop ecosystem.


  • Larry S.

    Thought I'd mention we have another two classes (Monday night and Saturday morning) of our University of Toronto Certificate in Enterprise Data Analytics (Big Data) starting in January if anyone is interested: http://learn.utoronto.ca/course...

    1 · December 24, 2013

    • Larry S.

      Hi Paresh, I like Coursera a lot and use it myself. There are two differences with our courses: (1) the focus is on management of enterprise analytics, so there is a fair amount of management-related content (privacy, policy, security, organizational behaviour) that you won't see in a technical course; (2) as you mentioned, it is an actual certification from a top-flight university, at about 1/10 of the cost that some schools are charging for a Masters in Analytics degree.

      January 8, 2014

    • Paresh Y.

      Hi Larry,
      Thanks for the additional details. Makes sense. When I thought about this further, there are a few more good things about the U of T courses. Not everyone is self-motivated enough to commit to a regimen when the course is free and there is no fixed class schedule. People naturally attribute more value to stuff that requires payment of hard-earned $$. Many people, in fact most people including myself, do so with some aspect of what they buy.

      January 9, 2014

  • Mark

    Thanks all for coming and all the kind words! I am very glad to meet and hang out with you all! The best thing you can do to learn more about Impala is to download the QuickStart VM that comes with it pre-installed (tiny.cloudera.com/quick-start) and play with it. Upload your own or other favourite dataset and see how it performs and let me know how it goes! There is good documentation at http://www.cloudera.com/content...

    My slides are posted at https://github.com/markgrover/im...

    Till next time!

    January 8, 2014

    • Hardik

      Thanks Mark for posting the slides, great presentation!

      January 8, 2014

    • Ashraf

      Thanks Mark, Great presentation. I am new to Hadoop and am more interested in it after attending this event.

      January 8, 2014

  • Paresh Y.

    Thanks Mark for a deep-dive presentation on Impala. We have many great presentations at THUG, but usually none attempts a deep dive, as our audience has diverse levels of experience with Hadoop. If possible, we should have a second session to take a deep dive into the second part of the presentation, as you had to rush due to time constraints caused by someone asking too many questions :).

    2 · January 8, 2014

  • Tri N.

    Excellent topics, but a little bit too short. I hope there will be a follow-up session with more demos.

    January 8, 2014

  • Hardik

    Please upload the presentation slides, thanks!

    January 8, 2014

  • Juzer P.

    Excellent

    January 7, 2014

  • A former member

    Excellent.

    January 7, 2014

  • A former member

    Great talk Mark. Impala looks to be the home run a lot of people were waiting for. It makes Hadoop tremendously more useful.

    January 7, 2014

  • Antonio S.

    Excellent presentation Mark! Well thought out, well prepared, and well delivered. I haven't seen such a high-quality technical presentation in a while. I have zero previous experience with or knowledge of Hadoop, apart from having been to one or two other talks, and I have to say I enjoyed this presentation entirely, from the first minute to the last. Well done! Thanks!! Oh, and thanks for the pizza BTW, to whoever is footing the bill. That came as a nice surprise. :-)

    January 7, 2014

  • Richard W.

    It seems counter-intuitive that a SQL front-end would be faster than a custom-built M/R job. But your excellent overview of the architecture of Impala makes it clear why that is. Thanks for the presentation and thanks for the slides.

    January 7, 2014

    • Richard W.

      ... and thanks for the pizza!

      January 7, 2014

  • Dan D.

    Will not attend.... Got a nice cold...

    January 7, 2014

  • Gnani G.

    Sorry guys, not attending today due to last minute glitches.

    January 7, 2014

  • Mark

    wanted to make this one but i am out sick for 5 days and counting

    January 7, 2014

  • Brian O.

    I confirmed the CSI Annex space which can hold up to 100 people. Please note that there will be only 80 chairs available so standing room only beyond that.

    December 26, 2013

  • Mark

    Let's hope the new location will get everyone in.

    2 · December 24, 2013

    • Antonio S.

      +1, still waiting for update for my wait-listed status

      December 24, 2013

  • Brian O.

    Looks like we will move to the Centre for Social Innovation (http://socialinnovation.ca/) and have room for about 80 people. I will update the meetup once it's confirmed so everyone on the waiting list can attend! Thanks to Cloudera for sponsoring this!

    2 · December 23, 2013

  • Brian O.

    We're looking for a bigger room now. I'll update the event with a new location soon.

    November 13, 2013

  • Antonio S.

    How does Impala compare to Presto, the interactive SQL-on-Hadoop engine Facebook recently open sourced? It is said that Presto is on average 10 times faster than Hive for running queries across large data sets stored in Hadoop and elsewhere.
    http://prestodb.io/

    November 12, 2013

    • Adam M.

      Presto looks really cool and we should all like more open source options. My caveat is that the benchmarks were based on Hive 0.10, before any of the Stinger initiatives, and I'm not sure if they tested against any of the new Impala caching and planning enhancements. After reviewing the design and the code, however, I still expect Presto to soar. I just need some time to test it on a larger cluster. Maybe I'll have something by the hackathon.

      November 12, 2013

    • Edwin C.

      Seems a lot of benchmarks are against Hive 0.10. AWS EMR has been updated to Hadoop 2.2; I wonder how hard it would be to update the AMPLab Big Data Benchmark to run against the latest Hive (https://amplab.cs.berk...)

      November 12, 2013

  • Antonio S.

    Wow, already fully booked! I hope we can have a bigger room to accommodate more people.

    November 12, 2013

  • Adam M.

    Enjoy folks, I'll be at Les Mis. There has been some progress with the caching and execution engine for Impala. Should be a worthwhile presentation as I think CDH 5 (Hadoop 2.2.0 base) is GA'ed then too.

    November 11, 2013

Our Sponsors

  • IBM

    Meeting facilities, expert speakers, free product, books and education.

  • Big Data University

    Free on-line courses in Hadoop and big data related technologies.

  • Cloudera

    10% off training for Toronto Hadoop User Group members.

  • Hortonworks

    Food, speakers, beverages

  • T4G

    Hosting Meeting locations and providing relevant speakers
