Deep Dive into Cloudera Impala!

The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the Hadoop community now has an open-source codebase that helps users query data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's operations are fast enough to run interactively on native Hadoop data rather than in long-running batch jobs. Now you have the freedom to discover relationships and explore what-if scenarios on Big Data datasets. By taking advantage of Hadoop's infrastructure, Impala lets you avoid traditional data warehouse obstacles like rigid schema design and the cost of ETL jobs.

This talk starts out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation. It concludes with a summary of Impala's benefits when compared with Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.

== Speaker Bio ==

Marcel Kornacker is a tech lead at Cloudera for new product development and the creator of the Cloudera Impala project. Following his graduation in 2000 with a PhD in databases from UC Berkeley, he held engineering positions at several database-related start-up companies. Marcel joined Google in 2003, where he worked on several ad-serving and storage infrastructure projects, then became tech lead for the distributed query engine component of Google's F1 project.

========================

Doors open at 6:30, refreshments available

Presentation kicks off at 7


  • Volney S.

    Very informative. Very efficient presentation covering quite a bit of material. It was very good that it was a "tech talk" and not a marketing talk. Impala was a very good topic to cover. Not only does the technology have tremendous potential as a tool, it also potentially introduces some vendor lock-in in a space where many are trying to avoid it. The performance metrics were so superior to Hive that it has the potential to dominate in the cases where: (1) query-based data analytics are being performed; and (2) generic ODBC/JDBC is being used by BI tools. The only reason the presentation did not get 5 stars is that it still left me wondering whether the often-stated limitations in Impala SQL and the lack of traditional optimization matter, and the security aspects were left out. There was no discussion of Kerberos integration or of what BI or statistical tool vendors will have to implement in order to let HDFS-managed security zones work with multiple users with differing entitlements.

    2 · May 7, 2013

    • Patrick A.

      Also, Impala does support Kerberos. Currently under development is a full database security model that also covers the metastore (GRANT, REVOKE and all that good stuff).

      May 8, 2013

    • Volney S.

      Patrick, delighted to hear that about Kerberos and Impala. Will it be the classic JDBC scenario, where a connection supplies a user ID and password that are authenticated on the server side and the resulting query then runs under the specified user ID? Or will it be the scenario where a client component has interacted with a ticket-granting authority and the Kerberos service principal is then passed in the way it is today in HDFS and Hive Server 2? If it is the latter, what client-side BI tools have been certified to work with Impala in Kerberos mode? What I have found with Hive Server 2, for example, is that although the Kerberos support is great, tools such as Talend, Qlikview and SAS do not have client-side support for Kerberos, which creates additional hurdles. In the Talend case, some bright guys at a client had to build a Kerberos-capable proxy server to supply the credentials to Hive Server 2.

      May 8, 2013

  • Nitin k.

    Very knowledgeable and knowledge sharing session. Marcel did a deep dive. I feel there are still quite a few enhancements required to Impala before it can be a usable version. The fact that it is open-sourced may help. Is a DEMO of this going to be put up???

    May 7, 2013

  • Wei T.

    Great talk, Marcel. It covered both the architectural highlights and implementation details. Sorry if I asked cynical questions, but my purpose was to get technical details, indeed :)
    BTW: would like to have the slides, if possible.

    May 8, 2013

  • John

    Nice introduction. Would like the slides. Still looking for a detailed, intricate MapReduce / HiveQL / Impala ETL example showing ways to populate existing Hive fact/dimension tables from ASCII files.

    1 · May 7, 2013

  • Ashwin D.

    Are the slides/presentation going to be posted?

    4 · May 7, 2013

  • Adam S.

    Marcel gave a clear and thorough explanation of Impala, and did not have any marketing fluff to distract from the tech talk.

    May 7, 2013

  • Shridhar A.

    Was a very informative and useful presentation...

    May 7, 2013

  • Pavel K.

    Incredibly informative and well-structured presentation. Was hoping for a demo or some other sizzle :)

    May 7, 2013

  • Venkk

    Deep Dive into Cloudera Impala was very informative and great.

    May 6, 2013

  • Bert P.

    Speaker was extremely knowledgeable. I liked the fact that he repeated every question before his answer. That was excellent.

    May 6, 2013

  • Sean F.

    Looking forward to it.

    May 6, 2013

  • Debbie S.

    Unfortunately I will not be able to attend the event this evening. Hopefully next time!

    May 6, 2013

  • Senthil V.

    Yes, I wish to attend.

    May 5, 2013

  • A former member

    Looking forward to it

    May 4, 2013

  • Charles G.

    Looking forward to this.

    May 2, 2013

  • Rick S.

    I'm not gonna be able to make it. Bummed

    May 1, 2013

  • Ken K.

    Waiting...

    May 1, 2013

  • Wei T.

    On the waiting list; hope there can be a slot at the last minute, or a video stream....

    May 1, 2013

  • A former member

    Hey, all. Enjoyed the last one; looking forward to this one!

    April 30, 2013

  • Shridhar A.

    looking forward to it.

    April 30, 2013

  • Brian P.

    Yes, looking fwd to it

    Brian

    April 30, 2013

  • franco

    am going to attend

    April 23, 2013

  • Manohar

    I am going to attend.

    April 23, 2013

  • David C.

    David Choy

    April 15, 2013

  • Pankaj D.

    Looking forward...

    April 10, 2013

  • Adam Smieszny changed the location to PulsePoint

    April 9, 2013

  • Adam Smieszny changed the date and time to Tuesday, May 07, 2013 at 7:00 PM

    April 9, 2013

  • eli s.

    Looking forward to this discussion. Hope the comparison will be extended to additional tools such as Pivotal Hawq, Hadapt, and JethroData.

    March 22, 2013
