Unfortunately, our plans for a 250-person venue fell through. As a result, we're moving to a smaller venue (the Cleveland Park Public Library), which means approximately 150 people will be bumped to the waiting list. If you don't think you can make it, please change your RSVP so someone else can take your spot!
If you're unable to attend in person due to space constraints, please consider joining remotely as a Google+ Hangouts On Air participant.
Our next meetup has been scheduled. We hope you can join us! We're still looking for another presenter, so if you have a good topic and are free on this date, send us an e-mail and we'll add you to the agenda. Also, we are still waiting to confirm the meeting location, but we anticipate it will be in the Chinatown area of DC.
Cloudera Impala: Real-Time Queries for Apache Hadoop
Nong Li, Cloudera, Inc.
Google's 2010 Dremel paper described a critical technology that runs much of Google's big data business. With the establishment of the Cloudera Impala project, the Hadoop community now has a fully functional, open-source codebase that delivers on Google's Dremel vision, and then some.
This talk will provide technical and architectural details about how Impala lets users query data stored in either HDFS or Apache HBase in real time using familiar SQL operators. It will also compare and contrast Impala's use cases with those of Apache Hive, commercial MapReduce alternatives, and traditional data warehouse infrastructure.
What big data topics do you want to chat about? Come prepared with topics and expertise: we'll randomly select 5 topics and 5 experts from the audience, then take round-robin questions from the audience on those topics and see how our experts stand up!
6:00-6:30 - Networking and snacks
6:30-6:45 - Announcements
6:45-7:15 - First presentation
7:15-8:00 - Unpanel
8:00-whenever - Drinks nearby