Introduction to Apache Drill


Details
If you think Hadoop and MapReduce have changed the game for big data, come and hear about "the next big thing". Ted Dunning is coming back (https://www.meetup.com/Boulder-Denver-Big-Data/events/31474362/) to tell us about the new Apache project Drill. Drill (http://www.mapr.com/support/community-resources/drill), like Impala (http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/), is inspired by Google's Dremel (http://research.google.com/pubs/pub36632.html) and provides interactive access to terabytes of data in seconds instead of minutes or hours. Come and learn about the next wave of big data technology from the champion of one of the main projects in the space.
Special thanks to Ken Anderson, the Center for Big Data and Applied Science, and the Department of Computer Science at CU Boulder for hosting the event.
For directions to the event, see the page at http://www.colorado.edu/cs/about/directions (it links to a document called Engineering Floor Plans, and page 23 of that document has a floor plan showing where ECCR 265 is located). ECCR 265 is on the 2nd floor of the Engineering Center, on the western side of the building.
Agenda
· 6:00 - 6:30 - Socialize over food and drink
· 6:30 – 6:45 - Announcements, Upcoming Events
· 6:45 - 8:15 - Ted Dunning - Introduction to Apache Drill
· 8:15 - ?:?? - Continued Socializing
About the presenter
Ted Dunning - Chief Application Architect - MapR
Ted Dunning has been involved with a number of startups, the latest being MapR Technologies, where he is Chief Application Architect working on advanced Hadoop-related technologies. He is also a PMC member for the Apache ZooKeeper and Mahout projects. Opinionated about software and data mining and passionate about open source, he is an active participant in the Hadoop and related communities and loves helping projects get going with new technologies.
Introduction to Apache Drill
The Apache Drill project ostensibly has goals that make it look a lot like Dremel, in that the mainline use case involves SQL or SQL-like queries applied to a large distributed data store, possibly organized in a columnar format.
In fact, however, Drill has a highly flexible architecture that allows it to serve many needs. Moreover, Drill has standardized internal APIs that make it easy to extend for experimentation with parallel query evaluation. This is achieved by defining a standard logical query data flow language with a standardized and very flexible JSON syntax. Operators can be added to this framework very easily; the only assumption is that an operator's inputs are record sequences and that it has a single output, which is also a record sequence. A SQL-to-logical-query translator and the operators needed to evaluate these queries are part of standard Drill, but alternative syntax is easily added and alternative semantics are easily substituted.
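The operator contract described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Drill's actual logical plan format or API: the JSON field names (`steps`, `id`, `op`, `inputs`, `result`), the operator names `scan`, `filter` and `project`, and the `run_plan` helper are all hypothetical. Each operator receives its JSON configuration plus zero or more input record sequences and returns exactly one record sequence, so adding an operator amounts to registering one more function.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]

# Hypothetical operator registry (not Drill's API): each operator takes its
# JSON config plus zero or more input record sequences and returns ONE sequence.
OPERATORS: Dict[str, Callable[..., Iterator[Record]]] = {}


def operator(name: str):
    """Register a function as the implementation of a logical operator."""
    def register(fn):
        OPERATORS[name] = fn
        return fn
    return register


@operator("scan")
def scan(config: dict) -> Iterator[Record]:
    # Inline rows stand in for a real distributed data source.
    yield from config["rows"]


@operator("filter")
def filter_op(config: dict, records: Iterable[Record]) -> Iterator[Record]:
    # Keep only records whose field equals the configured value.
    field, value = config["field"], config["equals"]
    return (r for r in records if r.get(field) == value)


@operator("project")
def project(config: dict, records: Iterable[Record]) -> Iterator[Record]:
    # Emit a new record containing only the requested fields.
    fields = config["fields"]
    return ({f: r.get(f) for f in fields} for r in records)


def run_plan(plan_json: str) -> List[Record]:
    """Evaluate a hypothetical JSON plan whose steps are in data-flow order.

    Outputs are lazy iterators, so in this sketch each step's output
    should feed at most one downstream step.
    """
    plan = json.loads(plan_json)
    outputs: Dict[str, Iterator[Record]] = {}
    for step in plan["steps"]:
        inputs = [outputs[i] for i in step.get("inputs", [])]
        outputs[step["id"]] = OPERATORS[step["op"]](step, *inputs)
    return list(outputs[plan["result"]])


# Example plan: scan -> filter(city == "Boulder") -> project(name).
EXAMPLE_PLAN = json.dumps({
    "steps": [
        {"id": "s", "op": "scan", "rows": [
            {"name": "ann", "city": "Boulder", "age": 31},
            {"name": "bob", "city": "Denver", "age": 45},
            {"name": "cy", "city": "Boulder", "age": 27},
        ]},
        {"id": "f", "op": "filter", "inputs": ["s"],
         "field": "city", "equals": "Boulder"},
        {"id": "p", "op": "project", "inputs": ["f"], "fields": ["name"]},
    ],
    "result": "p",
})

if __name__ == "__main__":
    print(run_plan(EXAMPLE_PLAN))  # [{'name': 'ann'}, {'name': 'cy'}]
```

Because every operator has the same shape, a new one (say, a hypothetical `limit`) only needs another `@operator("limit")` function; the plan evaluator itself does not change.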
This talk will describe the overall architecture of Drill, report on progress in building an open source development community, and show how Drill can be used for machine learning, how Drill can be embedded in a language like Scala or Groovy, and how new syntax components can be added to support a language like Pig. These will be illustrated by describing how new parsers and operators are added. In addition, I will describe how Drill uses Optiq for cost-based query optimization.
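To illustrate the idea of adding a new syntax front end, here is a hedged sketch of a tiny Pig-like translator that emits a JSON data-flow plan. The three-statement subset of Pig Latin (`LOAD`, `FILTER ... BY`, `FOREACH ... GENERATE`), the plan layout, and the `translate` function are hypothetical simplifications; a real Pig front end for Drill would need a proper parser and would target Drill's own logical plan format.

```python
import json
import re
from typing import Dict, List, Optional

# Hypothetical, simplified Pig-like statements; each maps to one operator.
LOAD = re.compile(r"(\w+)\s*=\s*LOAD\s+'([^']+)'$")
FILTER = re.compile(r"(\w+)\s*=\s*FILTER\s+(\w+)\s+BY\s+(\w+)\s*==\s*'([^']*)'$")
GENERATE = re.compile(r"(\w+)\s*=\s*FOREACH\s+(\w+)\s+GENERATE\s+([\w\s,]+)$")


def translate(script: str) -> str:
    """Translate a tiny Pig-like script into a hypothetical JSON plan."""
    steps: List[Dict[str, object]] = []
    last: Optional[str] = None
    for stmt in (s.strip() for s in script.split(";")):
        if not stmt:
            continue
        if m := LOAD.match(stmt):
            steps.append({"id": m[1], "op": "scan", "table": m[2]})
        elif m := FILTER.match(stmt):
            steps.append({"id": m[1], "op": "filter", "inputs": [m[2]],
                          "field": m[3], "equals": m[4]})
        elif m := GENERATE.match(stmt):
            fields = [f.strip() for f in m[3].split(",")]
            steps.append({"id": m[1], "op": "project", "inputs": [m[2]],
                          "fields": fields})
        else:
            raise ValueError(f"unsupported statement: {stmt!r}")
        last = m[1]  # the most recent alias becomes the plan's result
    return json.dumps({"steps": steps, "result": last}, indent=2)


SCRIPT = """
people = LOAD 'people';
boulder = FILTER people BY city == 'Boulder';
names = FOREACH boulder GENERATE name;
"""

if __name__ == "__main__":
    print(translate(SCRIPT))
```

The design point is that the front end only produces a plan; evaluation is left entirely to operators that already exist, which is what makes alternative syntax cheap to add.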
For more details: http://wiki.apache.org/incubator/DrillProposal?action=AttachFile&do=view&target=Drill+slides.pdf