Details

You asked for it, and we are delivering: the second in our “Hello:” series of introductory “Big Data” topics. Our second meetup covers Big Data Analytics with Apache Impala.

The Apache Impala project is pioneering the next generation of big data analytics and data warehousing. Impala provides fast SQL queries with the capacity, scalability, and flexibility of hundreds of computers. Impala plays well with the other animals in the big data zoo and works with multiple scalable storage options.

Why use Impala instead of MySQL, Postgres, etc.? How can Impala run parts of the same query across multiple computers at the same time? Why would you want to store data on local disk and in a cloud-based object store like AWS S3 or Azure ADLS?

In this session, we’ll address these questions and more. First, we’ll take a look at the Impala architecture and how it scales. Next, we’ll load some data and run some queries. As time permits, we’ll use Apache JMeter to create a multi-user workload and test the autoscaling capabilities of a cloud-based deployment.

Agenda (all times in EST):
5:00 pm: Informal Introductions and Announcements
5:10 pm: Main presentation: Hello, Impala: Using Next Generation Analytics and Data Warehousing
5:40 pm: A Demonstration of Impala
6:05 pm: Q&A
6:20 pm: Raffle of door prizes (must be present and participating to win)
6:28 pm: Preview of upcoming Meetups, concluding remarks

Join Cloudera Solutions Engineers Carolyn Duby and Marty Lurie to get acquainted with Apache Impala. We are looking forward to seeing you there!