Past Meetup

April Hadoop Meetup: Data Warehousing with HBase, Sqoop, and Impala


152 people went



We've got three speakers at our April meetup covering HBase, Sqoop, and Impala. Cloudera is sponsoring drinks and food, and ASOS is kindly hosting us in their meetup space.

The three talks we have for the evening are:

- Apache HBase: Where We've Been and What's Upcoming (Jon Hsieh, Software Engineer @ Cloudera and HBase Committer)

Apache HBase is a distributed, non-relational database that provides low-latency random read/write access to massive quantities of data. This talk is in two parts. First, I'll talk about how HBase has been deployed in production over the past few years at companies like Facebook, Pinterest, Groupon, and eBay, and about the vibrant worldwide community of contributors, including folks at Cloudera, Intel, Hortonworks, Yahoo!, and Xiaomi. Second, I'll cover the features in the newest release, 0.96.x, and in the upcoming 0.98.x release.

- Apache Sqoop: Unlocking Hadoop for Your Relational Database (Kathleen Ting, Technical Account Manager @ Cloudera and Sqoop Committer)

Unlocking data stored in an organization's RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop lets users with information stored in existing SQL tables use new analytic tools like Apache HBase and Apache Hive. This talk will cover how to deploy and apply Sqoop in your environment and how to transfer data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. We'll also show you how to keep table data and Hadoop in sync by importing data incrementally, and how to customize transferred data by calling various database functions.
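As a rough illustration of the incremental-import idea mentioned above: Sqoop's append mode (`--incremental append` with `--check-column` and `--last-value`) imports only rows whose check column exceeds the last value seen, then records the new maximum for the next run. The Python sketch below mimics that checkpoint logic against an in-memory SQLite table standing in for the source RDBMS. It is not Sqoop's implementation; the function `incremental_import` and the `orders` table are hypothetical names chosen for the example.

```python
import sqlite3


def incremental_import(conn, table, check_column, last_value):
    """Hypothetical helper: fetch rows newer than last_value and return the new checkpoint.

    Mimics the idea behind an append-mode incremental import; it is a sketch,
    not Sqoop code. Table and column names are interpolated directly, so this
    assumes they come from trusted configuration.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    rows = cur.fetchall()
    # Locate the check column so the checkpoint can advance to the largest value seen.
    col_names = [d[0] for d in cur.description]
    idx = col_names.index(check_column)
    new_last = rows[-1][idx] if rows else last_value
    return rows, new_last


if __name__ == "__main__":
    # In-memory SQLite stands in for the relational source system.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", [(1, "a"), (2, "b")])

    rows, last = incremental_import(conn, "orders", "id", 0)
    print(rows, last)  # first run picks up both existing rows

    conn.execute("INSERT INTO orders (id, item) VALUES (3, 'c')")
    rows, last = incremental_import(conn, "orders", "id", last)
    print(rows, last)  # second run picks up only the newly inserted row
```

Persisting `last` between runs is what keeps the copy in sync without re-reading the whole table; a real deployment would also have to handle updated (not just appended) rows, which this sketch ignores.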

- Building a Hadoop Data Warehouse with Impala (Marcel Kornacker, Tech Lead @ Cloudera)

In this talk, Impala architect Marcel Kornacker will cover: how Impala's architecture delivers query speeds over Hadoop data that convincingly exceed not only those of Hive but also those of a proprietary analytic DBMS running over its own native columnar format; the current state of, and roadmap for, Impala's analytic SQL functionality; and an example configuration and benchmark suite that demonstrate how Impala offers high performance, rich functionality, and the ability to handle a multi-user workload, while retaining Hadoop's traditional strengths of flexibility and ease of scaling.