
Querying multiple distributed storage systems with Apache Hive robustly

Hosted by: Mark Kerzner

Details

Abstract:

Apache Hive facilitates querying and managing large datasets residing in distributed storage. Because it is used by a very wide community, Hive has been extended to support multiple distributed storage systems, and it is now common practice for an organization to keep data in several different storage systems. This talk covers two important aspects of Apache Hive. The first is how Hive makes it possible for organizations to run complex analytical queries across various storage systems and big data components. We recently added HiveKa, which supports Hive queries on Kafka, and will use it as an example. At Cloudera, we focus not only on providing solutions that help organizations answer bigger questions but also on making sure those solutions are robust. The second aspect is how we use advanced methods and technologies, such as random query generators, Docker, and benchmarks, to make sure that Hive is ready to find the right answers in the huge Volume, high Velocity, and wide Variety of today’s data.

Presented by: Ashish Singh of Cloudera

AI+Big Data
Microsoft offices
2000 W Sam Houston Pkwy S Ste 350 · Houston, TX