Querying multiple distributed storage systems with Apache Hive robustly


Details
Abstract:
Apache Hive facilitates querying and managing large datasets residing in distributed storage. Because it is used by a very wide community, Hive has been extended to support multiple distributed storage systems, and it is now common practice for an organization to keep data in several of them. This talk covers two important aspects of Apache Hive. The first is how Hive makes it possible for organizations to run complex analytical queries across various storage systems or big data components. We recently added HiveKa, which supports Hive queries on Kafka, and will use it as an example. At Cloudera, we focus not only on providing solutions that help organizations answer bigger questions, but also on making sure that these solutions are robust. The second aspect is how we use advanced methods and technologies, such as random query generators, Docker, and benchmarks, to make sure that Hive is ready to find the right answers in the huge Volume, high Velocity, and wide Variety of today’s data.
Presented by: Ashish Singh of Cloudera
