Querying multiple distributed storage systems with Apache Hive robustly

Name: Querying multiple distributed storage systems with Apache Hive robustly
Start: 2015-05-12T18:00:00-05:00
End: 2015-05-12T20:00:00-05:00
Location: Microsoft offices

Hosted by Mark K.

AI+Big Data

Details

Abstract:

Apache Hive facilitates querying and managing large datasets residing in distributed storage. Because it is used by a very wide community, Hive has been extended to support multiple distributed storage systems, and it is now common practice for an organization to keep its data in several different storage systems. This talk will cover two important aspects of Apache Hive. First, it will go over how Hive makes it possible for organizations to run complex analytical queries across various storage systems and big data components, using HiveKa, which we recently added to support Hive queries on Kafka, as an example. At Cloudera, we focus not only on providing solutions that help organizations answer bigger questions, but also on making sure those solutions are robust. The second aspect of this talk is how we use advanced methods and technologies, such as random query generators, Docker, and benchmarks, to make sure that Hive is ready to find the right answers in the huge Volume, high Velocity, and wide Variety of today’s data.

Presented by: Ashish Singh of Cloudera
