In our previous, well-received intro event for beginners, we learned about MapReduce. Building on that foundation, let's turn our attention to Hive and HCatalog.
Hive is a higher-level abstraction on top of MapReduce that allows those without Java programming knowledge to manage and manipulate data in a Hadoop cluster.
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL. At the same time, this language allows traditional MapReduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
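As a rough illustration of plugging a custom mapper into HiveQL, here is a minimal sketch of a script that Hive could stream rows through with its TRANSFORM clause. Hive passes each row to the script as a tab-separated line on standard input and reads tab-separated rows back from standard output. The script name (word_mapper.py), the table name (docs), and the column name (line) are hypothetical and chosen only for this example.

```python
#!/usr/bin/env python
"""Hypothetical streaming mapper for Hive's TRANSFORM clause.

Assumed usage from HiveQL (table and column names are made up):

    ADD FILE word_mapper.py;
    SELECT TRANSFORM(line)
           USING 'python word_mapper.py'
           AS (word, cnt)
    FROM docs;
"""
import sys


def map_line(line):
    """Turn one tab-separated input row into zero or more 'word<TAB>1' rows."""
    fields = line.rstrip("\n").split("\t")
    text = fields[0]  # only the first column is used in this sketch
    return ["%s\t1" % word.lower() for word in text.split()]


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Rows arrive one per line on stdin; emit one output row per line.
    for line in stdin:
        for row in map_line(line):
            stdout.write(row + "\n")


if __name__ == "__main__":
    main()
```

The script can be tried outside Hive by piping text into it, for example: echo "Hello Hive" | python word_mapper.py. A matching reducer that sums the counts per word could be written the same way, or the summing could be left to a HiveQL GROUP BY.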
HCatalog is a set of interfaces that open up access to Hive's metastore for tools inside and outside of the Hadoop grid.
This includes:
Providing a shared schema and data type mechanism for Hadoop tools.
Providing a table abstraction so that users need not be concerned with where or how their data is stored.
Providing interoperability across data processing tools such as Pig, MapReduce, and Hive.

Meet and Greet: 6:30pm
Talks kick off at 7pm
Mark Grover is a contributor to the Apache Hive and Apache Bigtop projects and an active respondent on the IRC channel. Mark authored the Hive on Amazon Web Services section of O'Reilly's "Programming Hive" book. He is a Software Engineer at Cloudera.