
Apache Hive on Apache Spark (Hive on Spark) + Hands on tutorial

Hosted by Carlo P.

Details

Two of the most vibrant communities in the Apache Hadoop ecosystem are now working together to bring users a Hive-on-Spark option that combines the best elements of both.

Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. Until recently, MapReduce was the only execution engine in the Hadoop ecosystem, and Hive queries could only run on MapReduce. But today, alternative execution engines to MapReduce are available — such as Apache Spark (http://spark.apache.org/) and Apache Tez (incubating) (http://incubator.apache.org/projects/tez.html).
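To make this concrete, here is the kind of batch/ETL workload Hive is typically used for. The table and query below are hypothetical, shown only for illustration; the point is that the same HiveQL statement runs unchanged regardless of which execution engine Hive compiles it down to:

```sql
-- Hypothetical example table and aggregation query.
-- Hive compiles this into jobs for whichever engine is
-- configured (MapReduce today; Tez or Spark as alternatives).
CREATE TABLE IF NOT EXISTS page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
);

SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```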

Although Spark is relatively new to the Hadoop ecosystem, its adoption has been meteoric (https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark). An open-source data analytics cluster computing framework, Spark is built outside of Hadoop’s two-stage MapReduce paradigm but runs on top of HDFS. Thanks to this approach, Spark has quickly gained momentum and established itself as an attractive choice for the future of data processing in Hadoop.

In this meetup, you’ll get an overview of the motivations and technical details behind some very exciting news for Spark and Hive users: the Hive and Spark communities are joining forces to collaboratively introduce Spark as a new execution engine option for Hive, alongside MapReduce and Tez.

We will also run a hands-on, step-by-step demonstration of how to deploy and run Hive on Spark on AWS.
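As a rough sketch of what switching Hive's execution engine looks like (property names follow the Hive on Spark getting-started documentation; the specific values here are illustrative assumptions that depend on your cluster):

```sql
-- In a Hive session, switch the execution engine to Spark
-- and point it at the cluster (values are illustrative).
set hive.execution.engine=spark;
set spark.master=yarn-cluster;
set spark.executor.memory=2g;
set spark.eventLog.enabled=true;

-- Subsequent queries in this session now run as Spark jobs.
SELECT COUNT(*) FROM page_views;
```

These settings can also be made permanent in hive-site.xml rather than per-session; the demo will walk through the details.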

Sydney Apache Spark User Group
York Conference and Function Centre
Level 2 99 York St, Sydney NSW 2000 · Sydney