Deep Dive with Shark (Hive on Spark)

Spark is an open-source cluster computing framework that can outperform Hadoop by 30x by keeping datasets in memory across jobs. Shark is a port of Apache Hive onto Spark that provides a similar speedup for SQL queries, allowing interactive exploration of data in existing Hive warehouses. In this meetup, we'll go into detail on the implementation of Shark and show how to get started with the first alpha release.

The meetup will be hosted at Palantir Technologies in Palo Alto. Food will be available at 6:30 PM, with talks starting at 7 PM.

 

More Details on Shark

We have ported Apache Hive, the large-scale Hadoop data warehouse solution, to run queries on Spark. The resulting system, Shark (Hive on Spark), can answer Hive QL queries 30 times faster than Hive without modifying the existing data. It is backward-compatible with the Hive QL language, the Hive metastore, and Hive user-defined functions. We will cover the architecture and implementation of Shark, including our additions to Hive QL that let users cache data in memory, and a new column-oriented format we have designed for storing Hive data efficiently in memory on the JVM as arrays of primitive types.
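To give a rough feel for the column-oriented idea, here is a minimal sketch in Python. It is not Shark's code and does not reflect its actual format: the table, the column names, and the helpers `to_columns` and `sum_where` are all hypothetical. It only shows the general technique of pivoting rows into one typed array per column, so that a query touching two columns scans two compact arrays instead of a list of row objects.

```python
from array import array

# Hypothetical row-oriented data: one tuple per record (id, age, amount).
rows = [
    (1, 34, 19.99),
    (2, 27, 5.50),
    (3, 45, 102.25),
]

def to_columns(rows):
    """Pivot rows into one typed array per column."""
    ids = array("q", (r[0] for r in rows))      # signed 64-bit integers
    ages = array("i", (r[1] for r in rows))     # signed C ints
    amounts = array("d", (r[2] for r in rows))  # double-precision floats
    return {"id": ids, "age": ages, "amount": amounts}

def sum_where(columns, value_col, filter_col, threshold):
    """Roughly: SELECT SUM(value_col) FROM t WHERE filter_col > threshold."""
    values = columns[value_col]
    filters = columns[filter_col]
    # Only the two referenced columns are scanned; the others are untouched.
    return sum(v for v, f in zip(values, filters) if f > threshold)

columns = to_columns(rows)
total = sum_where(columns, "amount", "age", 30)
```

Each column holds raw numeric values rather than one object per field, which is the same motivation as storing columns as primitive arrays on the JVM.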

Additionally, we will discuss our ongoing work on integrating SQL processing with machine learning, which we see as a natural future direction for Shark because Spark handles iterative algorithms efficiently. In Shark, users can express their machine learning algorithms as Scala-based "distributed UDFs", which then run in the same execution engine as the SQL query processor. This enables much more efficient data pipelines and provides a unified system for data analysis using both SQL and sophisticated statistical learning functions.
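As an illustration of why it helps to run the query and the learning algorithm in one engine, here is a small single-machine sketch in Python. It is an assumption-laden toy, not Shark's distributed UDF API: the table and the functions `select_points` and `fit_slope` are hypothetical. A SQL-like selection produces an in-memory result, and an iterative algorithm (gradient descent for a one-parameter linear fit) then rescans that result on every step without writing it out and reloading it.

```python
# Hypothetical in-memory table of (feature, label, region) rows.
table = [
    (1.0, 2.0, "west"),
    (2.0, 4.0, "west"),
    (3.0, 6.0, "west"),
    (4.0, 1.0, "east"),
]

def select_points(table, region):
    """Roughly: SELECT feature, label FROM table WHERE region = ?"""
    return [(x, y) for x, y, r in table if r == region]

def fit_slope(points, steps=200, learning_rate=0.05):
    """Fit y ~ w * x by gradient descent, rescanning the points each step."""
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= learning_rate * grad
    return w

# The query result stays in memory and is reused by all 200 iterations.
points = select_points(table, "west")
slope = fit_slope(points)
```

The loop reads `points` once per step, so an engine that keeps the query result in memory avoids repeating the selection or a disk read on every iteration.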

 

These topics will be presented by Reynold Xin, Cliff Engle, and Antonio Lupher, the Berkeley research team behind Shark.


  • Bharath P.

    Interesting and impressive.

    April 26, 2012

  • Matei Z.

    For those interested, slides from yesterday are now online: http://shark.cs.berkeley.edu/pr... Also, the Shark website is up at shark.cs.berkeley.edu.

    April 24, 2012

  • Matei Z.

    Important update: The meetup location is actually 151 University Ave 4th Floor. I had put the wrong Palantir building earlier! It's still close but please come to the new location.

    April 16, 2012
