Deep Dive with Shark (Hive on Spark)

Spark is an open source cluster computing framework that can outperform Hadoop by up to 30x by keeping datasets in memory across jobs. Shark is a port of Apache Hive onto Spark that provides a similar speedup for SQL queries, allowing interactive exploration of data in existing Hive warehouses. In this meetup, we'll go into detail on the implementation of Shark and show how to get started with the first alpha release.

The meetup will be hosted at Palantir Technologies in Palo Alto. Food will be available at 6:30, with talks starting at 7 PM.

 

More Details on Shark

We have ported Apache Hive, the large-scale Hadoop data warehouse solution, to run queries on Spark. The resulting system, Shark (Hive on Spark), can answer Hive QL queries up to 30 times faster than Hive without any modification to the existing data. It is backward-compatible with the Hive QL language, metastore, and user-defined functions. We will cover the architecture and implementation of Shark, including our additions to Hive QL that let users cache data in memory, and a new column-oriented format we have designed for storing Hive data efficiently in memory on the JVM as arrays of primitive types.
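To make the column-oriented idea concrete, here is a minimal sketch in Python rather than on the JVM. It is not Shark's actual format: the table, the column names (`user_id`, `revenue`), and the helpers `to_columns` and `sum_where` are all hypothetical. It only illustrates the general technique of pivoting row records into one typed primitive array per column, so that a query touches just the columns it needs.

```python
from array import array

# Hypothetical row-oriented records: one object per row (illustration only).
rows = [
    {"user_id": 1, "revenue": 9.5},
    {"user_id": 2, "revenue": 3.0},
    {"user_id": 3, "revenue": 12.25},
]

def to_columns(rows):
    """Pivot row records into one typed primitive array per column."""
    return {
        "user_id": array("q", (r["user_id"] for r in rows)),  # 64-bit signed ints
        "revenue": array("d", (r["revenue"] for r in rows)),  # 64-bit floats
    }

def sum_where(columns, value_col, filter_col, threshold):
    """Roughly SELECT SUM(value_col) WHERE filter_col >= threshold.

    Only the two referenced columns are scanned.
    """
    values = columns[value_col]
    filters = columns[filter_col]
    return sum(v for v, f in zip(values, filters) if f >= threshold)

columns = to_columns(rows)
total = sum_where(columns, "revenue", "user_id", 2)
```

Each column here is a contiguous block of unboxed values instead of a collection of per-row objects, which is the property the primitive-array layout described above relies on.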

Additionally, we will discuss our ongoing work on integrating SQL processing with machine learning, which we see as a natural future direction for Shark given Spark's inherent efficiency on iterative algorithms. In Shark, users can express their machine learning algorithms as Scala-based "distributed UDFs", which then run in the same execution engine as the SQL query processor. This enables much more efficient data pipelines and provides a unified system for data analysis using both SQL and sophisticated statistical learning functions.
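The pattern behind a "distributed UDF" can be sketched as a function that runs on each partition of a query result, with the partial results combined and the loop repeated. The sketch below is a single-process Python stand-in, not Shark's Scala API: the `partitions` data and the names `partial_gradient` and `fit` are assumptions made for illustration, and the model is a one-parameter linear fit chosen only to keep the example short.

```python
from functools import reduce

# Hypothetical output of a SQL query, already split into partitions of (x, y) rows.
partitions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]

def partial_gradient(partition, w):
    """Per-partition work: squared-error gradient for the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in partition)

def fit(partitions, steps=200, learning_rate=0.01):
    """Iterative gradient descent: map over partitions, reduce, update, repeat."""
    n = sum(len(p) for p in partitions)
    w = 0.0
    for _ in range(steps):
        grads = map(lambda p: partial_gradient(p, w), partitions)  # map step
        total = reduce(lambda a, b: a + b, grads)                  # reduce step
        w -= learning_rate * total / n
    return w

w = fit(partitions)
```

Because the same partitions are scanned on every iteration, keeping them in memory between iterations is what makes this kind of loop cheap, which is the efficiency argument made above.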

 

These topics will be presented by Reynold Xin, Cliff Engle and Antonio Lupher, the Berkeley research team behind Shark.

  • Bharath P.

    Interesting and impressive.

    April 26, 2012

  • Matei Z.

    For those interested, slides from yesterday are now online: http://shark.cs.berkeley.edu/pr... Also, the Shark website is up at shark.cs.berkeley.edu.

    April 24, 2012

  • Matei Z.

    Important update: The meetup location is actually 151 University Ave 4th Floor. I had put the wrong Palantir building earlier! It's still close but please come to the new location.

    April 16, 2012

Our Sponsors

  • Databricks

    video streaming / recording

  • O'Reilly Media

    Conference coupons, new ebooks/videos samplers; new reports, etc.

  • Cloudera

    Kindly providing food & drink!
