April: Lightning-Fast Cluster Computing with Spark and Shark

Name: April: Lightning-Fast Cluster Computing with Spark and Shark
Start: 2013-04-16T18:30:00-04:00
End: 2013-04-16T21:30:00-04:00
Location: Bronto Software, Inc.

Hosted by Jeff T.

Triangle Hadoop Users Group

Details

Speakers: Mayuresh Kunjir (http://www.cs.duke.edu/people/graduate/?csid=0004030) and Harold Lim (http://www.cs.duke.edu/people/graduate/?csid=0002030), Duke University

Spark is an open-source cluster-computing system developed by the AMPLab at the University of California, Berkeley. It offers fast performance and ease of development for a variety of data analytics workloads, such as machine learning, graph processing, and SQL-like queries. Spark supports distributed in-memory computation, which can be up to 100x faster than Hadoop MapReduce.

Shark is a Hive-compatible data warehousing system built on Spark. Shark supports the HiveQL query language, the Hive Metastore, and all the serialization formats supported by Hive. By running on Spark and applying a number of built-in optimizations, Shark can perform up to 100x faster than Hive.

This talk will cover the internals of Spark and Shark and the applications these systems support, and will include a demo with performance comparisons against Hive.
