April: Lightning-Fast Cluster Computing with Spark and Shark

Hosted by Jeff Turner
Details

Speakers: Mayuresh Kunjir (http://www.cs.duke.edu/people/graduate/?csid=0004030) and Harold Lim (http://www.cs.duke.edu/people/graduate/?csid=0002030), Duke University

Spark is an open-source cluster-computing system developed by the AMPLab at the University of California, Berkeley. Spark offers fast performance and ease of development for a variety of data analytics workloads, such as machine learning, graph processing, and SQL-like queries. It supports distributed in-memory computation that can run up to 100x faster than Hadoop MapReduce.

Shark is a Hive-compatible data warehousing system built on Spark. Shark supports the HiveQL query language, the Hive Metastore, and all the serialization formats supported by Hive. Running on Spark, together with a number of built-in optimizations, lets Shark perform up to 100x faster than Hive.

This talk will discuss the internals of Spark and Shark and the applications that these systems support, and will include a demo with performance comparisons against Hive.

Triangle Hadoop Users Group
Bronto Software, Inc.
324 Blackwell Street, Suite 410 · Durham, NC