Past Meetup

April: Lightning-Fast Cluster Computing with Spark and Shark

40 people went

Bronto Software, Inc.

324 Blackwell Street, Suite 410 · Durham, NC

How to find us

Bronto is accessible from the Carr Street side of the American Tobacco Campus, Bay #5 above Tyler's Taproom.

Details

Speakers: Mayuresh Kunjir (http://www.cs.duke.edu/people/graduate/?csid=0004030) and Harold Lim (http://www.cs.duke.edu/people/graduate/?csid=0002030), Duke University

Spark is an open-source cluster-computing system developed by the AMPLab at the University of California, Berkeley. Spark combines fast performance with ease of development for a variety of data analytics needs, such as machine learning, graph processing, and SQL-like queries. It supports distributed in-memory computation, which can be up to 100x faster than Hadoop MapReduce.

Shark is a Hive-compatible data warehousing system built on Spark. Shark supports the HiveQL query language, the Hive Metastore, and all the serialization formats supported by Hive. Running on Spark, together with a number of built-in optimizations, lets Shark perform up to 100x faster than Hive.

This talk will discuss the internals of Spark and Shark and the applications these systems support, and will include a demo with performance comparisons against Hive.