
Things you didn't know you could do in Spark

Hosted By
Satyendra R.

Details

Speakers

Jags, CTO, SnappyData

and

Dr. Barzan Mozafari, Assistant Professor of Computer Science & Engineering, University of Michigan, Ann Arbor

Abstract

Spark 2.0 offers many enhancements that make continuous analytics quite simple. In this talk, we will discuss many other things that you can do with your Spark cluster. We explain how a deep integration of Spark 2.0 and in-memory databases can bring you the best of both worlds. In particular, we discuss how to manage mutable data in Spark, run consistent transactions at the same speed as state-of-the-art in-memory grids, build and use indexes for point lookups, and run 100x more analytics queries at in-memory speeds, with no need to bridge multiple products or to manage and tune multiple clusters.

We then walk through several use-case examples, including IoT scenarios, where one has to ingest streams from many sources, cleanse them, manage the deluge by pre-aggregating and tracking metrics per minute, store all recent data in an in-memory store along with history in a data lake, and permit interactive analytic queries on this constantly growing data. Rather than stitching together multiple clusters as proposed in the Lambda architecture, we walk through a design where everything is achieved in a single, horizontally scalable Spark 2.0 cluster. The design is simpler and a lot more efficient, and it lets you do everything from Machine Learning and Data Science to Transactions and Visual Analytics in one single cluster.
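To make the "pre-aggregating and tracking metrics per minute" step concrete, here is a minimal sketch in plain Python. It does not use the Spark or SnappyData APIs, and it is not the speakers' implementation. The function names, the `(device_id, timestamp, value)` record layout, and the sample readings are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def minute_bucket(ts: float) -> str:
    """Truncate a Unix timestamp (seconds) to the start of its UTC minute."""
    start = ts - ts % 60
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")


def pre_aggregate(readings):
    """Roll raw (device_id, ts, value) readings into per-device, per-minute stats."""
    stats = defaultdict(lambda: {"count": 0, "total": 0.0, "max": float("-inf")})
    for device_id, ts, value in readings:
        if value is None:  # cleansing step (assumed rule): drop readings with no value
            continue
        s = stats[(device_id, minute_bucket(ts))]
        s["count"] += 1
        s["total"] += value
        s["max"] = max(s["max"], value)
    return dict(stats)


# Hypothetical sample stream: two devices, one malformed reading.
readings = [
    ("sensor-1", 0.0, 20.0),
    ("sensor-1", 30.0, 22.0),
    ("sensor-1", 61.0, 25.0),
    ("sensor-2", 10.0, None),
    ("sensor-2", 15.0, 7.5),
]
summary = pre_aggregate(readings)
```

In a real deployment the same roll-up would typically run as a windowed aggregation inside the streaming engine, so that the compact per-minute summaries, not every raw reading, feed the interactive queries.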

Speaker Bios

Jags is CTO of SnappyData. Previously, Jags was the Chief Architect for “fast data” products at Pivotal and served on the extended leadership team of the company. At Pivotal and previously at VMware, he led the technology direction for GemFire. He helped lead the company strategy for data services and worked closely with customers to help them be successful. Jags is recognized for his expertise in distributed systems and databases and is a frequent speaker on “distributed data”. He has a bachelor's degree in computer science and a master's degree in management.

Barzan Mozafari is an Assistant Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor, where he leads a research group designing the next generation of scalable databases using advanced statistical models. Prior to that, he was a Postdoctoral Associate at MIT. He earned his Ph.D. in Computer Science from UCLA in 2011. His research career has led to many successful open-source projects, including CliffGuard (the first robust framework for database tuning), DBSeer (the first automated database diagnosis tool), and BlinkDB (the first massively parallel approximate query engine).

Michigan Spark Users Group
330 E. Liberty Street; Lower Level · Ann Arbor, MI