Efficient real-time State management with Spark & SnappyData

Name: Efficient real-time State management with Spark & SnappyData
Start: 2016-04-07T18:00:00-07:00
End: 2016-04-07T21:00:00-07:00
Location: Columbia Square, 8th floor

Hosted By

Thomas L. and 2 others

Details

Come join us for the Portland Spark User Group's very first meetup! SnappyData (https://github.com/SnappyDataInc/snappydata) will be presenting, and we hope a member of our meetup group will give a second talk.

Agenda

6:00: Food/drinks arrive

6:20: Talk #1: Efficient real-time State management with Spark & SnappyData

7:20: Questions

7:30: Talk #2: (TBD) Please message us through meetup.com if you're interested in talking

8:20: Questions

8:30: chill + relax = chillax

Description

Talk #1:

Efficient real-time State management with Spark & SnappyData

Abstract:

Spark 2.0 continues to advance its support for real-time processing. With “structured streaming”, a single unified API lets you query and combine streams and static DataFrames. Most streaming applications are stateful computations that need to build and maintain state incrementally.

Many applications will continue to use external stores (SQL, NoSQL or in-memory databases). However, in many scenarios (e.g., in IoT) this approach can be challenging because of excessive serialization/deserialization, slow scan/aggregation performance in row-oriented databases, and the difficulty of enforcing exactly-once semantics. Without sufficient care, an application may easily fail to keep up with the incoming stream.

In this talk, we will walk through a few common use case patterns that ingest streams via the new “structured streaming” APIs and study the different options for managing state: Spark 2.0’s new streaming state API, external in-memory/NoSQL stores, or an in-memory database that runs collocated with Spark executors (i.e., sharing the same memory space). We will introduce SnappyData, an open source real-time data platform that fuses Spark with an in-memory data grid and adds some novel extensions to Spark. We explain why data needs to be managed differently than the accepted norm, and perhaps not managed in its entirety at all. We explore the benefits of dramatically compressing data using probabilistic data structures and of executing analytics at the “speed of thought” through approximate query processing with limited resources.

Speaker: Jags Ramnarayan

Jags is a founder and the CTO of SnappyData (https://github.com/SnappyDataInc/snappydata). Previously, Jags was the Chief Architect for “fast data” products at Pivotal and served on the company's extended leadership team. At Pivotal and previously at VMware, he led the technology direction for GemFire and other distributed in-memory products.

Talk #2: (TBD) Please message us through meetup.com if you're interested in talking

Abstract: (TBD)

Speaker: (TBD)
