SnappyData - create robust analytic applications with Spark Streaming

Name: SnappyData - create robust analytic applications with Spark Streaming
Start: 2016-04-20T18:30:00-04:00
End: 2016-04-20T21:30:00-04:00
Location: Pivotal Labs

Hosted by Anonymous_34237772 and Anonymous_184719675

Pivotal NY - Big Data & Analytics Meetup

Details

6:30-7:00PM: Food & Mingling (no admission prior to 6:30)

7:00-8:15PM: SnappyData & Spark Streaming Talk & Demo

8:15-8:30PM: Wind down

Data scientists and developers alike have been inspired by the possibilities of using Spark Streaming for analytic stream processing applications. But the truth is that Spark Streaming lacks the robust data management capabilities that even moderately sophisticated applications require.

With SnappyData as part of your solution, you get powerful data management capabilities such as data consistency and transactions, complex querying, and high availability - all at in-memory processing speed.

SnappyData is a new open source project that deeply integrates the Apache Geode in-memory data grid, an in-memory scale-out SQL store, and Spark into a single unified cluster. This provides (i) low latency transactions, (ii) a new cluster manager capable of avoiding queuing and scheduling for low latency operations, and (iii) HA for all the components. This novel architecture transforms a Spark cluster into a hybrid database with OLAP (compressed columnar tables) and OLTP tables with full HA and recovery.

In this Meetup we'll discuss response time challenges with analytic queries on streams. We'll outline the synopsis data structures available in SnappyData, such as stratified samples, that reside alongside the "exact" data for the fastest processing. We'll also present our design pattern for an Analytics SQL cache that combines Apache Spark and the Greenplum DB as the backend to speed up interactive query processing while accommodating continuous streaming writes.
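For readers new to the idea, here is a minimal sketch of how a stratified sample can answer an aggregate query approximately. It is plain Python and does not use SnappyData's API. The function names, the `region` and `amount` columns, and the `per_stratum` cap are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def build_stratified_sample(rows, key, per_stratum, seed=0):
    """Keep at most `per_stratum` rows for each distinct value of `key`,
    and remember how many rows each stratum really had."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = {}
    for value, members in strata.items():
        if len(members) <= per_stratum:
            kept = members  # small strata are kept in full
        else:
            kept = rng.sample(members, per_stratum)
        sample[value] = (len(members), kept)
    return sample

def approx_sum(sample, column):
    """Estimate SUM(column): scale each stratum's sample sum by
    (true row count / sampled row count)."""
    total = 0.0
    for count, kept in sample.values():
        total += count / len(kept) * sum(r[column] for r in kept)
    return total

# Hypothetical data: one large stratum and one rare stratum.
rows = [{"region": "east", "amount": 10.0} for _ in range(1000)]
rows += [{"region": "west", "amount": 500.0} for _ in range(5)]

sample = build_stratified_sample(rows, "region", per_stratum=50)
estimate = approx_sum(sample, "amount")
exact = sum(r["amount"] for r in rows)
print(estimate, exact)
```

Because every stratum is represented, the rare "west" rows are never lost, as they could be under uniform sampling. On a real stream the per-stratum selection would more likely be done incrementally (for example with reservoir sampling) rather than by buffering all rows as this sketch does.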

We will demo SnappyData working with Spark and Spark Streaming, show how to use them with the Spark programming model, and present the SQL/Spark DataFrame extensions for supporting approximate query processing. We'll show how we achieve 10X performance gains using SnappyData with Spark Streaming compared to Cassandra, and showcase similar gains when running OLAP queries on the ingested data compared to Spark SQL (using stratified samples).

ABOUT THE SPEAKER:

Jags is a founder and the CTO of SnappyData. Previously, Jags was the Chief Architect for “fast data” products at Pivotal and served on the company's extended leadership team. At Pivotal and previously at VMware, he led the technology direction for GemFire and other distributed in-memory products.
