Pre-Spark Summit East Meetup - Presentations + Q&A

Name: Pre-Spark Summit East Meetup - Presentations + Q&A
Start: 2017-02-07T18:00:00-05:00
End: 2017-02-07T21:00:00-05:00
Location: JOHN B. HYNES CONVENTION CENTER

Hosted by Nicholas C.

Boston Data Technology (Boston Data Group/BDT)

Details

Thank you to SnappyData (http://www.snappydata.io/) for sponsoring this event!

Agenda:

6:00 - 6:30: Refreshments
6:30 - 6:45: Opening remarks
6:45 - 7:15: Apache Spark as an operational database (Jags Ramnarayan, SnappyData)
7:15 - 7:45: Hacky tricks to being an Apache Spark rock star (Ted Malaska, Blizzard)
7:45 - 8:30: Ask Me Anything (Databricks)

---
6:45 - 7:15

Topic: Apache Spark as an operational database (Jags Ramnarayan, SnappyData)

Abstract: Apache Spark 2.0 offers many enhancements that make continuous analytics quite simple. In this talk, we will discuss many other things that you can do with your Apache Spark cluster. We explain how a deep integration of Apache Spark 2.0 and in-memory databases can bring you the best of both worlds! In particular, we discuss how to manage mutable data in Apache Spark, run consistent transactions at the same speed as state-of-the-art in-memory grids, build and use indexes for point lookups, and run 100x more analytics queries at in-memory speeds. There is no need to bridge multiple products or to manage and tune multiple clusters. We explain how one can take regular Apache Spark SQL OLAP workloads and speed them up by up to 20x using optimizations in SnappyData.

We then walk through several use-case examples, including IoT scenarios, where one has to ingest streams from many sources, cleanse them, manage the deluge by pre-aggregating and tracking metrics per minute, store all recent data in an in-memory store along with history in a data lake, and permit interactive analytic queries on this constantly growing data. Rather than stitching together multiple clusters as proposed in the Lambda architecture, we walk through a design where everything is achieved in a single, horizontally scalable Apache Spark 2.0 cluster. The design is simpler and a lot more efficient, and it lets you do everything from Machine Learning and Data Science to Transactions and Visual Analytics in one single cluster.
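As a rough illustration of the "pre-aggregating and tracking metrics per minute" step in the abstract, here is a minimal sketch in plain Python. It does not use Spark or SnappyData and is not taken from the talk; the event layout `(epoch_seconds, device_id, value)`, the function name `aggregate_per_minute`, and the sample readings are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_per_minute(events):
    """Roll (epoch_seconds, device_id, value) readings up into per-minute buckets.

    Hypothetical helper for illustration only; the event layout is an assumption.
    """
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for ts, device_id, value in events:
        minute_start = ts - (ts % 60)  # truncate the timestamp to the start of its minute
        bucket = buckets[(device_id, minute_start)]
        bucket["count"] += 1
        bucket["total"] += value
    # Keep only the compact summary (count and mean) for each device and minute.
    return {
        key: {"count": b["count"], "mean": b["total"] / b["count"]}
        for key, b in buckets.items()
    }


# Made-up sample readings from two hypothetical sensors.
events = [
    (0, "sensor-a", 10.0),
    (30, "sensor-a", 20.0),
    (61, "sensor-a", 40.0),
    (5, "sensor-b", 1.0),
]
summary = aggregate_per_minute(events)
```

Here `summary` holds one entry per device and minute, so the raw stream can be discarded or archived while the small per-minute summaries stay queryable.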

Speaker bio: Jags is CTO of SnappyData, a Spark-based startup. Previously, Jags was the Chief Architect for “fast data” products at Pivotal and served on the extended leadership team of the company. At Pivotal and previously at VMware, he led the technology direction for GemFire. He helped lead the company strategy for data services and worked closely with customers to help them be successful. Jags is recognized for his expertise in distributed systems and databases and is a frequent speaker on “distributed data”. He has a bachelor's degree in computer science and a master's degree in management.

---
7:15 - 7:45

Topic: Hacky tricks to being an Apache Spark rock star (Ted Malaska, Blizzard)

Abstract: To know the APIs of Apache Spark is one thing, but to be a master of different patterns to solve odd problems like skew, Cartesian joins, time series, partition awareness, and more is another thing. This talk will go over odd tricks that you may not encounter in your normal day's worth of Spark coding, but that may be game-changing if applied correctly.
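One widely used pattern for the skew problem named in the abstract is key salting: a hot key is split into several sub-keys, aggregated in two stages, and merged at the end. The sketch below shows the idea in plain Python rather than Spark, and it is not taken from the talk. The names `salted_count` and `NUM_SALTS` are hypothetical, and the salt is assigned round-robin here only to keep the example deterministic (a random salt is the more common choice in practice).

```python
from collections import Counter

NUM_SALTS = 4  # assumed number of sub-keys per original key


def salted_count(records):
    """Count keys in two stages so that no single group holds all of a hot key."""
    # Stage 1: partial counts per (key, salt). Each hot key is split across
    # NUM_SALTS smaller groups that could be processed in parallel.
    partial = Counter()
    for i, key in enumerate(records):
        partial[(key, i % NUM_SALTS)] += 1
    # Stage 2: drop the salt and merge the partial counts per original key.
    final = Counter()
    for (key, _salt), n in partial.items():
        final[key] += n
    return partial, final


# Made-up skewed input: one hot key and one rare key.
records = ["hot"] * 1000 + ["cold"] * 10
partial, final = salted_count(records)
largest_group = max(partial.values())
```

Without salting, the largest group would contain all 1000 "hot" records; with four salts the largest group in stage 1 is a quarter of that, while the final counts are unchanged.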

Speaker bio: Ted works on the Battle.net team at Blizzard, helping support great titles like World of Warcraft, Overwatch, Hearthstone, and many more. Previously, he was a Principal Solutions Architect at Cloudera, helping clients be successful with Hadoop and the Hadoop ecosystem. Before that, he was a Lead Architect at the Financial Industry Regulatory Authority (FINRA). He has also contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is also a co-author of O’Reilly's “Hadoop Application Architectures”, a frequent speaker at many conferences, and a frequent blogger on data architectures.

---
7:45 - 8:30

Topic: Ask Me Anything (AMA) (Databricks)

Abstract: Join Apache Spark core committers and contributors Michael Armbrust (https://spark-summit.org/east-2017/events/production-ready-structured-streaming/), Tim Hunter (https://spark-summit.org/east-2017/events/tuning-and-monitoring-deep-learning-on-apache-spark/), Eric Liang (https://spark-summit.org/east-2017/events/robust-and-scalable-etl-over-cloud-storage-with-spark/), Tathagata Das (https://spark-summit.org/east-2017/events/making-structured-streaming-ready-for-production-updates-and-future-directions/), Sameer Agarwal (https://spark-summit.org/east-2017/events/exceptions-are-the-norm-dealing-with-bad-actors-in-etl/), and Hossein Falaki (https://spark-summit.org/east-2017/events/parallelizing-existing-r-packages-with-sparkr/). They will answer all your Spark-related questions!
