Apache Spark Meetup at Bloomberg

Hosted by Scott Walent

Details

Join us for a Bay Area Spark Meetup featuring tech talks from Databricks' Tathagata Das and Bloomberg's Sudarshan Kadambi. Pizza and drinks will be served. We will also film the talks and post them later.

PLEASE NOTE: To ensure a seat, you must register using this link: https://www.eventbrite.com/e/apache-spark-meetup-at-bloomberg-tech-tickets-27395917928

You will also need a photo ID to enter the building.

AGENDA:
6:00 - 6:30pm: Reception
6:30 - 6:35pm: Welcoming Remarks
6:35 - 7:05pm: Tech Talk 1: Tathagata Das, Databricks
7:05 - 7:35pm: Tech Talk 2: Spark and Online Analytics, Sudarshan Kadambi, Bloomberg L.P.
7:35 - 8:30pm: Reception

TECH-TALK 1: Tathagata Das
In Apache Spark 2.0, we have extended DataFrames and Datasets to handle streaming data. Streaming Datasets not only provide a single programming abstraction for batch and streaming data, but also bring support for event-time-based processing, out-of-order/delayed data, sessionization, and tight integration with non-streaming data sources and sinks. In this talk, Tathagata will take a deep dive into the concepts and the API and show how this simplifies building complex “continuous applications”.

TECH-TALK 2: Spark and Online Analytics, Sudarshan Kadambi, Bloomberg
Apache Spark was designed as a batch analytics system. By caching RDDs, Spark speeds up jobs that iteratively process the same data. This pattern also applies to online analytics. We use Bloomberg's Spark Server as a server runtime for online analytics. Our framework implements several useful patterns for online query processing and is centered on the idea of “Managed” DataFrames that can be refreshed and updated according to user requirements without violating the immutability of RDDs/DataFrames. However, Spark presents significant challenges with respect to availability and resilience in an online setting, where it must respond to queries under strict SLAs. In this talk, we try to identify the specific areas where slowdowns or failures cause the largest hits to online-query performance, and we discuss potential solutions to address them.

Bay Area Spark Meetup
Bloomberg
140 New Montgomery, 22nd Floor · San Francisco, CA