Apache Spark: How it's being used in production


Details
17:00 - 17:30 - Mingling
17:30 - 18:15 - Demi Ben-Ari (Senior Software Engineer @ Windward) - “Spark in the Maritime Domain”
We will present the implementation of a data pipeline in the maritime domain at Windward, built from Spark applications.
The process involved converting a monolithic application into a fully distributed, scalable one.
We'll cover the tools and the process of taking an idea and developing Spark applications around it, and show the development of an application end to end, from DevOps to the way of thinking about application design, along with use cases and the lessons learned at Windward Ltd. I hope the talk will give you some more practical tools for "Spark"ing your way around.
18:15 - 18:25 - Coffee Break
18:30 - 19:15 - Tzach Zohar (Architect @ Kenshoo) - “Spark your legacy - real-world lessons from distributing an 8-year-old monolith”
We're all here because we understand the potential of Spark for heavyweight distributed processing. But how does one migrate an 8-year-old, single-server, MySQL-based legacy system to such a shiny new framework? How do you accurately preserve the behavior of a system that consumes gigabytes of data every day, hides numerous undocumented implicit gotchas, and changes constantly, while shifting to brand-new development paradigms? In this talk I'll present Kenshoo's attempt at this challenge, in which we migrated a legacy aggregation system to Spark. Our solutions include heavy use of metrics and Graphite for analyzing production data; a "local-mode" client enabling reuse of legacy test suites; data validation using side-by-side execution; and maximum reuse of code through refactoring and composition. Some of these solutions rely on Spark-specific characteristics and features.
19:15 - 19:25 - Beer Break
19:30 - 20:15 - Ruthy Goldberg, Tal Sliwowicz (Taboola R&D) - “Spark Magic - building and deploying a high scale product in 4 months”
Taboola’s R&D was tasked with building a new real-time, high-scale platform for home page optimization (“Newsroom”). Our mission was to design, develop and deploy a new, full-blown real-time data analysis production system within 4 months after an alpha. We were able to achieve our goals and more using Spark and Cassandra.
We now have many live production publishers using it exclusively (weather.com, TheBlaze, Tribune, CollegeHumor and many others), and usage is growing.
In this talk we would like to share our real-world experience of building and deploying Spark and Cassandra applications.
