TechTalk: Salting your Spark to Scale, Morri Feldman


Details
One of our finest engineers, Morri Feldman, will be visiting Israel, and he'll be sharing one of his latest battle stories with Spark. You won't want to miss it!
Agenda:
17:00 - Gathering // Noshables
17:15 - Talk: Salting your Spark to Scale
18:00 - Pizzas & Beer
Abstract:
One of the major issues that Spark batch jobs have to contend with at AppsFlyer is that our data is inherently skewed. For instance, a couple of apps account for the vast majority of our traffic. Data skew wreaks havoc on naively written data jobs, making them perform and scale very poorly as the amount of data they need to process increases. Recently, one of our central data aggregations (the process that prepares data for the overview dashboard) stopped working, and we had essentially reached the limit of how much RAM we could devote to it. Using a technique called "salting" to overcome the data skew that was killing this job, we were able to get the job working again and make the entire process much more scalable. I'll go over salting in depth to explain how it works and how we are starting to use it here at AppsFlyer.
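For readers new to the technique, here is a minimal sketch of the general salting idea for a skewed aggregation. It is written in plain Python rather than Spark so that it runs standalone, and it is an illustration under our own assumptions, not the AppsFlyer job described in the talk: `salted_sum` and `num_salts` are hypothetical names, and the two dictionaries stand in for the shuffle stages a Spark job would perform.

```python
import random
from collections import defaultdict


def salted_sum(records, num_salts=8, seed=0):
    """Two-phase sum of (key, value) pairs using a random salt.

    Hypothetical helper for illustration only (not a Spark API).
    Phase 1 spreads each key over `num_salts` sub-keys (key, salt), so the
    rows of a hot key are split across several partial sums, much as a
    salted Spark job would split them across several tasks.
    Phase 2 drops the salt and combines the partial sums per original key.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible

    # Phase 1: aggregate by the salted key (key, salt).
    partial = defaultdict(int)
    for key, value in records:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += value

    # Phase 2: strip the salt and merge the partial results per key.
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal

    return dict(totals), dict(partial)


if __name__ == "__main__":
    # Skewed input: one "hot" app dominates the traffic.
    records = [("big_app", 1)] * 10_000 + [("small_app", 1)] * 10
    totals, partial = salted_sum(records)
    print(totals)
    # The hot key's rows are now spread over several salted groups.
    print(sorted(v for (k, _), v in partial.items() if k == "big_app"))
```

The final totals match an unsalted aggregation; the difference is that no single group has to hold all of the hot key's rows. In Spark this would typically become a group-by on the key plus a salt column, followed by a second group-by on the key alone; the extra aggregation step is the price paid for spreading a hot key across several tasks.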
Audience Level:
Advanced - Technical
Bio:
Morri Feldman joined AppsFlyer as the first member of our data team about 4 years ago. Before AppsFlyer, he was in academia, training as a biophysicist. Since joining AppsFlyer, he has been loving the fast pace of development and the exciting technologies used to handle the enormous scale of AppsFlyer data. Most of his coding at AppsFlyer is in Clojure, with some work in Scala for Spark jobs. The functional paradigms in Clojure really help him quickly write correct code that performs well in a multi-threaded environment.