Spark advanced topics

* Please note that the event is open to everyone (regardless of gender)

AGENDA
========
18:00 - 18:30 - Mingling

18:30 - 19:10 - Optimizing Spark-based data pipelines - are you up for it? - Etti Gur and Itai Yaffe @ Nielsen
19:30 - 20:00 - The benefits of running Spark on your own Docker - Shir Bromberg @ Yotpo

* Both sessions will be delivered in Hebrew

Title: Optimizing Spark-based data pipelines - are you up for it?

Abstract:
At Nielsen Marketing Cloud, we provide our customers (marketers and publishers) with real-time analytics tools to measure the efficiency of their ongoing campaigns.

To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.
In this talk, we will discuss how we significantly optimized our Spark-based in-flight analytics daily pipeline, reducing its total execution time from over 20 hours down to 2 hours, resulting in a huge cost reduction.

Topics include:
* Ways to identify optimization opportunities
* Optimizing Spark resource allocation
* Parallelizing Spark output phase with dynamic partition inserts
* Running multiple Spark "jobs" in parallel within a single Spark application

Bio:
Etti Gur is a highly experienced developer with over 20 years in the software industry. For the last 7 years, she has been working as a senior big data developer at Nielsen Marketing Cloud, building big data pipelines using Spark, Kafka, Druid, Airflow and more.

Itai Yaffe is a big data tech lead at Nielsen Marketing Cloud, where he deals with big data challenges using tools like Spark, Druid, Kafka, and others. He is also part of the core team of the Israeli chapter of Women in Big Data. Itai is keen on sharing his knowledge and has presented his real-life experience in various forums.

Title: The benefits of running Spark on your own Docker

Abstract:
Nowadays, many of an organization’s main applications rely on Spark pipelines. As these applications become more significant to businesses, so does the need to quickly deploy, test and monitor them.

The standard way of running Spark jobs is to deploy them on a dedicated managed cluster. However, this solution is relatively expensive and can take a long time to set up. Therefore, we developed a way to run Spark on any container orchestration platform. This allows us to run Spark in a simple, customizable and testable way.

In this talk, we will present our open-source Docker images for running Spark on Nomad servers. We will cover:
* The issues we had running Spark on managed clusters and the solution we developed.
* How to build a Spark Docker image.
* And finally, what you can achieve by using Spark on Nomad.

Bio:
Shir Bromberg is a Big Data team leader at Yotpo, with 3 years of experience in the data world and over 9 years in software development.
Shir has lectured at multiple events, including "Women in Big Data", "She codes" and the Yotpo engineering in-house meetup.
She strives to bring the finest solutions to the technological challenges arising in the cutting-edge field of big data.

PARKING
=========
There is free 3-hour parking at TLV Fashion Mall (a 5-minute walk from the venue), and free parking at Givon parking for Discount Bank card holders.