Past Meetup

Expedia Lessons Learned of Autoscaling Spark to 100's of EC2 Nodes Each Week

55 people went


This event has limited space, so you must RSVP through a separate link to ensure attendance:

Join us for insightful conversations and delicious beers with Lead Engineers from Expedia’s Email Marketing team, who will share how they’ve built Apache Spark into their marketing campaign workflow. We will cover lessons learned from scaling Spark on an AWS data lake to serve millions of customers each day, bursting from 20 to hundreds of EC2 nodes throughout the week. We'll explore the state of Apache Spark at scale today and the optimizations Expedia is leveraging to make it a core part of its marketing strategy, focusing on the challenges and limitations of using Spark and looking ahead to the future of Spark in 2018.

About the Team - Alpha Team is part of the OmniChannel Communications Team. The team is responsible for supporting the brands Expedia, Wotif, and LastMinute (and is expanding to others). Our responsibilities include ensuring customers receive personalized content based on their needs through channels such as email and app push notifications.

6-6:25 pm - Doors Open (Drinks + Snacks)
6:25-6:30 pm - Announcements + Opening Remarks
6:30-7:00 pm - Session 1
7:00-8:00 pm - Session 2
8:00-8:30 pm - Q&A + Mingling (Drinks + Snacks)

Session 1
Speaker - Jagannath Narasimhan, Technical Product Manager

Title - Business Use Case of Ocelot/Alpha Product in Expedia

Jagannath will share how the team has increased revenue at Expedia by leveraging our Qubole Spark pipelines to send marketing emails and by using the Spark clusters to build profiles of Expedia users. He will give an overview of the dataflow architecture to show how we deploy our various big data pipelines (LTS, PreProd, and Prod), which serve emails worldwide as well as push notifications for the Expedia mobile app. He will then dive into our decision-making process for sizing and measuring the costs of our workloads across different types of clusters and instances (e.g. Spot vs. On-Demand, R4 vs. M4, and different numbers of nodes).

Session 2
Speakers - Nishant Jain and Nick Mergia, Software Development Engineers on the OmniChannel Communications Team

Title - Using Spark to Send Personalized Marketing Emails to the Expedia Customer Base

Nishant and Nick will take a deep dive into the technical challenges of PySpark vs. Scala, focusing on how they migrated their Python jobs to Scala for better performance, reliability, and support for more use cases. Scala has shown many benefits, such as better IDE support, unit testing, debugging capability, use of external libraries, and the ability to package everything into a single JAR. Following this, they will share how we orchestrate multiple production clusters using Jenkins, our debugging process using Scala JARs, and ultimately how we terminate these jobs with AWS Lambda. They will close with learnings and recommendations on using Qubole for our Ocelot/Alpha Product and lessons learned about managing Spark (shuffles, broadcasts, repartitions).