
Spark War Stories - What's Really Painful

Hosted By
shlomi h. and Demi B.

Details

http://photos2.meetupstatic.com/photos/event/5/9/0/f/600_445222799.jpeg

18:00 - 18:30 - Mingling

18:30 - 19:00 - S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben-Ari @ Windward

19:00 - 19:30 - Luigi doesn't have the Spark? Mario to the rescue - Igor Berman @ DynamicYield

19:30 - 19:40 - Break

19:40 - 20:10 - InnerActive's work with Spark Streaming for Real Time Bidding - Gal Aviv @ InnerActive

20:10 - 20:40 - Spark Jobs and Data Modeling in Cassandra - Tal Sliwowicz @ Taboola

Talk 1
Title: S3, Cassandra or Outer Space? Dumping Time Series Data using Spark

Abstract:
A large share of the data we process is time series data. Once you start working with distributed systems, you run into many scale and performance problems, and many questions arise:

How to handle missing data?

Should my system handle both serving and backend processing, or should they be separated?

Which of the solutions will be cheaper? Which gives the best performance for the money?

In the talk we will tell the tale of all the transformations we’ve made to our data model @Windward, show some of the problems we’ve handled, and review multiple data persistence layers such as S3, MongoDB, Apache Cassandra, and MySQL.

And I’ll try my best NOT to answer the question “Which one of them is the Best?”

We promise to share our pain and the lessons we learned!

Bio:
Demi Ben-Ari (https://www.linkedin.com/in/demibenari), Sr. Data Engineer @Windward.

I have over 9 years of experience building systems, ranging from near-real-time applications to Big Data distributed systems.

Co-Founder of the “Big Things” Big Data community: http://somebigthings.com/big-things-intro/

I’m a software development groupie, interested in tackling cutting-edge technologies.

Talk 2
Title: Luigi doesn't have the Spark? Mario to the rescue

Abstract:
At Dynamic Yield we were looking for a tool to describe, run and monitor a complex workflow of Spark jobs. We began by using the Python-based Luigi, only to find out that the cost of launching each job in its own Spark context is prohibitive, and that, surprisingly, there is currently no tool that lets you model and run a nice job workflow from within a Spark context, with dependencies n'all. Hence, we set out to create our own tool: Mario (but of course), and we'd like to share what we learned and accomplished along the way.

We've also found out some inconvenient truths about Avro support, HadoopRDDs and more, and we'd be glad to share some of these findings from the battlefield.

Bio:
Igor Berman, Senior Software Developer - Data Team @ Dynamic Yield.

Talk 3
Title: InnerActive's work with Spark Streaming for Real Time Bidding

Abstract: (Will be updated a bit in the future)

A technical deep dive into InnerActive's work with Spark Streaming.

We want to do map-reduce in real time for real-time bidding.

  • 1000 records a second - MySQL at around 1.2 TB - 550 days of aggregates.

  • Raw events are written to S3.

  • We want to show a real-time board, like a stock market board.

Bio:
Gal Aviv (https://il.linkedin.com/in/galaviv), R&D Group Manager @ InnerActive

Talk 4

Title: Spark Jobs and Data Modeling in Cassandra

Abstract:
Taboola has multiple data centers across the globe running a fully stateless recommendation engine. The engine receives multiple signals and events, which it asynchronously sends to the backend for real-time processing. In this talk we will describe how we reduced the processing power required by our Spark jobs by orders of magnitude by changing how we model the data stream and the processing pipeline.

Bio:
Tal Sliwowicz (https://www.linkedin.com/in/talsliwowicz) is an R&D Director @ Taboola, managing the Publisher R&D group and leading the Taboola engineering team that is developing a new data path, processing more than 5 TB of data daily in real time using Spark, Cassandra (and other awesome tech buzzwords). He is also one of the organizers of the Israel Spark Meetup group.

Food

Pizza, beer and light drinks are courtesy of our host, Kenshoo.

Parking

There are parking lots around the venue and it is possible to park in the nearby streets

Big Things
Kenshoo
HaBarzel 8 · Tel Aviv-Yafo