Big Data: Apache Spark at Scale


Details
This is a live event in our offices, and it will also be broadcast live.
Agenda:
18:00-18:30 Mingling
18:30-19:00 Horizontal Scaling for Python Sentiment Algorithm / Lee Ofri, Architecture and Research Engineer @ Amobee
Watch here: https://meet.google.com/mpd-jkgw-tvo
19:00-19:15 Break - Beer, Pizza and Mingling
19:15-19:45 Spark Structured Streaming State Store & GC / Gad Avivi, Senior Big Data Developer @ Amobee
Watch here: https://meet.google.com/hdr-vitz-idp
19:45-20:15 Mingling
Lee Ofri:
An overview of a real-world case study: how we managed to analyze sentiment in a dataset of over 1 million articles in an hour. The session includes a solution overview, a deep dive into PySpark performance, and a coding demo.
Gad Avivi:
In this lecture I will go through a real production use case of stateful Spark Streaming. I will explain the main memory issue we had and how we analyzed the problem, briefly cover Java GC, explain how Spark state works under the hood, and of course show how we solved the issue.
The talks are in Hebrew.
The event will also be streamed online; see the links above.