Spark & Dataframes for hundreds of multi-tenant customers & billions of events

Hosted By
Asaf B.

Details
Target audience:
Big Data Engineers, Data Scientists, Architects
Meetup description:
Totango (www.totango.com) uses Spark and Dataframes to process and analyze millions of events on behalf of hundreds of customers. In these talks we'll present the supporting data-pipeline architecture and dive deep into:
- How to run tests in production while rolling out new code reliably and at high velocity
- Technical challenges and lessons learned in scaling Spark to run efficiently across a huge variety of customers in a multi-tenant environment:
  - Ad hoc aggregations vs. running hundreds of aggregation processes on a schedule
  - Simple vs. robust calculations across all edge cases
  - Clean sheet system vs. migrating from old code and dealing with bugs
What you will learn:
- Testing-in-production implementation patterns: how to reliably roll out Spark code into a production environment
- Best practices for running Spark computations across various input sizes and customer profiles
Presenter bios
Oren Raboy (VP Eng. and Co-founder, Totango) - https://il.linkedin.com/in/orenraboy
Romi Kuntsman (Senior Big Data Engineer, Totango) - https://il.linkedin.com/in/romik
Previous presentations:
https://www.youtube.com/watch?v=A9z0e_ppq-A
http://www.slideshare.net/RomiKuntsman/totango-workflowmgt-romi20150429

Artificial Intelligence Professionals - Israel
Campus TLV
98 Yigal Alon St., 34th Floor (Electra Tower) · Tel-Aviv