Data Engineering: Bulk Up Your Data Eng Skills using Airflow, Spark & Dask
// Please note: This meetup will be hybrid. It will be held both online (talks will be streamed via Zoom) and offline (Bitan 27, Namal TLV).
Welcome engineers and technologists in the TLV area. Whether you are a hands-on software engineer, data engineer, or just a person interested in advanced technologies, you will enjoy this meetup. Hosted at Wix's TLV offices.
We’ll cover topics in data science and data engineering: improving your knowledge of a common tool such as Airflow, deepening your knowledge of big data engines like Spark, and getting you familiar with Dask. This meetup will focus on helping you improve your dev velocity. We will share must-know practices, pitfalls, optimizations, and tuning tips, and will also introduce tools that are gaining momentum in the DS world.
- 17:00 - 17:30 - Gathering
- 17:30 - 18:00 - Apache Airflow - Improve DAG authoring skills: Tips, Tricks and More! by Elad Kalif
- 18:00 - 18:30 - Apache Spark Optimization Techniques and Tuning by Almog Gelber
- 18:30 - 18:45 - Break
- 18:45 - 19:15 - Not Only Spark! Introducing Dask - A Pythonic Big Data Framework for Data Science by Itamar Faran
- 19:15 - 19:45 - Q&A
// Airflow - Improve DAG authoring skills: Tips, Tricks and More! / Speaker: Elad Kalif
Has a broken DAG ever surprised you? How about a Jinja expression that was never templated? Join me to learn about these and other crucial Airflow practices. You will learn about the features, the must-know practices, and the common pitfalls of working with Apache Airflow.
Elad has been a data engineer at Wix for 3 years and is part of the infrastructure team, which enables solutions that serve a wide range of developers. He is an open source advocate, an Apache Airflow committer, and a PMC member.
// Apache Spark Optimization Techniques and Tuning / Speaker: Almog Gelber
This session will cover the common bottlenecks and pain points of building a Spark pipeline, along with ways to fix them and make the application more efficient.
Almog is a big data engineer who has been at Wix for 2 years as part of the infrastructure team.
Almog is the Apache Spark tech lead, responsible for promoting tools and infrastructure that make Spark more accessible, as well as for optimizing Spark jobs across the data organization.
// Not Only Spark! Introducing Dask - A Pythonic Big Data Framework for Data Science / Speaker: Itamar Faran
While Spark is the state-of-the-art technology for huge, larger-than-memory data, its infrastructure overhead may sometimes be “not worth it” for data science projects. Introducing Dask, a lightweight, pure-Python framework for larger-than-memory dataframes, built on NumPy and pandas, that integrates with the Python data science ecosystem.
Itamar is a Data Scientist at Vesttoo. He is experienced in integrating big data tools into data science projects.