Johannesburg Data Platform meeting 12 May 2026
Details
Agenda
18:30 - Welcome and introduction
18:45 - Using execution plans to write efficient Spark code, by Michael Johnson
When you first begin tuning Spark performance, it can feel like trial and error, or like simply scaling up resources until the job completes within the expected time frame. This often leads to frustration and wasted resources.
This session begins with a brief review of the Spark job execution lifecycle, followed by an exploration of how to use execution plans in Apache Spark to systematically optimise performance. Topics include:
* Spotting inefficient join strategies
* Identifying unnecessary shuffles
* Diagnosing partitioning and data skew issues
* Understanding the impact of Catalyst and Adaptive Query Execution
You will leave this session with a practical approach to identifying and fixing performance issues in your Spark workloads.
20:00 - Closing



