Inside Spark Core: Understanding Spark to write better performing code

Zürich Apache Spark Meetup


Event Agenda

18:30 - 19:00

Arrival and socialising

19:00 - 20:00

Philipp Brunenberg, Apache Spark Expert & Data Science Consultant

"Inside Spark Core: Understanding Spark to write better performing code"

When beginning to use Spark, we can choose between two roads: We can start writing Spark code right away and implement the use cases we came for, usually through trial, error, and StackOverflow, relying on Spark to magically execute our workload in, hopefully, the most efficient way. Most developers stop here. Or we can start our journey differently, by first building an understanding of Spark's concepts and of what is happening internally. In my experience, most people, and also most companies, take the former approach and implement their use cases right away. This approach is certainly valid, but it often leads straight to performance issues as we try to scale our projects.

Throughout this talk, we will walk through the most important Spark (Core) internal components to gain a deeper understanding of how parallelization is achieved. Based on these insights, we will extract some very useful paradigms to keep in mind when sitting down to write more performant Spark code. Whether you are an experienced Spark user or want to leverage all of Spark's beauty right from the start, this talk gives practical advice on how to write better Spark code.
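To give a flavour of the kind of paradigm such internals suggest (this example is our own illustration, not part of the speaker's material): Spark's `reduceByKey` combines values per key on each partition before shuffling, while `groupByKey` ships every record across the network. A minimal pure-Python sketch of why the map-side combine shrinks the shuffle, with no Spark dependency:

```python
from collections import defaultdict

def shuffle_size_group_by_key(partitions):
    """groupByKey-style: every (key, value) record crosses the network."""
    return sum(len(part) for part in partitions)

def shuffle_size_reduce_by_key(partitions):
    """reduceByKey-style: values are combined per key on each partition
    first, so at most one record per distinct key leaves a partition."""
    total = 0
    for part in partitions:
        combined = defaultdict(int)
        for key, value in part:
            combined[key] += value  # map-side combine
        total += len(combined)
    return total

# Two partitions of word counts with heavily repeated keys.
partitions = [
    [("spark", 1)] * 1000 + [("core", 1)] * 500,
    [("spark", 1)] * 800 + [("shuffle", 1)] * 200,
]

print(shuffle_size_group_by_key(partitions))   # 2500 records shuffled
print(shuffle_size_reduce_by_key(partitions))  # 4 records shuffled
```

The same data, but three orders of magnitude fewer records cross the network once each partition pre-aggregates locally; this is exactly the kind of insight that falls out of understanding how Spark parallelizes work.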

Big thanks to Impact Hub for sponsoring the location: the global community of entrepreneurial people prototyping the future of business. At Impact Hub, you can connect, collaborate, co-work and create great content in an inspiring environment.