Top 5 Mistakes When Writing Spark Applications

Hosted by Imran Rashid

Details

In the world of distributed computing, Spark has simplified development and opened the door for many to start writing distributed programs. Folks with little to no distributed coding experience can now write just a couple of lines of code that immediately get hundreds or thousands of machines working on creating business value. However, even though Spark code is easy to write and read, that doesn’t mean users don’t run into long-running, slow-performing jobs or out-of-memory errors. Thankfully, most of the issues with using Spark have nothing to do with Spark itself, but rather with the approach we take when using it. This session will go over the top 5 things we’ve seen in the field that prevent people from getting the most out of their Spark clusters. When some of these issues are addressed, it is not uncommon to see the same job run 10x or 100x faster on the same cluster with the same data, just with a different approach.

(This will largely be a repeat of a popular presentation that coworkers of mine gave at a previous Spark Summit, with some updates and my own additions.)

Chicago Spark Users
IBM office, Hyatt Center, 6th floor
71 S. Wacker · Chicago, IL