
Holden Karau on Debugging Apache Spark - Making Sense of Stack Traces & More

Hosted By
Lynn B. and Steve G.

Details

We will be raffling off two tickets to Data Day Texas 2018 at the meetup. You must RSVP and be present to enter.

Our dear friend Holden Karau (https://www.google.com/search?q=%22Holden%20Karau&rct=j) will be returning to Austin to share the latest Spark goodness. She'll be previewing her Strata SJ talk for us in Austin. You'll see it first!

Abstract

Debugging Apache Spark - Making Sense of Stack Traces & More

Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation combined with intelligent pipelining, which can make debugging more challenging. Holden Karau explores how to debug Apache Spark applications, the different logging options in each of Spark’s supported languages, and some common errors and how to detect them.

Spark’s own internal logging can be quite verbose. Holden and Rachel demonstrate how to search Apache Spark logs effectively to spot common problems, and discuss options for logging from within your own program. Spark’s accumulators have gotten a bad rap because of how they behave in the event of cache misses or partial recomputes, but Holden and Joey look at how to use Spark’s current accumulators effectively for debugging before looking ahead to the data property type accumulators that may arrive in future Spark versions. Beyond reading logs and instrumenting your program with accumulators, Spark’s UI can be a great help for quickly detecting certain types of problems. Holden covers how to use the UI to quickly figure out whether certain types of issues are occurring in your job.

About the Speaker

Holden Karau (https://www.linkedin.com/in/holdenkarau) @holdenkarau (http://twitter.com/holdenkarau) is a software development engineer and is active in open source. She is a co-author of Learning Spark and Fast Data Processing with Spark and has taught introductory Spark workshops. Prior to IBM, she worked on a variety of big data, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of computers she enjoys dancing and playing with fire.

Agenda

6:30: Meet and Greet / Networking
7:00: Announcements and Featured Talk
8:30: Adjourn to pub

Austin Spark Meetup
IBM Austin Building 908 Room 03-3E027
11501 Burnet Rd, Austin, TX, USA