Sandy Ryza: Why is my Spark Job failing?

Details

Sandy Ryza (https://www.linkedin.com/pub/sandy-ryza/42/4b7/763) (@sandysifting (http://twitter.com/sandysifting)), a data scientist at Cloudera, is in town for StrataNY (http://strataconf.com/stratany2014), and we've asked him to help kick off this series by spending the evening with us. Sandy will discuss how Spark works through the lens of all the parts of it that can go wrong. He will cover the tools, logs, tips, and internals that can help you understand how to stay out of this unfortunate situation, and how to make it back out.

Why is my Spark Job Failing?

You are not a bad person. But your Spark job is failing. It is running out of memory. It is stalled. It is complaining that no executors have registered, or spitting out "Filesystem closed" exceptions with lines upon lines of $anon$1's, or being consumed by a swarm of locusts the likes of which have not been seen since Moses crossed the Red Sea. Or it's completing -- but taking 20 times as long as it reasonably should.

About the Speaker

Sandy Ryza is a data scientist at Cloudera, a Hadoop committer, and a Spark contributor.

About Spark

Apache Spark (https://spark.apache.org/) is an open-source cluster computing framework for data analytics, originally developed in the AMPLab (https://amplab.cs.berkeley.edu/) at UC Berkeley. Spark fits into the Hadoop ecosystem and can run on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms.

A special thanks to the fine folks at Shutterstock (http://www.shutterstock.com/) who reached out to offer their bright shiny new headquarters (http://www.shutterstock.com/blog/look-inside-shutterstocks-new-hq-in-the-empire-state-building) in the Empire State Building for this event, and to Cloudera (http://www.cloudera.com/) for providing refreshments. Kudos.

Evening with a Data Scientist
Shutterstock
350 Fifth Avenue · New York, NY