
Details

Join us for a special meetup while our Spark engineering team is in Boston for Spark Summit East. The meetup is free and takes place outside of the Spark Summit conference.

Agenda:

• 6:30pm Food and drinks

• 7pm Tech talks with Q&A

• 8pm Networking

1st Tech Talk: FAULT TOLERANCE IN SPARK: LESSONS LEARNED FROM PRODUCTION

Spark is by its nature very fault tolerant. However, faults and application failures can and do happen in production at scale. In this talk, we’ll discuss the nuts and bolts of fault tolerance in Spark.

We will begin with a brief overview of the sorts of fault tolerance offered, and lead into a deep dive of the internals of fault tolerance. This will include a discussion of Spark on YARN, scheduling, and resource allocation.

We will then spend some time on a case study and on some of the tools used to find and verify fault tolerance issues. Our case study comes from a customer who experienced an application outage that was traced to a scheduler bug. We discuss the analysis we did to reach this conclusion and the work we did to reproduce the bug locally. We highlight some of the techniques used to simulate faults and find bugs.

At the end, we’ll discuss some future directions for fault tolerance improvements in Spark, such as scheduler and checkpointing changes.

Speaker: Jose Soltren is a Software Engineer at Cloudera focused on Spark development. His core focus is on the internals of Spark as they matter to customers: reliability, correctness, and performance. He holds Master’s and Bachelor’s degrees from MIT.

2nd Tech Talk: Happy Birthday! Hadoop is now 10 years old. What started off as a duo of open source components (HDFS and MapReduce) has now grown into a complex, ever-changing ecosystem of 50+ tools, components and technologies.

Speaker: Jon Gray jumped on the Hadoop bandwagon at the beginning and has collected a wealth of experience and insights on how to do Big Data at Web Scale. Jon and the team at Cask Data have figured out a thing or two about how to simplify and accelerate Big Data projects. Jon will discuss the evolution of the Hadoop ecosystem and give us a look into what lies ahead for Hadoop, Spark and Big Data.

