Get the most out of Spark on YARN


Details
Session Description:
In this hands-on presentation (live coding and demos), we will discuss a variety of existing and proposed extension points in Spark and demonstrate some completed and prototyped efforts to extend Spark for the sole purpose of facilitating access to a variety of native platform features. We will also demonstrate how these features can greatly improve the performance, stability, management and monitoring of your application.
Abstract:
Apache Spark, a newcomer to the world of Big Data, has successfully filled the long-standing void of "developer experience" with its fluent and intuitive API and DAG (job) assembler, while also introducing several innovative memory-oriented abstractions (e.g., RDDs and stage result caches). Along with providing its own native resource-management platform, Spark also integrates with external resource-management platforms such as Apache Hadoop YARN and Apache Mesos. However, the integration model Spark currently uses does not adequately address integration with the existing and new DAG-like and DAG-capable execution environments native to such platforms, thus limiting access to some of the native features these platforms provide (e.g., MR2/Tez stateless shuffle, YARN resource localization, YARN management and monitoring, and more). In this hands-on presentation (live coding and demos), we will discuss a variety of existing and proposed extension points in Spark and demonstrate some completed and prototyped efforts to extend Spark for the sole purpose of facilitating access to a variety of native platform features. We will also demonstrate how these features can greatly improve the performance, stability, management and monitoring of your application.
Speaker #1: Oleg Zhurakousky
Company: Hortonworks
Job Title: Principal Architect
Email: [masked]
Biography: Oleg is a Principal Architect with Hortonworks, responsible for architecting scalable Big Data solutions using various open source technologies available within and outside the Hadoop ecosystem. Before Hortonworks, Oleg was part of SpringSource/VMware, where he was a core engineer on the Spring Integration framework and led the Spring Integration Scala DSL. He has 18+ years of experience in software engineering across multiple disciplines, including software architecture and design, consulting, business analysis and application development. As a speaker, Oleg has presented seminars at dozens of conferences worldwide (e.g., Hadoop Summit, SpringOne, JavaOne, JavaZone, Scala Days, Oredev).
Speaker #2: Tom McCuch
Company: Hortonworks
Job Title: Director, Solutions Engineering
Email: [masked]
Biography: Tom has a passion for open source frameworks, distributed systems and solution engineering. With more than 22 years of experience in software engineering, he has served in many different roles across Enterprise Architecture, Product Engineering, Professional Services and Sales Engineering. With a B.S. in Mathematics and an M.S. in Software Engineering, he feels that the very best of his two worlds comes together in storing, managing, integrating, processing and analyzing big data. As a speaker, Tom has presented seminars at several conferences and user groups (e.g., Hadoop Summit, SpringOne and a variety of user groups).
