YARN by default


Details
We start this new season with the talk “Spark on YARN” by Ferran Galí, who works at Trovit Search (https://www.trovit.com/) and co-organizes this meetup. It will be a great pleasure to hear from Ferran about his experience moving from Hadoop to YARN-powered clusters and running Apache Spark on top of this novel infrastructure. This event is sponsored by Trovit Search and will be held at their offices in Barcelona.
Abstract:
With the rise of the cloud, data-intensive systems and the Internet of Things, the use of distributed systems has become widespread.
The first big player was Hadoop, which provided an integrated solution to Big Data storage and computation problems. Its popularity led many organizations to adopt the technology. However, new challenges appeared, such as the need to execute iterative, interactive or in-memory algorithms without the disk-intensive burden of MapReduce. For that very reason Hadoop evolved, decoupling its resource manager from the main computation engine: YARN was born. As a result of its vast adoption, YARN has become the de facto distributed operating system for Big Data.
Since its early releases, Apache Spark has provided a way to run on YARN-powered clusters. In this talk we will take a look at that technology and learn what it means to run Spark on this kind of infrastructure.
Bio:
Ferran Galí i Reniu is passionate about web-scale distributed systems. Having worked on Big Data technologies for several years, he has gained expertise in solving problems that require massive amounts of data processing. Architecting the deployment of Hadoop on a cluster of machines, developing new solutions or playing data scientist to make the business thrive are some of the day-to-day tasks he deals with. Right now he is working at Trovit, building the best search engine for classified ads. https://www.linkedin.com/in/ferrangali – @ferrangali
