
Hadoop - Looking to the Future, YARN: Past, Present and Future

Hosted by Tamas Nemeth and 2 others

Details

This will be an English speaking event, co-organized with the Big Data Meetup Budapest (https://www.meetup.com/Big-Data-Meetup-Budapest).

Hadoop - Looking to the Future (Arun C Murthy/Hortonworks)

The Apache Hadoop ecosystem began as just HDFS & MapReduce nearly 10 years ago in 2006.

Very much like the Ship of Theseus ( http://en.wikipedia.org/wiki/Ship_of_Theseus ), Hadoop has undergone an incredible amount of transformation, from multi-purpose YARN to interactive SQL with Hive/Tez to machine learning with Spark.

Much more lies ahead: whether you want sub-second SQL with Hive, effective use of SSDs and memory in HDFS, or metadata-driven security policies in Ranger, the Hadoop ecosystem in the Apache Software Foundation continues to evolve to meet new challenges and use cases.

Arun C Murthy has been involved with Apache Hadoop since the beginning of the project - nearly 10 years now. In the beginning he led MapReduce, went on to create YARN and then drove Tez & the Stinger effort to get to interactive & sub-second Hive. Recently he has been very involved in the Metadata and Governance efforts. In between he founded Hortonworks, the first public Hadoop distribution company.

YARN: Past, Present and Future (Vinod Kumar Vavilapalli/Hortonworks)

Apache Hadoop YARN is a distributed, multi-tenant and fault tolerant resource-management platform.

In this talk, we’ll first cover how YARN stands out today as an enterprise data processing platform and how YARN has been deployed and utilized in real production clusters.

Then, we’ll move on to recent efforts and a few forward-looking features that further YARN as a first-class data operating system: rolling upgrades, support for long-lived services like HBase & Storm, workload scheduling features such as node labels and preemption, a timeline service for application monitoring and metrics, and resource scheduling & isolation for CPU, disks and network.

Vinod Kumar Vavilapalli is the Hadoop YARN and MapReduce guy at Hortonworks. He is a long-term Hadoop contributor at Apache, a Hadoop committer and a member of the Apache Hadoop PMC. He has a Bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology Roorkee. He has been working on Hadoop for more than 6 years and he still has fun doing it. Straight out of college, he joined the Hadoop team at Yahoo! Bangalore, where he worked on HadoopOnDemand, Hadoop-0.20, CapacityScheduler, and Hadoop security, before Hortonworks happened. He is passionate about using computers to change the world for the better, bit by bit. He is reachable on Twitter at @tshooter.

Budapest Data Science Meetup
Prezi House of Ideas
Nagymezo 54-56, Budapest