Message from Michal: For members of the Hadoop community, Tez is a very important project to learn about and keep an eye on. In the world of MapReduce alternatives, Spark has grabbed many of the headlines and much of the social media attention, but Tez is a solid technology with its own distinct advantages. If you are designing a data pipeline in Hadoop, please join us as Bikas Saha from Hortonworks gives an overview of Tez. Below is a description of the presentation and also Bikas' biography.
Apache Tez is a framework to create purpose-built data processing applications on YARN for Hadoop 2. Tez aims to provide high performance and efficiency out of the box, across the spectrum from low-latency queries to heavyweight batch processing. It provides a sophisticated topology API, advanced scheduling and concurrency control, and proven fault tolerance. The talk will elaborate on these features via real use cases from early adopters like Hive, Pig and Cascading. The talk will highlight the recent developer release with examples of building and debugging Tez applications. Finally, we will provide data to show the robustness and performance of the Tez framework so that users can get on board with confidence.
Bikas has been working in the Apache Hadoop ecosystem since 2011 and is a committer/PMC member of the Apache Hadoop and Tez projects. He is currently working on Apache Tez, a new framework to build high-performance data processing applications natively on YARN. He has been a key contributor in making Hadoop run natively on Windows and has focused on YARN and the Hadoop compute stack. Prior to Hadoop, he worked extensively on the Dryad distributed data processing framework, which runs on some of the world's largest clusters as part of Microsoft's Bing infrastructure. @bikassaha