
Apache Tez: Accelerating Hadoop Data Processing

Message from Michal: For members of the Hadoop community, Tez is a very important project to learn about and keep an eye on. In the world of MapReduce alternatives, Spark has grabbed many of the headlines and much of the social media attention, but Tez is a solid technology with its own distinct advantages. If you are designing a data pipeline in Hadoop, please join us as Bikas Saha from Hortonworks gives an overview of Tez. Below are a description of the presentation and Bikas' biography.


Topic:

Apache Tez is a framework for creating purpose-built data processing applications on YARN for Hadoop 2. Tez aims to provide high performance and efficiency out of the box, across the spectrum from low-latency queries to heavyweight batch processing. It provides a sophisticated topology API, advanced scheduling and concurrency control, and proven fault tolerance. The talk will elaborate on these features through real use cases from early adopters such as Hive, Pig and Cascading. It will also highlight the recent developer release, with examples of building and debugging Tez applications. Finally, we will present data on the robustness and performance of the Tez framework so that users can get on board with confidence.

Speaker:

Bikas has been working in the Apache Hadoop ecosystem since 2011 and is a committer/PMC member of the Apache Hadoop and Tez projects. He is currently working on Apache Tez, a new framework for building high-performance data processing applications natively on YARN. He has been a key contributor to making Hadoop run natively on Windows and has focused on YARN and the Hadoop compute stack. Prior to Hadoop, he worked extensively on the Dryad distributed data processing framework, which runs on some of the world's largest clusters as part of the Microsoft Bing infrastructure. @bikassaha



  • Bill S.

    Hello all, I wanted to make you aware of a meetup this Thursday at Hackreduce that I thought you might be interested in: Storm Users group: Streaming Data & Real-time Computation with Apache Storm. http://www.meetup.com/Boston-Storm-Users/events/195032642/?a=me1_grp&rv=me1

    September 22, 2014

  • Sean K.

    It was a good, detailed Tez presentation. I appreciated the insights into how flexible Tez's DAG model is, both in how it is defined and at runtime.

    September 17, 2014

  • Das S.

    Thanks, Michal, for press. Bikas, you had valuable insights into Tez and YARN; very much appreciated.

    September 17, 2014

  • A former member

    Good introduction to Tez.

    September 17, 2014

  • Michal K.

    Hi Everyone, I've posted Bikas' presentation in the files section. Here is the link: http://files.meetup.com/1535756/Tez-Boston-HUG-2014.pptx

    September 17, 2014
