Meetup #3

It’s Two O’clock in the Morning: Do You Know Where Your Petabytes Are?
Robert Chansler, Engineering Manager, Grid Computing, Yahoo!

The Hadoop Distributed File System (HDFS) at Yahoo! stores 25 petabytes across 25 thousand nodes. Being a good custodian of this much data requires continuous surveillance and management to ensure its integrity and durability. Importantly, the most conventional strategy for data protection (just make a copy somewhere else) is not practical for such large data sets. HDFS must continuously manage the number of replicas for each block, test the integrity of blocks, balance the usage of resources as the hardware infrastructure changes, report status to administrators, and be on guard for the unexpected.
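
As a rough illustration of two of the duties listed in the abstract (testing block integrity and managing the replica count), here is a minimal toy sketch in Python. It is not HDFS code: the `Block` class, the `scan_block` and `re_replicate` functions, the node names, and the use of `zlib.crc32` as the checksum are all assumptions made for illustration, and the target of three replicas is assumed only because it is the commonly used HDFS default.

```python
import zlib
from dataclasses import dataclass, field

TARGET_REPLICAS = 3  # assumption: the commonly used HDFS default replication factor


@dataclass
class Block:
    """Toy stand-in for a file system block (hypothetical, not an HDFS type)."""
    block_id: str
    checksum: int  # CRC32 recorded when the block was written
    replicas: dict = field(default_factory=dict)  # node name -> stored bytes


def scan_block(block):
    """Drop replicas whose bytes no longer match the recorded checksum.

    Returns the names of the nodes that held a corrupt replica.
    """
    corrupt = [
        node for node, data in block.replicas.items()
        if zlib.crc32(data) != block.checksum
    ]
    for node in corrupt:
        del block.replicas[node]
    return corrupt


def re_replicate(block, live_nodes, target=TARGET_REPLICAS):
    """Copy a healthy replica to live nodes lacking one until the target is met.

    Returns the names of the nodes that received a new replica.
    """
    if not block.replicas:
        raise RuntimeError(f"block {block.block_id} has no healthy replica left")
    source = next(iter(block.replicas.values()))
    added = []
    for node in live_nodes:
        if len(block.replicas) >= target:
            break
        if node not in block.replicas:
            block.replicas[node] = source
            added.append(node)
    return added
```

In this sketch, a block stored on three nodes with one damaged copy would first lose the damaged replica in `scan_block` and then regain its third replica on another live node in `re_replicate`. A real system would also have to schedule such scans, throttle the copy traffic, and choose target nodes with rack placement and disk usage in mind, none of which is modeled here.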

SPEAKER BIO: Rob Chansler is a veteran of the Cm* and C.mmp projects at CMU. After finishing graduate studies, he built compilers in Pittsburgh at Tartan Labs. At Adobe Systems, Rob joined the core PostScript group to develop products for high-end and specialty systems. Rob joined the dot-com world just in time to experience the crash before moving to McDATA to work on management software for storage area networks. Now at Yahoo!, Rob manages development for the Hadoop Distributed File System.


  • Doug B.

    Nice sized crowd.

    May 26, 2010

  • A former member

    I felt the presentation was a bit brief.

    May 25, 2010

4 went
