
Real World Hadoop: How the British Library Archived the Internet

Presented by David Boloker (CTO, IBM Emerging Technologies) and Iwan Winoto (Software Architect, IBM Australia)

IBM's Emerging Internet Technologies team is called upon to deal with some of the biggest "big data" problems in the world. To tackle them effectively, the team uses both Hadoop and a suite of its own tools built on top of Hadoop.

Recently the team was engaged by the British Library to quite literally "download the web". Recent research estimates that the average life expectancy of a website is just 44 to 75 days, and that every six months 10 percent of web pages on the UK domain are lost. The challenge is to preserve the digital culture of the nation. IBM used its Hadoop-based BigSheets project to help the British Library archive and analyse the UK web domain.

David is an IBM Distinguished Engineer and Chief Technical Officer for Emerging Internet Technologies in IBM Software Group. He is recognised inside and outside IBM as a technical leader in the Internet software space, guiding IBM's investments as well as its internal product development.

Iwan is a Software Architect at IBM and represents the Emerging Internet Technologies team in Australia.



  • Ben L.

    Great presentation - was hoping for something a little more technical.

    March 16, 2011

  • A former member

    Good to see the British Library using Hadoop instead of attacking IT like many of my colleagues do. I am high-tech and a librarian.

    March 5, 2011

19 people attended.

Our Sponsors

  • Cloudera

    Generously providing speakers, drinks, and discounted Hadoop training.

  • Google Sydney

    Generously providing meeting space, food and speakers.
