Past Meetup

Real World Hadoop: How the British Library Archived the Internet

19 people went

IBM Centre

601 Pacific Highway, St Leonards NSW 206 · St Leonards

How to find us

Venue is TBC, but likely to be IBM's offices in St Leonards

Presented by David Boloker (CTO, IBM Emerging Technologies) and Iwan Winoto (Software Architect, IBM Australia)

IBM's Emerging Internet Technologies team are called upon to deal with some of the biggest "big data" problems in the world. To tackle them effectively, they use both Hadoop and a suite of their own tools built on top of Hadoop.

Recently the team was engaged by the British Library to quite literally "download the web". Recent research estimates the average life expectancy of a Web site is just 44 to 75 days, meaning that every six months, 10 percent of Web pages on the UK domain are lost. The challenge is to preserve the digital culture of the nation. IBM used its Hadoop-based BigSheets project to help the British Library archive and analyse the UK web domain.

David is an IBM Distinguished Engineer and Chief Technical Officer for Emerging Internet Technologies in IBM Software Group. He is recognised inside and outside IBM as a technical leader in the Internet software space, guiding IBM's investments as well as internal product development.

Iwan is a Software Architect at IBM and represents the Emerging Internet Technologies team in Australia.