Real World Hadoop : How the British Library Archived the Internet

Hosted by Iwan W.

Details

Presented by David Boloker (CTO, IBM Emerging Technologies) and Iwan Winoto (Software Architect, IBM Australia)

IBM's Emerging Internet Technologies team is called upon to deal with some of the biggest "big data" problems in the world. To tackle them effectively, the team uses both Hadoop and a suite of its own tools built on top of Hadoop.

Recently the team was engaged by the British Library to quite literally "download the web". Recent research estimates that the average life expectancy of a Web site is just 44 to 75 days, meaning that every six months, 10 percent of Web pages on the UK domain are lost. The challenge is to preserve the digital culture of the nation. IBM used its Hadoop-based BigSheets project to help the British Library archive and analyse the UK web domain.

David is an IBM Distinguished Engineer and Chief Technical Officer for Emerging Internet Technologies in IBM Software Group. He is recognised inside and outside IBM as a technical leader in the Internet software space, guiding IBM's investments as well as internal product development.

Iwan is a Software Architect at IBM and represents the Emerging Internet Technologies team in Australia.

Hadoop User Group Sydney
IBM Centre
601 Pacific Highway, St Leonards NSW 206 · St Leonards