British Library to archive billions of webpages

London, April 6 (IANS) Embarking on the biggest expansion of its archiving power since the 17th century, the British Library intends to record the country’s burgeoning collection of online cultural and intellectual works.

It aims to “harvest” the entire UK web domain to document current events. An estimated one billion pages a year will now be amassed alongside the books, magazines and newspapers that have been stored for several centuries, the Daily Express reported Friday.

According to Lucie Burgess, who is leading the project at the British Library, the unprecedented operation would provide a complete snapshot of life in the 21st century, which increasingly plays out online.

“We have already lost a lot of material, particularly around events such as the 7/7 London bombings or the 2008 financial crisis.

“That material has fallen into the digital black hole of the 21st century because we haven’t been able to capture it… Most of that material has already been lost or taken down. The social media reaction has gone,” she said.

The operation to “capture the digital universe” will begin with an automatic “web harvest” of an initial 4.8 million websites – or one billion webpages – from the UK domain, she said.

The library, having invested three million pounds in the project, plans to collect the material by conducting an “annual trawl” of the UK web domain.

It will “harvest” information from another 200 sites — such as online newspapers or journals — on a more regular basis.

Access to the material, including archived websites, will be offered in reading rooms at each of the legal deposit libraries.
