Graveyards and ghosts in web archiving

October 1969 was a busy month. Monty Python’s Flying Circus aired for the first time; Steve McQueen, Trey Parker and PJ Harvey were born; and on a dark, dark night (or about 10.30pm on the 29th), a 21-year-old UCLA student called Charley Kline started to transmit a message to the Stanford Research Institute using the Advanced Research Projects Agency Network. He meant to send the word ‘LOGIN’ – but the receiving system crashed at ‘LO’. And thus, the internet was born.

Well, kind of. It would take another 20 years for Tim Berners-Lee’s World Wide Web to be fully developed, but 29th October marks the first day that an electronic message was sent over a network. Each year this date is commemorated as World Internet Day, and as this time of year also sees us mark Halloween and World Digital Preservation Day on November 2nd, there couldn’t be a better time to share some of the web archiving work we’ve been doing to preserve the University of Edinburgh’s historical web content using tombstone pages. That’s right – spooky internet preservation!

There are lots of ghoulish metaphors that surround digital content. We talk about dead or dying platforms (there’s a graveyard, for example, of services that have been ‘killed off’ by Google); the term ‘digital zombie’ is used to describe someone unable to detach themselves from their online world at a cost to their IRL self; and there is a growing field of study exploring the concept of ‘digital afterlives’ – the potential for us to live on through our digital traces on social media long after we have died.

Perhaps archivists are professionally predisposed to morbid thoughts about what will survive of us into the future. After all, heritage work requires imagining a future when we are long gone and subsequent generations are trying to understand what happened, how, and why, and web archives are no different.

What is web archiving, and why are we doing it?

Web archiving is the process of creating reliable copies of web-based content for long-term preservation. Information on the web disappears at a rapid pace as content is changed, updated, and removed, and once something has been removed there isn’t always a reliable way of getting it back. How often have you clicked on a link only to reach a dreaded ‘404’ page? A 2015 study by the UK Web Archive reported that in just two years, 40% of websites collected in the national web archive had disappeared.

The University of Edinburgh has been publishing content on the internet since the late 1990s, but we can’t keep all of our content online for all time. There are lots of reasons why it’s important that the University regularly audits its web estate. Firstly, it’s just good digital housekeeping to regularly review what pages are currently online. Hosting a blog, for example, can take up a lot of space on our servers, as we have to store each image, video, and audio file, and this storage all builds up.

Secondly, there’s an environmental impact to consider: storing and serving up images, files, and code necessary to render a website uses precious resources (did you know the internet in all its forms currently produces the same level of carbon emissions as the aviation industry?) Removing content that is no longer needed allows us to only host what we need and helps us towards reducing our carbon impact.

Thirdly, we don’t want outdated or incorrect information to remain on the live web. There’s nothing more frustrating than thinking you’ve finally found a page that answers your question – only to find out that it’s years old and the information is out of date. We want our webpages to provide students, staff, and the wider community with the most accurate and up-to-date information possible about our activities, so that means taking down things that are no longer applicable.

But a lot of these older websites and webpages are a valuable record of the University’s past activities, and we don’t want to lose them completely! So how can we continue to provide access to older content without it taking up much-needed resources and space on the University’s servers?

Here lies old content, gone but not forgotten

A gravestone with the words 'here lies old content, GBNF' inscribed on it
This is where ‘tombstoning’ comes in. A tombstone is exactly what it sounds like: it’s a way of marking the place where ‘dead’ content used to ‘live’. A tombstone page replaces the content and points to the archived copy instead.
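For the technically curious, here is a minimal sketch of what generating a tombstone page could look like. It is only an illustration: the function name `build_tombstone`, the page layout, and the `example.org` addresses are all assumptions made up for this example, not the templates or archive locations the University actually uses.

```python
from html import escape


def build_tombstone(title: str, original_url: str, archive_url: str) -> str:
    """Return a minimal HTML tombstone page that points to an archived copy.

    The layout and all argument values are hypothetical; they only
    illustrate the idea of replacing retired content with a pointer.
    """
    # Escape every value so a stray '&' or '<' cannot break the markup.
    safe_title = escape(title)
    safe_original = escape(original_url)
    safe_archive = escape(archive_url)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{safe_title} (archived)</title></head>\n'
        "<body>\n"
        f"<h1>{safe_title}</h1>\n"
        f"<p>The content that used to live at {safe_original} has been retired.</p>\n"
        f'<p><a href="{safe_archive}">View the archived copy</a></p>\n'
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Placeholder addresses, not real University or archive URLs.
    print(
        build_tombstone(
            "News & Events",
            "https://www.example.org/news",
            "https://archive.example.org/2023/https://www.example.org/news",
        )
    )
```

The point of the sketch is simply that a tombstone is tiny compared with the content it replaces: a title, a short explanation, and one link to the archived copy.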

Tombstoning is a big part of the web archiving programme we’ve been developing over the last few months, and you may have already spotted some tombstone pages on the University’s website – like on the Centre for the History of the Book’s former site or the Languages, Literature and Cultures news page – and more will be coming in the next few months as we work to extend the web archiving programme to other areas of the web estate.

Ghosts in the machine

But what about the archived pages that we’re actually linking to? Continuing our spo0o0o0oky metaphor, these archive captures are our ghosts. All archives are ghosts – an echo or impression left behind by past activity – but this is especially true of web archives. When we archive a page, we’re not just taking a screenshot or saving the HTML. Instead, our goal is to make a copy of the site that will allow us to reproduce how the site functioned when it was live. In this way, web archives are ghosts in the machine that we can resurrect (recreate within a replayer) and through them, we can communicate with the past.

Does that make a replayer an Ouija board? Can you be haunted by the ghosts of content past that you thought had been removed? Can this metaphor possibly be stretched any further?

A planchette pointed at the address bar of a new browser tab

**EDIT: The UK Web Archive has sadly been affected by the massive tech outage suffered by the British Library. Reports that this has been caused by ghouls and gremlins in the system have yet to be confirmed…**

Things I’ve learnt from working with the BITS Magazine by Digi Pres Intern Jasmine Patel

Re-blogged from Information Services Group: Student Employee Blog

Programme of Study and Year: 1st Year German

Intern Position: Digital Preservation Intern

Hobbies: Piano, a sprinkling of violin (and soon flute!), running and tea

Student intern holding printed BITS magazines

“Don’t assume you’ll be able to read your email if you go to the States” and other things I’ve learnt from working with the BITS Magazine this summer.


The University of Edinburgh Becomes the First UK University to Appoint a Web Archivist to Preserve 21st Century History

Heritage Collections welcomes a new member of the digital archive team, Alice Austin, the first dedicated Web Archivist at the University of Edinburgh and the first dedicated Web Archivist at a non-legal deposit UK University!

Read more -> https://blogs.ed.ac.uk/website-communications/the-university-of-edinburgh-becomes-the-first-uk-university-to-appoint-a-web-archivist-to-preserve-21st-century-history/ 

Minimum Preservation for Maximum Results? It’s a good idea if it works!

iPres2022 logo

In less than a week, iPres22 will kick off in Glasgow, right on our doorstep! If you’re not already acquainted with iPres, welcome! There’s something for everyone! iPres is the world’s largest digital preservation conference where practitioners from all sorts of backgrounds and industries gather to share challenges and strategies.

At this year’s iPres, among other things, I’m running a workshop with Caylin Smith (Head of Digital Preservation at Cambridge University Library) and Patricia Falcao (Time-based Media Conservator at Tate) on Preserving Complex Digital Objects – Revisited. We ran a similar workshop at iPres 2019 in Amsterdam, breaking into groups to undertake different aspects of the preservation process with one complex digital object (‘Breathe’ by Kate Pullinger). 

This year, each of the speakers will bring a digital object from their own collections that they consider to be complex. We will break participants into small groups to try an experiment: 

Can a Minimum Viable Preservation (MVP) approach be applied to complex digital objects? Read more to learn about what MVP digital preservation looks like and what to expect from the workshop! If you’re planning to attend iPres, come join us! 


iPres Delegate Visit to the University of Edinburgh

DX7 Synthesizer Keyboard

As part of iPres 2022 hosted in Glasgow on 12-16 September, the University of Edinburgh will welcome a cohort of conference delegates to tour the Main Library and St. Cecilia’s Hall.

As part of the tour, specialist curators will provide an overview of materials from across collections that reflect many examples of Technology Heritage in the care of the University. Items will include manuscripts like Makhrūṭāṭ Iblawniyūs (Apollonius’ Cones), an early 18th-century copy of a Codex Arabic Script, and Godfrey Thomson’s mechanical calculator from the 1930s.

Under Construction

Image: Digital Preservation Business Case Toolkit http://wiki.dpconline.org/

Nothing to see here! Check back soon for updates about digital preservation at the University of Edinburgh!