{"id":439,"date":"2023-11-01T12:03:03","date_gmt":"2023-11-01T12:03:03","guid":{"rendered":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/?p=439"},"modified":"2024-03-15T14:30:19","modified_gmt":"2024-03-15T14:30:19","slug":"graveyards-and-ghosts-in-web-archiving","status":"publish","type":"post","link":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/2023\/11\/01\/graveyards-and-ghosts-in-web-archiving\/","title":{"rendered":"Graveyards and ghosts in web archiving"},"content":{"rendered":"<p>October 1969 was a busy month. Monty Python\u2019s Flying Circus aired for the first time; Steve McQueen, Trey Parker and PJ Harvey were born; and on a dark, dark night (or about 10.30pm on the 29th), a 21-year-old UCLA student called Charley Kline started to transmit a message to the Stanford Research Institute using the Advanced Research Projects Agency Network. He meant to send the word \u2018LOGIN\u2019 \u2013 but the receiving system crashed at \u2018LO\u2019. And thus, the internet was born.<\/p>\n<p><!--more--><\/p>\n<p>Well, kind of. It would take another 20 years for Tim Berners-Lee\u2019s World Wide Web to be fully developed, but the 29th October marks the first day that an electronic message was sent over a network. Each year this date is commemorated as World Internet Day, and as this time of year also sees us mark Halloween and <a href=\"https:\/\/www.dpconline.org\/events\/world-digital-preservation-day\/\" target=\"new\" rel=\"noopener\">World Digital Preservation Day<\/a> on November 2nd, there couldn\u2019t be a better time to share some of the web archiving work we\u2019ve been doing to preserve the University of Edinburgh\u2019s historical web content using tombstone pages. That\u2019s right \u2013 spooky internet preservation!<\/p>\n<p>There are lots of ghoulish metaphors that surround digital content. 
We talk about dead or dying platforms (there\u2019s a <a href=\"https:\/\/killedbygoogle.com\/\" target=\"new\" rel=\"noopener\">graveyard<\/a>, for example, of services that have been \u2018killed off\u2019 by Google); the term \u2018digital zombie\u2019 is used to describe someone unable to detach themselves from their online world at a cost to their IRL self; and there is a growing field of study exploring the concept of \u2018digital afterlives\u2019 &#8211; the potential for us to live on through our digital traces on social media long after we have died.<\/p>\n<p>Perhaps archivists are professionally predisposed to morbid thoughts about what will survive of us into the future \u2013 after all, all heritage work requires imagining a future when we are long gone and subsequent generations are trying to understand what happened, how, and why \u2013 and web archives are no different.<\/p>\n<h1>What is web archiving, and why are we doing it?<\/h1>\n<p>Web archiving is the process of creating reliable copies of web-based content for long term preservation. Information on the web disappears at a rapid pace as content is changed, updated, and removed, and once something has been removed there isn\u2019t always a reliable way of getting it back. How often have you clicked on a link only to reach a dreaded \u2018404\u2019 page? A 2015 study by the <a href=\"https:\/\/www.webarchive.org.uk\/\" target=\"new\" rel=\"noopener\">UK Web Archive<\/a> reported that in just two years, 40% of websites collected in the national web archive had disappeared.<\/p>\n<p>The University of Edinburgh has been publishing content on the internet since the <a href=\"https:\/\/web.archive.org\/web\/20230000000000*\/http:\/www.ed.ac.uk\/\" target=\"new\" rel=\"noopener\">late 1990s<\/a>, but we can\u2019t keep all of our content online for all time. There are lots of reasons why it\u2019s important that the University regularly audits its web estate. 
Firstly, it\u2019s just good digital housekeeping to regularly review what pages are currently online. Hosting a blog, for example, can take up a lot of space on our servers, as we have to store every image, video, and audio file, and this storage builds up.<\/p>\n<p>Secondly, there\u2019s an environmental impact to consider: storing and serving up images, files, and code necessary to render a website uses precious resources (did you know the internet in all its forms currently produces <a href=\"https:\/\/www.websitecarbon.com\/\" target=\"new\" rel=\"noopener\">the same level of carbon emissions as the aviation industry<\/a>?). Removing content that is no longer needed allows us to only host what we need and helps us towards reducing our carbon impact.<\/p>\n<p>Thirdly, we don\u2019t want outdated or incorrect information to remain on the live web. There\u2019s nothing more frustrating than thinking you\u2019ve finally found a page that answers your question \u2013 only to find out that it\u2019s years old and the information is out of date. We want our webpages to provide students, staff, and the wider community with the most accurate and up-to-date information possible about our activities, so that means taking down things that are no longer applicable.<\/p>\n<p>But a lot of these older websites and webpages are a valuable record of the University\u2019s past activities, and we don\u2019t want to lose them completely! 
So how can we continue to provide access to older content without it taking up much-needed resources and space on the University\u2019s servers?<\/p>\n<h1>Here lies old content, gone but not forgotten<\/h1>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-442 alignright\" src=\"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/grave-252x300.png\" alt=\"A gravestone with the words 'here lies old content, GBNF' inscribed on it \" width=\"252\" height=\"300\" srcset=\"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/grave-252x300.png 252w, https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/grave.png 466w\" sizes=\"(max-width: 252px) 100vw, 252px\" \/><br \/>\nThis is where \u2018tombstoning\u2019 comes in. A tombstone is exactly what it sounds like: it\u2019s a way of marking the place where \u2018dead\u2019 content used to \u2018live\u2019. A tombstone page replaces the content and points to the archived copy instead.<\/p>\n<p>Tombstoning is a big part of the web archiving programme we\u2019ve been developing over the last few months, and you may have already spotted some tombstone pages on the University\u2019s website \u2013 like on the <a href=\"https:\/\/www.ed.ac.uk\/literatures-languages-cultures\/chb\" target=\"new\" rel=\"noopener\">Centre for the History of the Book<\/a>\u2019s former site or the <a href=\"https:\/\/www.ed.ac.uk\/literatures-languages-cultures\/news\" target=\"new\" rel=\"noopener\">Literatures, Languages and Cultures news page<\/a> \u2013 and more will be coming in the next few months as we work to extend the web archiving programme to other areas of the web estate.<\/p>\n<h1>Ghosts in the machine<\/h1>\n<p>But what about the archived pages that we\u2019re actually linking to? Continuing our spo0o0o0oky metaphor, these archive captures are our ghosts. 
All archives are ghosts \u2013 an echo or impression left behind by past activity \u2013 but this is especially true of web archives. When we archive a page, we\u2019re not just taking a screenshot or saving the HTML. Instead, our goal is to make a copy of the site that will allow us to reproduce how the site functioned when it was live. In this way, web archives are ghosts in the machine that we can resurrect (recreate within a replayer) and through them, we can communicate with the past.<\/p>\n<p>Does that make a replayer an Ouija board? Can you be haunted by the ghosts of content past that you thought had been removed? Can this metaphor possibly be stretched any further?<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-441 aligncenter\" src=\"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/planchette-300x258.png\" alt=\"A planchette pointed at the address bar of a new browser tab\" width=\"300\" height=\"258\" srcset=\"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/planchette-300x258.png 300w, https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/planchette-349x300.png 349w, https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/planchette.png 433w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><i>**EDIT: The UK Web Archive has sadly been affected by the <a href=\"https:\/\/www.theguardian.com\/books\/2023\/oct\/31\/british-library-suffering-major-technology-outage-after-cyber-attack\" target=\"new\" rel=\"noopener\">massive tech outage suffered by the British Library<\/a>. Reports that this has been caused by ghouls and gremlins in the system have yet to be confirmed&#8230;**<\/i><\/p>\n","protected":false},"excerpt":{"rendered":"<p>October 1969 was a busy month. 
Monty Python\u2019s Flying Circus aired for the first time; Steve McQueen, Trey Parker and PJ Harvey were born; and on a dark, dark night (or about 10.30pm on the 29th), a 21-year-old UCLA student &hellip; <a href=\"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/2023\/11\/01\/graveyards-and-ghosts-in-web-archiving\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":206,"featured_media":441,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[107],"tags":[40,109,103,108],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/files\/2023\/10\/planchette.png","jetpack_shortlink":"https:\/\/wp.me\/p5j0gX-75","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/posts\/439"}],"collection":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/users\/206"}],"replies":[{"embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/comments?post=439"}],"version-history":[{"count":8,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/posts\/439\/revisions"}],"predecessor-version":[{"id":452,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/posts\/439\/revisions\/452"}],"wp:featuredmedia":[{"embeddable":true,"hre
f":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/media\/441"}],"wp:attachment":[{"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/media?parent=439"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/categories?post=439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/libraryblogs.is.ed.ac.uk\/bitsandpieces\/wp-json\/wp\/v2\/tags?post=439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}