Minimum Preservation for Maximum Results? It’s a good idea if it works!

In less than a week, iPres22 will kick off in Glasgow, right on our doorstep! If you’re not already acquainted with iPres, welcome! There’s something for everyone! iPres is the world’s largest digital preservation conference, where practitioners from all sorts of backgrounds and industries gather to share challenges and strategies.

At this year’s iPres, among other things, I’m running a workshop with Caylin Smith (Head of Digital Preservation at Cambridge University Library) and Patricia Falcao (Time-based Media Conservator at Tate) on Preserving Complex Digital Objects – Revisited. We ran a similar workshop at iPres 2019 in Amsterdam, breaking into groups to undertake different aspects of the preservation process with one complex digital object (‘Breathe’ by Kate Pullinger). 

This year, each of the speakers will bring a digital object from their own collections that they consider to be complex. We will break participants into small groups to try an experiment: 

Can a Minimum Viable Preservation (MVP) approach be applied to complex digital objects? Read more to learn about what MVP digital preservation looks like and what to expect from the workshop! If you’re planning to attend iPres, come join us! 

Applying a Minimum Viable Preservation (MVP) Approach to Complex Digital Objects 

This workshop defines MVP (Minimum Viable Preservation) as an approach to digital preservation that takes the minimum steps necessary to ensure access to a digital object for one generation (an IT system generation). Other terms floating around for this concept (more or less) include minimal effort ingest, parsimonious preservation, and risk-based preservation. Rather than applying sophisticated frameworks or standards to every digital resource, is it possible to assess the precise risks to an object and implement solutions that mitigate only those risks, rather than imagined risks for which we have little or no evidence?

Let’s find out.  

The approach outlined here is derived from a few different sources and ideas about how to streamline digital preservation and simplify the systems and processes we use. 

The earliest articulation of minimum, or ‘parsimonious’, preservation comes from Tim Gollins in 2009, where he advocates using as much pre-existing infrastructure as possible rather than building a complex, bespoke system. According to Gollins, parsimonious preservation focuses on minimum access provision (rather than trying to achieve widespread, remote access) and initial retention for roughly 5-10 years (or the lifespan of a typical IT system). He argues that parsimonious preservation should not be conflated with ‘cheap’ preservation but rather understood as a set of practical approaches that emphasise good capture practice and early intervention. However, Gollins proposes that this approach can only be applied to smaller, more heterogeneous collections. His main evidence that software obsolescence may not pose a major risk, for example, is restricted to a file-level preservation approach.

Nine years later, in October 2018, Paul Wheatley of the DPC argues for simplified ingest into digital preservation systems by reducing the role of file format validation. He argues that attempting to fix ‘failed’ files can lead to unintentional loss of meaning or functionality and, more importantly, takes up a huge amount of time for very little return. While he concedes that file format validation does have a role to play, it is most effective when used to improve target formats so that they are more preservable in the first place. Rather bleakly, he argues that complex format specifications are unlikely to be detailed enough to support reverse engineering of a tool that can render those files anyway. He advocates instead for developing rendering tools that can auto-parse files to discover problems and report on them, so that risks can be evaluated by practitioners rather than files simply being assumed to be problematic. Wheatley’s proposal is speculative, though, and calls for further research into the efficacy of such auto-rendering tools. And, like Gollins’s, Wheatley’s proposal is focussed on file-level preservation.
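
Wheatley does not specify an implementation, but as a rough illustration of ‘report, don’t reject’, here is a minimal sketch assuming a folder of image files and the Pillow library; the function name and report fields are invented for this example. Instead of halting ingest when a file fails to parse, each problem is logged as a risk for a practitioner to evaluate.

```python
# A minimal sketch of "report problems, don't block ingest", assuming a folder of
# image files and the Pillow library; this is not Wheatley's proposed tooling, and
# the function name and report fields are illustrative only.
from pathlib import Path

from PIL import Image


def report_parse_risks(folder: str) -> list[dict]:
    """Try to parse each file and record failures as risks to evaluate,
    rather than rejecting the files at ingest."""
    report = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        entry = {"file": str(path), "parses": True, "note": ""}
        try:
            with Image.open(path) as img:
                img.verify()  # lightweight structural check, not full format validation
        except Exception as exc:
            # A parse failure becomes an item on the risk report, not a reason to reject it
            entry.update(parses=False, note=f"{type(exc).__name__}: {exc}")
        report.append(entry)
    return report
```

The same pattern could, in principle, swap in other parsers for other formats; the point is that the output is a report to read, not a gate to pass.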

In response to Wheatley’s takedown of the digital preservation mainstay of format validation, in November 2018 Matthew Addis asks whether the digital preservation community could establish a universal (or widely applicable) Minimum Viable Preservation approach to manage the bulk of digital data more efficiently. (He also coined the term we have adopted for this workshop, with thanks.) Like Gollins’s and Wheatley’s arguments, Addis’s approach is very file-centric, proposing shared workflows (via COW) and automated preservation actions (via PAR) that can be applied at the file level. Addis proposes that an MVP approach can handle the majority of content, freeing up capacity for curators to build more sophisticated preservation approaches for more complex content. He goes on to state:

‘That’s not to say that other content types won’t be harder or need more steps, especially  complex digital objects where an authentic user experience is a key part of preservation […] But maybe there is a place for a simple MVP approach to complex objects too – if it’s not easy to find or keep or run a full-on renderer, then simply recording “a previous rendering of” the content (video, screencams, webrecorder.io etc.) could be an MVP approach to take.’ 

In this workshop we take up Matthew Addis’s suggestion that a simpler, minimum approach might work for more complex objects as well as for more straightforward files (e.g. PDFs).
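
To make ‘minimum’ concrete at the straightforward end of that spectrum, here is a minimal sketch of an MVP-style ingest, assuming a local source folder and a simple JSON inventory; the function name and metadata fields are illustrative rather than any established standard. It records only what is needed to find and verify the files later, with no format identification, validation or migration.

```python
# A minimal sketch of an MVP-style ingest for straightforward files, assuming a local
# source folder and a JSON inventory; names and metadata fields are illustrative only.
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def mvp_ingest(source_dir: str, storage_dir: str) -> None:
    """Copy files into preservation storage and record the minimum metadata
    needed to find and verify them later: path, size, checksum, ingest date."""
    source, storage = Path(source_dir), Path(storage_dir)
    storage.mkdir(parents=True, exist_ok=True)
    inventory = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        dest = storage / path.relative_to(source)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)  # copy the content, keeping timestamps
        inventory.append({
            "original_path": str(path),
            "stored_path": str(dest),
            "size_bytes": path.stat().st_size,
            "sha256": digest,
            "ingested": datetime.now(timezone.utc).isoformat(),
        })
    (storage / "inventory.json").write_text(json.dumps(inventory, indent=2))
```

Checksums and an inventory give fixity and findability for one system generation; anything beyond that is deliberately left out.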

Given the limitations of the MVP or parsimonious preservation approaches, can a similar process be applied to more complex digital objects, like data-driven applications or virtual reality artworks? Will these types of approaches only work effectively for file-level preservation? Can a simpler, risk analysis-based approach capture the meaningful experience or responsive interactions so fundamental to many of the complex objects we describe in this workshop?  

From a practical perspective, if widespread remote access is not the goal, will it be (even) more difficult to make the case for resources and funding? The early intervention and high-quality capture required for parsimonious preservation depend on skilled staffing capacity, which is out of reach for many organisations. Similarly, if an institutional strategy aims to preserve content for only one IT system lifetime, will the resulting ongoing, iterative preservation become unsustainable as content grows?

These are just some questions to consider while undertaking this activity. In the next hour or so, we ask you to break into groups and analyse one of three collection items presented today. With the available documentation, assess if an MVP (or adapted MVP) approach might adequately preserve that complex digital object. 

 

Questions to guide the activity:  

  1. What is it? – Determine the basic components and decide what the object is
  2. Why are you preserving this object? – Define the motivation and purpose for preserving this object, given the available information about the object and its institutional context  
  3. Why was it created and who will use it in the future? – Understand, to the best of your ability, what purpose the object is meant to serve, as envisioned by its creators, and who is likely to use it in the future 
  4. What aspects of the object matter to, or support, the purpose of preservation? – Decide what components or information is required to enable future use of this object by the identified future users, including dependencies, permissions, and related objects 
  5. What can you learn about the risks to this type of object? – Use the available documentation and your own quick, online research to learn what you can about this object or even possible preservation strategies 
  6. What further information is required to understand this object? – Articulate what information, if any, is missing that might prevent you from making a preservation decision 
  7. What preservation actions are required to preserve this object? – Decide what action(s) you would take to enable access to this object for approximately 10 years (or one IT system generation) 
  8. How will you provide access to this object? – Determine what metadata and systems/hardware are required to enable access to the preserved object in future 
  9. What steps might be taken to ensure longer-term access? – Consider what additional actions or future processes would be required to ensure access to this object beyond the lifetime of a single IT system, or even in perpetuity 
  10. Does this constitute a ‘minimum viable preservation’ approach? – Decide within your group whether the steps required to preserve this object align with the MVP principles outlined above. Can you preserve this object without reliance on file format validation, for example? Could all or part of this process be applied to similar objects in different types of institutions? Will bespoke tools or infrastructure be required to preserve this object, or can well-supported, existing tools and infrastructure be used? 
  11. If not, why not? 

  

Sources 

Matthew Addis, ‘Minimum Viable Preservation’, DPC Blog (November 2018): https://www.dpconline.org/blog/minimum-viable-preservation

Tim Gollins, ‘Parsimonious preservation: preventing pointless processes! (The small simple steps that take digital preservation a long way forward)’, Online Information (2009): https://cdn.nationalarchives.gov.uk/documents/information-management/parsimonious-preservation.pdf     

Paul Wheatley, ‘A valediction for validation?’, DPC Blog (October 2018): https://www.dpconline.org/blog/a-valediction-for-validation