Archivematica – you’re the one that I want!

Well, after a period of extended leave I’m back online again and have a new series of practical and theoretical posts planned for 2016 that will keep you up to speed with our digital preservation project.

To kick start this I’d like to honour my promise to give a review of Archivematica, as discussed in my previous post.

Our assessment began with the installation of the software (which at the time was version 1.3), freely downloadable from Archivematica’s website, onto a virtual machine, currently sitting under a colleague’s desk within the Library. By all accounts the installation process was relatively straightforward, though it did require acclimatising to Ubuntu (an OS that is not presently supported or used by our University). That said, once the necessary work had been completed and a secure shell file transfer client had been set up on my local PC (the only method to transfer content into Archivematica, given I didn’t have direct access to the VM), I was off…

I carried out 25 individual tests to assess how it handled different types of content of varying sizes. My aim with this was:

  1. To understand Archivematica’s workflow
  2. To understand the functionality within the software
  3. To benchmark the software against our established requirements
  4. To identify options for processing content, and optimum settings to maximise the software, to facilitate development of internal workflows across the lifecycle of content

Result: On the whole Archivematica performed very well. As the software is a suite of connected microservices it has a very sequential approach to processing ingested collections, and its layout is designed to mirror, more or less, the stages within the OAIS model (Ingest, Storage, Preservation planning). This is quite comforting to a novice or intermediate digital archivist as it grounds the function of the system against the standard which we all wish to attain. However, when it comes to processing content it can seem a little overwhelming. Archivematica has two processing stages (Transfer and Ingest) before content is packaged in a BagIt bag, wrapped in METS XML and sent off to a repository/storage area of your choice. Content is processed through microservices within each of these stages. The software lists each one in a manner somewhat like the ‘Grandstand’ teleprinter (for those that remember that show!) and then identifies whether that microservice was successfully carried out or not. At first it can be a little confusing as to what each of the microservices is actually doing, as the wording can sometimes be a little vague. That said, once you’ve practised and experimented a few times, and consulted the helpful online guidance on Artefactual’s wiki, it soon all becomes crystal clear.

In terms of how it performed against our requirements I can say that out of 48 essential criteria it fully satisfied 25 and partially (as yet) satisfied a further 13. The remaining 10 criteria were largely related to access, integrity checking, full text searching, AIP management and reporting/logs – more on that later.
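
For anyone wanting to repeat this kind of tally against their own requirements list, the arithmetic behind those figures can be sketched in a few lines of Python. The counts (48, 25 and 13) come from the assessment above; the function name and the shape of the result are purely illustrative assumptions and not part of Archivematica or of our actual requirements spreadsheet.

```python
def coverage(fully: int, partially: int, total: int) -> dict:
    """Summarise requirement coverage as counts and percentages.

    Hypothetical helper for illustration only.
    """
    unmet = total - fully - partially
    return {
        "fully_pct": round(100 * fully / total, 1),
        "partially_pct": round(100 * partially / total, 1),
        "unmet": unmet,
        "unmet_pct": round(100 * unmet / total, 1),
    }


# Figures from the assessment: 48 essential criteria, 25 fully met, 13 partially met.
result = coverage(fully=25, partially=13, total=48)
print(result)
```

Run against the numbers above, this gives 10 unmet criteria, matching the remaining 10 mentioned in the post.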

In terms of capturing sufficient PREMIS metadata it captured 76% of what I considered essential object metadata, 100% of essential event metadata and 67% of essential agent metadata (I’ve omitted rights metadata as I anticipate this will be handled by our access system).

There are a few limitations to the software in its present version. We currently use ArchivesSpace as the user discovery tool for our archive collections and would want our digital collections to be accessible through the same portal. At present Archivematica doesn’t support this; however, I am aware of two upcoming releases that should go some way to plugging that gap. Another shortfall is integrity checking. This can be a bit of a bugbear: anyone who has tried to verify checksums across a collection will know that, depending on its size and volume, it can take anything from 1 minute to 1 day to process (and sometimes longer!). At present integrity checking (which I think is pretty fundamental if we are to rubber stamp a digital object as being trustworthy) is handled for Archivematica processed content via two plugins (Fixity and Binder). I can’t yet vouch for how they work or how simple to use or effective they are, as I’ve yet to have them installed and tested, but I know others can (I think that warrants a future blog post). For us another requirement is to be able to conduct full text searching within text based digital objects. This is largely connected to the access issue and is one that ArchivesSpace doesn’t satisfy either. We are exploring how we can incorporate DSpace into the workflow for this. Preserving digital objects is also more than just ingesting a digital collection: there needs to be some ability to manage those collections on an ongoing basis to ensure they remain accessible and readable. At present Archivematica doesn’t provide the ability to reingest AIPs into the system to enable further migration and amendment of the preservation metadata. I believe some of this area has been funded by the Zuse Institute, relating to adding DC metadata to an already packaged AIP, but the full AIP reingest is awaiting funding. One final issue is the operating system the software is designed to sit on, Ubuntu. As mentioned above, our IT infrastructure team supports systems running on CentOS or Red Hat. We, as far as I’m aware, have no systems running on Ubuntu and it makes little practical sense to factor this into our daily routines. Therefore for us to be able to use Archivematica it would need to run on CentOS to enable it to be fully supported and updated by our IT team. There is a relatively inexpensive fix for this, which we’ve discussed with Artefactual, so we hope to have a solution in place soon.
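
On the integrity checking point, it may help to see what a basic checksum check involves. The sketch below is a minimal illustration in Python, assuming a BagIt-style layout in which a `manifest-sha256.txt` file lists one checksum and one relative file path per line. It is not how the Fixity or Binder tools work internally (I haven’t tested them), and the function names are my own. Reading every byte of every file is also why these checks take so long on large collections.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, manifest_name="manifest-sha256.txt"):
    """Return the relative paths whose checksum is wrong or whose file is missing.

    Assumes each non-blank manifest line is '<checksum> <relative path>'.
    """
    bag_dir = Path(bag_dir)
    failures = []
    manifest = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Calling `verify_manifest("/path/to/bag")` on a directory laid out this way returns an empty list when every listed file still matches its recorded checksum; the path here is just a placeholder.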

Despite the few shortfalls there are a number of opportunities that this free and open source system can offer. It seems that there is quite a buzz about it within the HE sector. A number of US institutions (Bentley Historical Library and Rockefeller to name two) are using the software and working, either together or with Artefactual, to enhance its capabilities. New releases of the system are quite frequent and Artefactual’s development roadmap promises some exciting new functionality in the months to come. Here within the UK we now have a UK Archivematica User Group (set up by Jen Mitcham at the University of York) which has a growing list of HE and heritage sector members. Archivematica now features quite prominently in JISC’s Research Data Spring, with the Universities of York and Hull collaborating to assess the system’s capability and opportunity to manage and preserve research data (Filling the Preservation Gap). And there are integrations, or the potential for integrations, with storage through Arkivum and connectivity to other platforms such as DSpace and Fedora.

There may be a bit of work involved in using and developing this, which I’m sure is true of any system, but given the rising tide of interest and rate of uptake I think it would be wise to get on the ‘bandwagon’, so to speak, and take part in the ride! Here’s to Archivematica…and excuse the clichés!


4 Responses to “Archivematica – you’re the one that I want!”

  1. I have a good read. I am highly recommending your blog to my connections.

    Posted by Gary | June 13, 2016, 3:36 pm
  2. Hi Kristy,

    I’m from the Zuse Institute Berlin and wanted to point you to the full reingest feature we’ve funded:

    Adding and deleting digital Objects from AIPs will in our case always require producing a new AIP. But I’m interested in other use cases where changing content in an AIP might be useful.

    Posted by Marco Klindt | January 20, 2016, 9:29 am
    • Thanks for getting in touch Marco. That’s great to hear. I see that this work is coming out in the 1.6 release. Looks like this is going to be an exciting release with all the work Artefactual have done with the Bentley Historical Library on Appraisal/Arrangement and ArchivesSpace integration. I’d be keen to offer any experiences (most likely in a test environment) of the AIP reingest functionality.

      Posted by Kirsty | January 20, 2016, 9:38 am
  3. Hi Kristy,

    Thanks for this evaluation of Archivematica! We’re always excited to see thorough testing and evaluation of the tool. We’d love to see all your data if you are able to share- maybe there are improvements that we can make to bump some of those requirement coverage percentages higher.

    Posted by Sarah Romkey, Archivematica Program Manager | January 14, 2016, 5:07 pm
