Curation Tasks

As part of the architecture, DSpace is used as the data store for equipment items. Items are deposited into DSpace through a SWORD-based UI, with minimal metadata pushed into a Dublin Core (DC) based DSpace item. Curation tasks have therefore been implemented to convert DSpace items between the base schema and CERIF-XML in both directions. The ingested data is stored as bitstreams in the DSpace item.

The DSpace curation system provides a simple, extensible way to perform routine operations on repository content (communities, collections or items). The main functions of the curation tasks are:

  • Conversion of equipment data items from the base schema to CERIF-XML. This is achieved by first validating the existing bitstream, extracting the bitstream that holds the base-schema equipment data, passing it through a crosswalk service that transforms the data into CERIF-XML, validating the resulting CERIF-XML, and storing it as an additional bitstream.
  • Conversion of equipment data items from CERIF-XML back to the base schema. This is a lossy process, because the initial mapping from the base schema to CERIF-XML was itself lossy, as explained earlier.
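Both conversions follow the same pattern: validate the input, run the crosswalk, validate the output. The Java sketch below (Java being DSpace's implementation language) illustrates that pattern with only the JDK's built-in XSLT 1.0 and XML Schema support. Everything in it is a hypothetical illustration: the class and method names, the tiny base schema, and the `cerif` record format, which is a simplified stand-in and not the real CERIF-XML schema. Reading the bitstream from the DSpace item and storing the result as a new bitstream are left out, since the project's task code is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical sketch of the base schema <-> CERIF-XML crosswalk steps. */
public class EquipmentCrosswalkSketch {

    // Hypothetical base schema; contact is optional so a reduced record still validates.
    static final String BASE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='equipment'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='model' type='xs:string'/>"
        + "<xs:element name='contact' type='xs:string' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    // Simplified stand-in for CERIF-XML (NOT the real CERIF schema).
    static final String CERIF_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='cerif'><xs:complexType><xs:sequence>"
        + "<xs:element name='equipment'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='model' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/><xsl:template match='/'>";
    static final String XSL_TAIL = "</xsl:template></xsl:stylesheet>";
    // Forward mapping: <contact> has no target here, so it is dropped (lossy).
    static final String BASE_TO_CERIF_XSL = XSL_HEAD + "<cerif><equipment>"
        + "<name><xsl:value-of select='equipment/name'/></name>"
        + "<model><xsl:value-of select='equipment/model'/></model>"
        + "</equipment></cerif>" + XSL_TAIL;
    // Reverse mapping can only restore what the CERIF record still contains.
    static final String CERIF_TO_BASE_XSL = XSL_HEAD + "<equipment>"
        + "<name><xsl:value-of select='cerif/equipment/name'/></name>"
        + "<model><xsl:value-of select='cerif/equipment/model'/></model>"
        + "</equipment>" + XSL_TAIL;

    /** Validates xml against an XSD; throws IllegalArgumentException if invalid. */
    static void validate(String xml, String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator()
                .validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("Validation failed: " + e.getMessage(), e);
        }
    }

    /** Applies an XSLT 1.0 stylesheet and returns the serialised result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Crosswalk failed: " + e.getMessage(), e);
        }
    }

    /** One conversion: validate input, run the crosswalk, validate the generated XML. */
    static String convert(String xml, String inXsd, String xsl, String outXsd) {
        validate(xml, inXsd);
        String result = transform(xml, xsl);
        validate(result, outXsd);
        return result; // a real task would store this as a new bitstream
    }

    static String baseToCerif(String xml) { return convert(xml, BASE_XSD, BASE_TO_CERIF_XSL, CERIF_XSD); }
    static String cerifToBase(String xml) { return convert(xml, CERIF_XSD, CERIF_TO_BASE_XSL, BASE_XSD); }

    public static void main(String[] args) {
        String base = "<equipment><name>Mass spectrometer</name><model>X-100</model>"
            + "<contact>lab@example.org</contact></equipment>";
        String cerif = baseToCerif(base);
        System.out.println(cerif);
        System.out.println(cerifToBase(cerif)); // <contact> did not survive the round trip
    }
}
```

Because `contact` has no target in this hypothetical mapping, it is dropped on the way to CERIF and cannot be restored on the way back, which mirrors the lossiness of the reverse conversion. Keeping each mapping in its own stylesheet also means the mapping can be revised without touching the validation steps.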

The key factors influencing the design and implementation of the curation tasks are:

  • An XSL transformation (XSLT) file had to be created to transform base XML to CERIF (and vice versa), after which the relevant data is used to fill in the DC metadata values of the DSpace item. The transformation therefore runs independently, through the Crosswalk service. This proved very useful when updating the CERIF mapping in response to input from the CERIF coordinator.
  • The curation tasks have to run at the point of ingest, i.e. once items are deposited into DSpace. There are two options, and which one is needed depends on whether the task should run during installation (i.e. the item does not yet have a handle and its resource policies are still those of a workflow item) or just after installation. Because our curation tasks modify and add bitstreams on the DSpace item, they could not be configured to run during installation: DSpace does not yet support modifying bitstream information while an item is in the workflow.
  • The second approach is to run the curation tasks after installation. A customised event listener based on dspace-api has been set up to listen for the install event. The listener queues every incoming item at ingest, and the curation queue is processed every couple of minutes by a cron job (currently set to run every 2 minutes). In this manner