Authority Control

Changes in SkylightUI

SkylightUI has been changed to make authority control more generic rather than
specific to dctitle.

The field used for authority control can be specified using <<search URL>>.xml?field=<<label>>,
which returns a list of records with (label match, record url) information.

For example, a search on http://redicdev.edina.ac.uk:8145/skylightui/search/microscope.xml?field=dctitle
would give the following results to feed into DSpace (or any external entity) for authority control:

<records>
<numhits>1</numhits>
<record>
<dctitle> Nanoscope I </dctitle>
<url> http://skylightURL/record/<<recordid>> </url>
</record>
</records>
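
As an illustration of how an external client might consume this response, the following Python sketch builds the search URL and extracts the (label, record url) pairs. It is a minimal sketch and not the DSpace implementation: the function names are hypothetical, and the record id 12 in the sample stands in for the <<recordid>> placeholder.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode


def build_search_url(base_url, query, field):
    """Build <<search URL>>.xml?field=<<label>> for a query term."""
    path = f"{base_url.rstrip('/')}/search/{quote(query)}.xml"
    return f"{path}?{urlencode({'field': field})}"


def parse_records(xml_text, field):
    """Return (label, record url) pairs from a <records> response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.findall("record"):
        # Values in the response are padded with spaces, so strip them.
        label = (record.findtext(field) or "").strip()
        url = (record.findtext("url") or "").strip()
        pairs.append((label, url))
    return pairs


# Sample response; the record id 12 is a placeholder value.
SAMPLE = """<records>
<numhits>1</numhits>
<record>
<dctitle> Nanoscope I </dctitle>
<url> http://skylightURL/record/12 </url>
</record>
</records>"""

if __name__ == "__main__":
    print(build_search_url("http://redicdev.edina.ac.uk:8145/skylightui",
                           "microscope", "dctitle"))
    print(parse_records(SAMPLE, "dctitle"))
```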

Changes in DSpace

Additional classes have been added to dspace/dspace-api/src/main/java/org/dspace/content/authority to
support the SkylightUI querying protocol and to provide the authority control feature in the DSpace
Submission/Admin UI.

The org/dspace/content/authority/SkylightAPIProtocol class supports querying SkylightUI for search
results. The results are parsed and then fed into the submission/admin UI for lookup/suggest.

The org/dspace/content/authority/SkylightEquipmentTitle class defines the label on which the authority control
feature is supported and needs to be added to the choice authority plugin in the DSpace configuration.
(In this case, our equipment names are mapped onto dc.title, hence we are more focussed
on dctitle, but this can be configured for any relevant field.)

Finally, the lookup/suggest feature can be configured in the dspace.cfg file as follows:

plugin.named.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.content.authority.SkylightEquipmentTitle = ETitle

choices.plugin.dc.title = ETitle
choices.presentation.dc.title = suggest
authority.controlled.dc.title = true

Skylight UI

As part of the architecture, a web UI was to be provided for user access. Skylight, a PHP-based MVC application developed by the University of Auckland, was used to expose the equipment data. The application was modified to support equipment-specific parameters. The extended customisations of SkylightUI include:

  • Use of the .cerif extension to display the cerif-xml file of a particular ingested item. Support was added for the new ‘.cerif’ extension, which points back to the cerif-xml bitstream of the record. For example, the record http://redicdev.edina.ac.uk:8145/skylightui/record/12 can also be delivered as http://redicdev.edina.ac.uk:8145/skylightui/record/12.cerif
  • Authority control support was added, as explained in the Authority Control section.
  • An export API has been added which supports extracting the selected items in XML or CSV format. The XML export consolidates the cerif-xml of all the selected records into a single XML file, whereas the CSV export takes the needed information from each selected record (title, description, type, cerif-xml url etc.)
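
The CSV side of the export can be pictured with the following Python sketch. It is illustrative only and is not the Skylight (PHP) implementation: the column names and the `export_csv` function are assumptions based on the fields listed above.

```python
import csv
import io

# Column names are assumptions based on the fields listed in the text.
COLUMNS = ["title", "description", "type", "cerif_xml_url"]


def export_csv(records):
    """Serialise the selected records (dicts) to CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the known columns; missing values become empty cells.
        writer.writerow({column: record.get(column, "") for column in COLUMNS})
    return buffer.getvalue()


if __name__ == "__main__":
    selected = [{
        "title": "Nanoscope I",
        "description": "Microscope",  # hypothetical value
        "type": "equipment",          # hypothetical value
        "cerif_xml_url": "http://redicdev.edina.ac.uk:8145/skylightui/record/12.cerif",
    }]
    print(export_csv(selected))
```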

The respective URLs are as given below:

Curation Tasks

As part of the architecture, DSpace has been used as the data store for equipment items. The items are deposited into DSpace via a SWORD-based UI, with minimal metadata pushed to a DC-based DSpace item. Curation tasks have therefore been implemented to perform reversible conversions between DSpace items in the base schema and cerif-xml. The ingested data is stored as bitstreams in the DSpace item.

The curation system in DSpace provides a simple, extensible way to perform routine operations on the repository contents (collections, communities or items). The main functions of the curation tasks are:

  • Conversion of equipment data items in base schema to cerif-xml. This is achieved by first validating the existing bitstream, extracting the bitstream containing the base schema equipment data, passing it through a crosswalk service to transform the data into cerif-xml, validating the cerif-xml, and storing the generated cerif-xml as another bitstream.
  • Conversion of equipment data items in cerif-xml to base schema. This is a lossy process since the initial mapping from base schema to cerif-xml was itself lossy as explained earlier.
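
The steps of the base schema to cerif-xml conversion can be sketched as follows. This is a minimal Python illustration and not the DSpace curation task itself: bitstreams are modelled as a plain dict, and the `validate_base`, `crosswalk` and `validate_cerif` callables, as well as the "base" and "cerif" keys, are hypothetical stand-ins.

```python
def convert_base_to_cerif(bitstreams, validate_base, crosswalk, validate_cerif):
    """Return a copy of the item's bitstreams with a generated cerif-xml added.

    bitstreams maps a bitstream name to its content; the "base" and
    "cerif" names are assumptions made for this sketch.
    """
    # Extract and validate the existing base schema bitstream.
    base_xml = bitstreams["base"]
    if not validate_base(base_xml):
        raise ValueError("base schema bitstream failed validation")

    # Transform via the crosswalk service, then validate the result.
    cerif_xml = crosswalk(base_xml)
    if not validate_cerif(cerif_xml):
        raise ValueError("generated cerif-xml failed validation")

    # Store the cerif-xml as another bitstream, leaving the original intact.
    updated = dict(bitstreams)
    updated["cerif"] = cerif_xml
    return updated


if __name__ == "__main__":
    item = {"base": "<equipment/>"}
    result = convert_base_to_cerif(
        item,
        validate_base=lambda xml: xml.startswith("<"),
        crosswalk=lambda xml: "<CERIF>" + xml + "</CERIF>",
        validate_cerif=lambda xml: xml.startswith("<CERIF>"),
    )
    print(sorted(result))
```

Keeping the original bitstream alongside the generated one mirrors the text above, where the cerif-xml is stored as an additional bitstream rather than replacing the ingested data.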

The key factors influencing the design and implementation of the curation tasks are:

  • It was essential to create an XSL (transformation) file to transform base XML -> CERIF (and vice versa) and then use the appropriate data to fill in the DC metadata values for the DSpace item. The transformation therefore happens independently, using the crosswalk service. This was very useful when updating the CERIF mapping in line with the input provided by the CERIF coordinator.
  • The curation tasks have to run at the point of ingest (once the items are deposited into DSpace). There are two options, and which one is required depends on whether we want the task to run during installation (i.e. the item doesn’t have a handle yet and the resource policies are still those of a workflow item) or just after installation. Since our curation tasks have to modify or add bitstreams on the DSpace item, they could not be configured to run during installation (DSpace does not yet support modification of bitstream information while the item is in the workflow).
  • The second approach was to run the curation tasks after installation. A customised dspace-api based event listener has been set up which listens for the install event. The event listener queues all the incoming items at ingest, and the curation queue is executed by a cron job, currently set to run every 2 minutes. In this manner
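
The queue-then-run pattern described in the last point can be sketched in Python as follows. This is a simplified in-memory illustration, not the dspace-api event listener: the `CurationQueue` class and its method names are hypothetical, and the periodic call to `run_pending` stands in for the cron job.

```python
from collections import deque


class CurationQueue:
    """In-memory stand-in for the queue of item ids awaiting curation."""

    def __init__(self):
        self._pending = deque()

    def on_install(self, item_id):
        """Called by the (hypothetical) listener when an install event arrives."""
        if item_id not in self._pending:
            self._pending.append(item_id)

    def run_pending(self, task):
        """Drain the queue, applying the curation task to each queued item.

        Intended to be invoked periodically by a scheduler such as cron.
        """
        processed = []
        while self._pending:
            item_id = self._pending.popleft()
            task(item_id)
            processed.append(item_id)
        return processed


if __name__ == "__main__":
    queue = CurationQueue()
    queue.on_install("item-1")  # hypothetical item identifiers
    queue.on_install("item-2")
    print(queue.run_pending(lambda item_id: None))
```

Decoupling the listener from the task in this way means the item is fully installed before any bitstreams are modified, which is the constraint described above.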

Ingester Application

As part of the architecture, the manual deposit UI has been designed using the Groovy/Grails MVC web framework, mainly for its ease of application development and its extensibility. A SWORD-based deposit interface has been implemented for the ingestion of equipment metadata and CERIF-XML data. At the core of the ingestion application, an EquipmentItem domain model maps onto the base schema. The views handle the input/display of the equipment information. The controllers take the equipment metadata, map it onto the domain model and deposit the item into DSpace using the SWORD-based mechanism. The controllers have been designed so that incoming metadata in the Southampton data format, the University of Bath data format or the EPSRC data format (xml or tsv) is mapped onto the base schema, whereas CERIF-XML equipment data is deposited directly into DSpace.

The corresponding URLs showing the functionality are as given below:

CERIF Mapping

Base schema

An initial base schema was created (primarily based on Southampton equipment data) for mapping onto the CERIF equipment schema: EquipmentSchema.xsd

Mapping of EPSRC equipment data and University of Bath equipment records to the base schema

The EPSRC equipment schema was mapped onto the base schema as shown below.
Example record: http://equipment.epsrc.ac.uk/EquipmentDetails.aspx?EquipmentID=68029

Technique => name1
Organisation => orgUnitID
Department => faculty
Research Area => taxonomy
Equipment Details => description
Research Group Website => UrlPath
Full Name => contact1Name
Phone Number => contact1Phone
Email => contact1Email

The University of Bath equipment schema was mapped onto the base schema as shown below.
Example record: http://www.bath.ac.uk/person/501281

institution => orgUnitID
uid => uniqueID
equipmentTitle => name1
equipmentDescription => description
recordType => taxonomy
responsiblePerson => contact1Name
owner => faculty
phone => contact1Phone
website => contact1URL
email => contact1Email
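
The two field mappings above can be expressed as lookup tables and applied to a record with a single renaming function. The Python below is an illustrative sketch rather than the ingester's Grails code: the dictionary and function names are hypothetical, the sample values are invented, and only the field names are taken from the mappings above.

```python
# EPSRC field name -> base schema field name
EPSRC_TO_BASE = {
    "Technique": "name1",
    "Organisation": "orgUnitID",
    "Department": "faculty",
    "Research Area": "taxonomy",
    "Equipment Details": "description",
    "Research Group Website": "UrlPath",
    "Full Name": "contact1Name",
    "Phone Number": "contact1Phone",
    "Email": "contact1Email",
}

# University of Bath field name -> base schema field name
BATH_TO_BASE = {
    "institution": "orgUnitID",
    "uid": "uniqueID",
    "equipmentTitle": "name1",
    "equipmentDescription": "description",
    "recordType": "taxonomy",
    "responsiblePerson": "contact1Name",
    "owner": "faculty",
    "phone": "contact1Phone",
    "website": "contact1URL",
    "email": "contact1Email",
}


def to_base_schema(record, mapping):
    """Rename the source fields of one record to base schema field names.

    Source fields with no entry in the mapping are dropped.
    """
    return {mapping[key]: value for key, value in record.items() if key in mapping}


if __name__ == "__main__":
    bath_record = {"equipmentTitle": "Nanoscope I", "uid": "42"}  # invented values
    print(to_base_schema(bath_record, BATH_TO_BASE))
```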

Mapping of base schema to CERIF

An initial mapping of the equipment/facilities data into CERIF was produced, drawing mainly on the Southampton equipment data, the EPSRC equipment data and the University of Bath data. The full file is available here as a Google spreadsheet.

Individual comments/suggestions etc have been added for each element of the base schema.

 

Presentation made to EuroCRIS CERIF Task Group

The REDIC project made a brief presentation to the EuroCRIS CERIF task group meeting here in Edinburgh on the 17th July. The presentation was made to indicate development is underway in this space, and that we would like to feed observations made during the project work into the future development of the CERIF equipment/facility entities.