All the REDIC code produced during the project is now available on GitHub at https://github.com/uoe-edina/REDIC
Screencast of REDIC implementation
As part of the completion work on the REDIC project, we’ve put together the following screencast to explain how the various elements of the system described in the previous posts work together.
Authority Control
Changes in SkylightUI
SkylightUI was changed to make authority control generic rather than specific to dctitle.
The field used for authority control can be specified using <<search URL>>.xml?field=<<label>>, which returns a list of records with (label match, record url) information.
For example, a search on http://redicdev.edina.ac.uk:8145/skylightui/search/microscope.xml?field=dctitle
would give the following results to feed into DSpace (or any external entity) for authority control:
<records>
<numhits>1</numhits>
<record>
<dctitle> Nanoscope I </dctitle>
<url> http://skylightURL/record/<<recordid>> </url>
</record>
</records>
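A client consuming this response could be sketched as follows. This is a minimal Python illustration under stated assumptions, not the REDIC or DSpace implementation: the helper names `build_authority_url` and `parse_authority_response` are hypothetical, and the sample response uses a made-up record id (12) in place of <<recordid>>.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode


def build_authority_url(base_url, term, field):
    """Build <<search URL>>.xml?field=<<label>> for a search term (hypothetical helper)."""
    query = urlencode({"field": field})
    return f"{base_url.rstrip('/')}/search/{quote(term)}.xml?{query}"


def parse_authority_response(xml_text, field):
    """Return a list of (label match, record url) pairs from a <records> document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.findall("record"):
        # The sample response pads values with spaces, so strip them.
        label = (record.findtext(field) or "").strip()
        url = (record.findtext("url") or "").strip()
        pairs.append((label, url))
    return pairs


# Sample shaped like the response above; record id 12 is an assumption.
SAMPLE = """<records>
<numhits>1</numhits>
<record>
<dctitle> Nanoscope I </dctitle>
<url> http://skylightURL/record/12 </url>
</record>
</records>"""

url = build_authority_url(
    "http://redicdev.edina.ac.uk:8145/skylightui", "microscope", "dctitle"
)
matches = parse_authority_response(SAMPLE, "dctitle")
```

Passing the field name as a parameter mirrors the generic ?field=<<label>> design: the same parser works for any label, not only dctitle.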
Changes in DSpace
Additional classes have been added to dspace/dspace-api/src/main/java/org/dspace/content/authority to
support the SkylightUI querying protocol and to provide the authority control feature in the DSpace
Submission/Admin UI.
The org/dspace/content/authority/SkylightAPIProtocol class supports querying SkylightUI for search
results. The results are parsed and then fed into the submission/admin UI for lookup/suggest.
The org/dspace/content/authority/SkylightEquipmentTitle class defines the label on which the authority control
feature is supported, and needs to be added to the choice authority plugin in the DSpace configuration.
(In this case, our equipment names are mapped onto dc.title, hence the focus
on dctitle, but this can be configured for any relevant field.)
Finally, the lookup/suggest feature can be configured in the dspace.cfg file as follows:
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.content.authority.SkylightEquipmentTitle = ETitle
choices.plugin.dc.title = ETitle
choices.presentation.dc.title = suggest
authority.controlled.dc.title = true
Skylight UI
As part of the architecture, a web UI was to be provided for user access. Skylight, a PHP-based MVC application developed by the University of Auckland, was used to expose the equipment data. The application was modified to support equipment-specific parameters. The customisations of SkylightUI include:
- A .cerif extension that displays the cerif-xml file of a particular ingested item. The new ‘.cerif’ extension points back to the cerif-xml bitstream of the record. For example, the record http://redicdev.edina.ac.uk:8145/skylightui/record/12 can also be delivered as http://redicdev.edina.ac.uk:8145/skylightui/record/12.cerif
- Authority control support, which is explained in the Authority Control section.
- An export API that extracts selected items in XML or CSV format. The XML export consolidates the cerif-xml of all selected records into a single XML file, whereas the CSV export takes the needed information from each selected record (title, description, type, cerif-xml URL, etc.).
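The .cerif URL convention and the CSV export can be illustrated with a short Python sketch. It is not the Skylight (PHP) code: the function names, the CSV column names and the sample field values are assumptions made for illustration; only the record URL pattern comes from the example above.

```python
import csv
import io


def cerif_url(record_url):
    """Derive the .cerif form of a record URL (.../record/12 -> .../record/12.cerif)."""
    return record_url.rstrip("/") + ".cerif"


def export_csv(records):
    """Write one CSV row per selected record; the column names are assumptions."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "description", "type", "cerif_xml_url"])
    for rec in records:
        writer.writerow(
            [rec["title"], rec["description"], rec["type"], cerif_url(rec["url"])]
        )
    return buffer.getvalue()


# Made-up record values; only the URL pattern is taken from the text.
selected = [
    {
        "title": "Example microscope",
        "description": "Example description",
        "type": "equipment",
        "url": "http://redicdev.edina.ac.uk:8145/skylightui/record/12",
    }
]
csv_text = export_csv(selected)
```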
The respective URLs are given below:
- Homepage => http://redicdev.edina.ac.uk:8145/skylightui/
- A sample search => http://redicdev.edina.ac.uk:8145/skylightui/search/microscope
- A sample record => http://redicdev.edina.ac.uk:8145/skylightui/record/12
Curation Tasks
As part of the architecture, DSpace has been used as the data store for equipment items. The items are deposited into DSpace via a SWORD-based UI, with minimal metadata pushed to a DC-based DSpace item. Curation tasks have therefore been implemented for reversible conversion between DSpace items in the base schema and cerif-xml. The ingested data is stored as bitstreams in the DSpace item.
The curation system in DSpace provides a simple, extensible way to perform routine operations on the repository contents (collections, communities or items). The main functions of the curation tasks are:
- Conversion of equipment data items in base schema to cerif-xml. This is achieved by first validating the existing bitstream, extracting the bitstream containing the base schema equipment data, passing it through a crosswalk service to transform the data into cerif-xml, validating the cerif-xml, and storing the generated cerif-xml as another bitstream.
- Conversion of equipment data items in cerif-xml to base schema. This is a lossy process since the initial mapping from base schema to cerif-xml was itself lossy as explained earlier.
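The base schema to cerif-xml steps listed above can be sketched in Python as follows. This is a simplified illustration, not the DSpace curation task: items are modelled as plain dictionaries, the bitstream names `base.xml` and `cerif.xml` are hypothetical, validation is reduced to a well-formedness check, and `toy_crosswalk` is a placeholder for the XSL crosswalk service whose output element names are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Stand-in for validation: only checks that the XML parses."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def base_to_cerif_task(item, crosswalk):
    """Validate the base bitstream, crosswalk it, validate the result, store it."""
    base_xml = item["bitstreams"].get("base.xml")  # hypothetical bitstream name
    if base_xml is None or not is_well_formed(base_xml):
        return "FAIL: missing or invalid base schema bitstream"
    cerif_xml = crosswalk(base_xml)
    if not is_well_formed(cerif_xml):
        return "FAIL: crosswalk produced invalid cerif-xml"
    item["bitstreams"]["cerif.xml"] = cerif_xml  # stored as another bitstream
    return "SUCCESS"


def toy_crosswalk(base_xml):
    """Placeholder for the crosswalk service; NOT the real CERIF mapping."""
    name = ET.fromstring(base_xml).findtext("name1", default="")
    root = ET.Element("CERIF")
    ET.SubElement(ET.SubElement(root, "cfEquip"), "cfName").text = name
    return ET.tostring(root, encoding="unicode")


item = {"bitstreams": {"base.xml": "<equipment><name1>Nanoscope I</name1></equipment>"}}
status = base_to_cerif_task(item, toy_crosswalk)
```

Passing the crosswalk in as a function reflects the design point that the transformation is independent of the task, so the mapping can change without touching the task logic.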
The key factors influencing the design and implementation of the curation tasks are:
- It was essential to create an XSL (transformation) file to transform base XML -> CERIF (and vice versa) and then use the appropriate data to fill in the DC metadata values for the DSpace item. The transformation therefore happens independently, using the crosswalk service. This was very useful when updating the CERIF mapping in line with input from the CERIF coordinator.
- The curation tasks have to run at the point of ingest (once the items are deposited into DSpace). There are two options, and which one is required depends on whether we want the task to run during installation (i.e. the item doesn’t have a handle yet and the resource policies are still those of a workflow item) or just after installation. Since our curation tasks have to modify or add bitstreams on the DSpace item, they could not be configured to run during installation (DSpace does not yet support modification of bitstream information while the item is in the workflow).
- The second approach, running the curation tasks after install, was therefore used. A customised dspace-api based event listener has been set up which listens for the install event. The event listener queues up all incoming items at ingest, and the curation queue is executed by a cron job, currently set to run every 2 minutes. In this manner
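The queue-then-run pattern described for the install event can be sketched as below. This is a generic Python illustration, not the dspace-api event listener: the class name `CurationQueue`, its method names and the item ids are hypothetical, and the periodic trigger (cron in the project) is represented by calling `run_pending` directly.

```python
from collections import deque


class CurationQueue:
    """Collects item ids on install events and runs a task over them when drained."""

    def __init__(self, task):
        self._pending = deque()
        self._task = task

    def on_install(self, item_id):
        """Called by the event listener for each installed item."""
        self._pending.append(item_id)

    def run_pending(self):
        """Called periodically (e.g. by cron every 2 minutes); drains the queue."""
        results = {}
        while self._pending:
            item_id = self._pending.popleft()
            results[item_id] = self._task(item_id)
        return results


queue = CurationQueue(task=lambda item_id: f"curated {item_id}")
queue.on_install("item-1")
queue.on_install("item-2")
first_run = queue.run_pending()   # processes both queued items
second_run = queue.run_pending()  # nothing left to process
```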
Ingester Application
As part of the architecture, the manual deposit UI has been built with the Groovy/Grails MVC web framework, mainly for its ease of development and extensibility. A SWORD-based deposit interface has been implemented for ingesting equipment metadata and CERIF-XML data. At the core of the ingestion application, an EquipmentItem domain model maps onto the base schema. The views handle the input and display of the equipment information. The controllers take the equipment metadata, map it onto the domain model and deposit the item into DSpace using the SWORD-based mechanism. The controllers are designed so that incoming metadata in Southampton data format, University of Bath data format or EPSRC data format (xml or tsv) is mapped onto the base schema, whereas CERIF-XML equipment data is deposited directly into DSpace.
The corresponding URLs showing the functionality are given below:
- Uploading/deleting equipment data => http://redicdev.edina.ac.uk:11000/equipmentManager/index
- List of items in the Ingester (synced or not synced to Dspace) => http://redicdev.edina.ac.uk:11000/equipmentItem/list
- Adding a new equipment item => http://redicdev.edina.ac.uk:11000/equipmentItem/create
- Corresponding items in Dspace => http://redicdev.edina.ac.uk:8081/xmlui/
CERIF Mapping
Base schema
An initial base schema (primarily based on Southampton equipment data) was created for mapping onto the CERIF equipment schema: EquipmentSchema.xsd
Mapping of EPSRC equipment data and University of Bath equipment records to the base schema
The EPSRC equipment schema was mapped onto the base schema as shown below:
Example record : http://equipment.epsrc.ac.uk/EquipmentDetails.aspx?EquipmentID=68029
- Technique => name1
- Organisation => orgUnitID
- Department => faculty
- Research Area => taxonomy
- Equipment Details => description
- Research Group Website => UrlPath
- Full Name => contact1Name
- Phone Number => contact1Phone
- Email => contact1Email
The University of Bath equipment schema was mapped onto the base schema as shown below:
Example record : http://www.bath.ac.uk/person/501281
- institution => orgUnitID
- uid => uniqueID
- equipmentTitle => name1
- equipmentDescription => description
- recordType => taxonomy
- responsiblePerson => contact1Name
- owner => faculty
- phone => contact1Phone
- website => contact1URL
- email => contact1Email
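The two field mappings above can be expressed as lookup tables and applied with one small function. The Python sketch below is illustrative only (the Ingester itself is a Groovy/Grails application): the function name `to_base_schema`, the sample record values and the choice to drop unmapped fields are assumptions; the field names come from the mappings above.

```python
# Source field -> base schema field, as listed above.
EPSRC_TO_BASE = {
    "Technique": "name1",
    "Organisation": "orgUnitID",
    "Department": "faculty",
    "Research Area": "taxonomy",
    "Equipment Details": "description",
    "Research Group Website": "UrlPath",
    "Full Name": "contact1Name",
    "Phone Number": "contact1Phone",
    "Email": "contact1Email",
}

BATH_TO_BASE = {
    "institution": "orgUnitID",
    "uid": "uniqueID",
    "equipmentTitle": "name1",
    "equipmentDescription": "description",
    "recordType": "taxonomy",
    "responsiblePerson": "contact1Name",
    "owner": "faculty",
    "phone": "contact1Phone",
    "website": "contact1URL",
    "email": "contact1Email",
}


def to_base_schema(record, mapping):
    """Rename source fields to base schema fields; unmapped fields are dropped (assumption)."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Made-up record for illustration; "internalNote" is not part of the mapping.
bath_record = {
    "uid": "B-001",
    "equipmentTitle": "Example microscope",
    "internalNote": "ignored",
}
base_record = to_base_schema(bath_record, BATH_TO_BASE)
```

Keeping each source format as a separate table means supporting another institution's format only requires adding one more dictionary.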
Mapping of base schema to CERIF
An initial mapping of the Equipment/Facilities data into CERIF has been made, based mainly on the Southampton, EPSRC and University of Bath equipment data. The full file is available here as a Google spreadsheet.
Individual comments/suggestions etc have been added for each element of the base schema.
Presentation made to EuroCRIS CERIF Task Group
The REDIC project made a brief presentation to the EuroCRIS CERIF task group meeting here in Edinburgh on 17th July. The presentation was made to indicate that development is under way in this space, and that we would like to feed observations made during the project work into the future development of the CERIF equipment/facility entities.