Jisc Data Vault update

Posted on behalf of Claire Knowles

Research data are being generated at an ever-increasing rate. This brings challenges in how to store, analyse, and care for the data. Part of this problem is the long-term stewardship of researchers’ private data and associated files that need a safe and secure home for the medium to long term.

The Data Vault project, funded by the Jisc #DataSpring programme, seeks to define and develop a Data Vault software platform that will allow data creators to describe and store their data safely in one of the growing number of options for archival storage. This may include cloud solutions, shared storage systems, or local infrastructure.

Future users of the Data Vault are invited to Edinburgh on 5th November, to help shape the development work through discussions on: use cases, example data, retention policies, and metadata with the project team.

Book your place at: https://www.eventbrite.co.uk/e/data-vault-community-event-edinburgh-tickets-18900011443

The aim of the second phase of the project is to deliver a first complete version of the platform by the end of November, including:

  • Authentication and authorisation
  • Integration with more storage options
  • Management / monitoring interface
  • Example interface to CRIS (PURE)
  • Development of retention and review policy
  • Scalability testing

Working towards these goals, the project team have had monthly face-to-face meetings, with regular Skype calls in between. The development work is progressing steadily, as you can see via the GitHub repository: https://github.com/DataVault, where there have now been over 300 commits. Progress is also tracked on the open Project Plan, where anyone can add comments.

So remember, remember the 5th November and book your ticket.

Claire Knowles, Library & University Collections, on behalf of the JISC Data Vault Project Team

Highlights from the RDM Programme Progress Report: June – July 2015

May 2015 saw the end of the current Roadmap period and we are now into a period of consolidating work that has already been done and organising future work for the next phase of the RDM programme. The draft Roadmap 2.0 has been submitted to the steering group for approval and should be published soon.

The RDM services brochure is now almost complete and should be printed and distributed to Schools soon.

A meeting was held between the RDM team and the DCC to decide on the best approach for offering school-level customisation of DMPonline. It was decided that customisation would be offered to all schools and undertaken as and when requests were made.

The infrastructure upgrades have been procured and physically installed for the IGMM storage capacity expansion and the Research Computing Infrastructure DataStore integration servers.

Upgrades for backup/DR infrastructure have been procured and physically installed.

A two-part workshop sponsored by the EU FOSTER programme, “Good practice in data management & data sharing with social research” and “Overcoming obstacles to sharing data about human subjects”, was delivered at the Scottish Graduate School of Political and Social Science Summer School.

A new series of awareness raising presentations and training courses for researchers, committees, and support staff in CHSS, CSE, and CMVM is currently being organised for 2015/16.

The RDM website has been successfully migrated to the new content management system; this included reviewing and revising all content, links, and images to fit the new structure and layout. All schools in CHSS and CSE now have links on their own webpages to the RDM webpages, and CHSS will be placing a link from their new college pages when they go live.

An EAHIL workshop on ‘Managing research data’ was held on the 10th June and the report of the

Met with RDM staff from University of Lisbon who were visiting the University (via James Toon).

Met with Jennifer Warburton, University of Melbourne, as part of Library visit.

Met with French delegation (INIST-CNRS) as part of Library visit.

Kerry Miller
RDM Service Coordinator


Highlights from the RDM Programme Progress Report: Mar. & Apr. 2015

Nine EPSRC grant holders and researchers were interviewed as follow-up to the EPSRC Expectations Awareness Survey conducted in February.

A new leaflet detailing all systems in the RDM portfolio is being created, which will explain the Data Asset Register, and the use of PURE, in the context of all the systems available across the research lifecycle.

The PURE system now has a module that allows datasets to be described. This fulfils the EPSRC Research Data Expectations requiring datasets to be described and those descriptions made available online. Datasets described in PURE are shown as part of staff online profiles in Edinburgh Research Explorer alongside other research outputs.

The RDM Service Coordinator gave an invited presentation at the “Open access and research data management: Horizon 2020 and beyond” FOSTER Workshop, University College Cork, Eire (15 Apr. 2015) – https://www.fosteropenscience.eu/content/looking-after-your-data-rdm-edinburgh-institutional-approach

DCC and RDM programme staff to meet with key contacts in selected schools to discuss local support and draft guidance.

DataShare release 1.7.2 went live in April with a new default open licence, Creative Commons Attribution 4.0 International (CC BY 4.0), which replaces the Open Data Commons Attribution licence.

Discussion is on-going between UNC-Chapel Hill, Data Library and colleagues in Web, Learning, and Teaching Division about collaborating on ‘MANTRA as a MOOC’ to be delivered next autumn.

ITI / Research Services are to investigate provision of a hosted Git service for research software developed at the university.

RDM programme webpages in Polopoly are being checked for accuracy and compliance for their planned migration to Drupal.

All Schools in CHSS have now added links to the RDM Programme website and other RDM pages through their intranets. GeoSciences, Chemistry, Informatics and Mathematics in CSE have also added said links via their intranets.

Content has been finalised for the new ‘Working with personal and sensitive data’ RDM course.

Stuart Macdonald
RDM Service Coordinator

Edinburgh DataShare – new features for users and depositors

I was asked recently on Twitter if our data library was still happily using DSpace for data – the topic of a 2009 presentation I gave at a DSpace User Group meeting. In responding (answer: yes!) I recalled that I’d intended to blog about some of the rich new features we’ve either adopted from the open source community or developed ourselves to give our data users and depositors a better service and fulfil deliverables in the University’s Research Data Management Roadmap.

Edinburgh DataShare was built as an output of the DISC-UK DataShare project, which explored pathways for academics to share their research data over the Internet at the Universities of Edinburgh, Oxford and Southampton (2007-2009). The repository is based on DSpace software, the most popular open source repository system in use globally. Managed by the Data Library team within Information Services, it is now a key component in the UoE’s Research Data Programme, endorsed by its academic-led steering group.

An open access, institutional data repository, Edinburgh DataShare currently holds 246 datasets across collections in 17 out of 22 communities (schools) of the University. It is listed in the Re3data Registry of Research Data Repositories and indexed by Thomson Reuters’ Data Citation Index.

Last autumn, the university joined DataCite, an international standards body that assigns persistent identifiers in the form of Digital Object Identifiers (DOIs) to datasets. DOIs are now assigned to every item in the repository, and are included in the citation that appears on each landing page. This helps to ensure that even after the DataShare system no longer exists, as long as the data have a home, the DOI will be able to direct the user to the new location. Just as importantly, it helps data creators gain credit for their published data through proper data citation in textual publications, including their own journal articles that explain the results of their data analyses.
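For readers who like to see the idea in code, here is a minimal sketch in Python of why a DOI outlives the system that first hosted the data. It is a toy model only, not how DataCite or DSpace work internally: the `DoiRegistry` class, the `format_citation` helper, the DOI `10.1234/example-dataset`, and both landing-page URLs are hypothetical names made up for illustration. The one real-world detail assumed is that a DOI can be written as a link by prefixing it with `https://doi.org/`.

```python
class DoiRegistry:
    """Toy stand-in for a DOI resolver (hypothetical, for illustration only).

    It maps a fixed identifier to whatever landing page currently hosts the data.
    """

    def __init__(self):
        self._targets = {}

    def register(self, doi, url):
        # Registering the same DOI again simply points it at the new location.
        self._targets[doi] = url

    def resolve(self, doi):
        # Returns the current landing page for the DOI.
        return self._targets[doi]


def format_citation(creator, year, title, publisher, doi):
    """Build a simple dataset citation that ends with the DOI as a link."""
    return f"{creator} ({year}). {title} [dataset]. {publisher}. https://doi.org/{doi}"


if __name__ == "__main__":
    registry = DoiRegistry()
    doi = "10.1234/example-dataset"  # hypothetical DOI

    # The dataset is first published in one repository...
    registry.register(doi, "https://old-repository.example.org/item/42")
    # ...and later moves to a new home. The DOI itself never changes.
    registry.register(doi, "https://new-repository.example.org/datasets/42")

    print(registry.resolve(doi))
    print(format_citation("Smith, J.", 2015, "Example survey data",
                          "Example University", doi))
```

The point of the sketch is that a citation printed in a journal article only ever contains the DOI, so it stays valid when the landing page moves; only the registry entry needs updating.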

The autumn release also streamlined our batch ingest process to assist depositors with large and voluminous data files by getting around the web upload front-end. Currently we are able to accept files up to 10 GB in size but we are being challenged to allow ever greater file sizes.

Making the most of metadata

Discover panel screenshot: example from the Geosciences community

Every landing page (home, community, collection) now has a ‘Discover’ panel giving top hits for each metadata field (such as subject classification, keyword, funder, data type, spatial coverage). The panel acts as a filter when drilling down to different levels, allowing the most common values to be ‘discovered’ within each section.

The usage statistics at each level are now publicly viewable as well, so depositors and others can see how often an item is viewed or downloaded. This is useful for many reasons. Users can see what is most useful in the repository; depositors can see if their datasets are being used; stakeholders can compare the success of different communities. By being completely open and transparent, this is a step towards ‘alt-metrics’, or alternative ways of measuring scholarly or scientific impact. The repository is now also part of IRUS-UK (Institutional Repository Usage Statistics UK), which uses the COUNTER standard to make repository usage statistics nationally comparable.

What’s coming?

Stay tuned for future improvements around a new look and feel, preview and display by data type, streaming support, BitTorrent downloading, and Linked Open Data.

Robin Rice
EDINA and Data Library