Highlights from the RDM Programme Progress Report: November 2015 – January 2016

Data Seal of Approval have awarded DataShare Trusted Repository status; their assessment of our service can be read at https://assessment.datasealofapproval.org/assessment_175/seal/html/. In addition, a major new release of DataShare was completed in November; this makes the code open on GitHub and brings general improvements to the look and feel of the website.

The ‘interim’ DataVault is now in final testing and will be rolled out on a request basis to those researchers who can demonstrate an urgent need to use the service now rather than waiting until the final version is ready later this year. The phase three funding for development of the DataVault has been received from Jisc. It runs from March to August, so the final version should be ready for launch sometime after this. The project was presented at the International Digital Curation Conference in February 2016.

Over the three-month period, a total of 328 staff and postgraduate researchers attended a Research Data Management (RDM) course or workshop.

Work on the MANTRA MOOC (Massive Open Online Course) was expected to be finalised in February and launched on 1st March, at the following URL: https://www.coursera.org/learn/data-management.

The University of Edinburgh wrote the Working with Data section (one of the five weeks of the course) and, with the help of the Learning, Teaching and Web division of Information Services, completed two video interviews with researchers and a ‘vox pop’ video clip of clinical researchers at the EQUATOR conference in Edinburgh in autumn 2015. The content is open source and videos can be added to our YouTube channel to help with promotion. There will be some income from this, based on certificates of completion priced at $49 or £33, although our portion will be smaller than that of our partner, the University of North Carolina.

The need to create a dataset record in PURE for each dataset published, or referenced in a publication, is now being emphasised in all Research Data Service communications, formal and informal, and to staff at all levels. Uptake is understandably low at this point but we hope to see a steady increase as researchers and support staff begin to see the benefits of adding datasets to their research profile. In the case of DataShare records, a draft mapping of fields between DataShare and PURE has been produced as a start of a plan for migrating records from DataShare to PURE.

By the end of January 2016, 69 records had been created and published on Edinburgh Research Explorer.

Four interns have been employed using funding from Jisc as part of the UK Research Data Discovery Service (UKRDDS) project, which aims to create a national aggregate register of data sets. A trial site is available at: http://ckan.data.alpha.jisc.ac.uk/. The UKRDDS interns will help to create PURE records, upload open data into DataShare, and raise awareness of RDM generally within their schools. There are currently three PhD interns in place in LLC, SOS, and Roslin; two more, in LLC and DIPM, will start in February. The approach each intern takes will depend on the nature and structure of their school and will, in some cases, be mediated by research administrators.

An innovation fund grant has been received to fund the delivery of an exhibition, “Pioneering Research Data”. Each college will be represented by a PhD intern; recruitment has already begun and the interns should be in post by the end of March. The exhibition is due to be delivered in November of this year.

National and International Engagement Activities

Robin Rice led a panel at the IPRES conference, Chapel Hill, North Carolina, on 3rd November called ‘“Good, better, best”? Examining the range and rationales of institutional data curation practices’.

Robin Rice had a proposal accepted for the forthcoming Force11 (2016) conference, on Overcoming Obstacles to Sharing Data about Human Subjects, building on the training course we are delivering, Working with Personal and Sensitive Data.

Kerry Miller
RDM Service Coordinator

Highlights from the RDM Programme Progress Report: Jan – Feb 2015

The Library and University Collections (L&UC) in association with project partner Manchester University received funding from the Jisc “Research Data Spring” programme to define and develop an open source Data Vault application which will allow data creators to describe and store data safely in one of the growing number of archival storage options. Phase 1 of the project started in March 2015.

The University of Edinburgh (UoE) was invited to contribute to a series of EPSRC (Engineering and Physical Sciences Research Council) Compliance Case Studies. Stuart Macdonald, RDM Service Coordinator, was interviewed by Jisc and the DCC in relation to the RDM programme and institutional compliance with forthcoming EPSRC research data expectations. The case study will be published on the Jisc website in May 2015.

RDM Service Coordinator Stuart Macdonald co-presented with Rory Macneil (RSpace) their practice paper “Service Integration to Enhance RDM: RSpace electronic laboratory notebook (ELN) case study” at the International Digital Curation Conference (IDCC) in London (Feb 2015). The paper has been published open access in the International Journal of Digital Curation (http://www.ijdc.net/index.php/ijdc/article/view/10.1.163).

The RDM Service Coordinator also presented on ‘RDM Training Initiatives @ Edinburgh’ at the “Comparing Notes: Training Librarians for Research Data Management and Open Science Support” workshop at IDCC.

An EPSRC Expectations Awareness Survey was sent out to 98 EPSRC grant holders, of whom 38 responded. 9** grant holders agreed to participate in a follow-up interview. The findings of the interviews will follow shortly. Dr Evamaria Krause (Marburg University, Germany) completed a 6-week internship with L&UC, where she assisted with the EPSRC Expectations Awareness Survey and the EPSRC grant holder interviews.

All Schools in the College of Humanities and Social Science (CHSS) have now added links to the RDM Programme website and other RDM pages via their intranets. RDM Project Plan deadlines and deliverables which underpin the RDM Roadmap have been updated.* For more details visit the RDM Programme wiki (some content only available to UoE staff).

Four tailored Data Management Plans sessions have been organised with research groups in the College of Medicine and Veterinary Medicine and CHSS, and two workshops for the European Association for Health Information and Libraries (EAHIL) conference in Edinburgh are scheduled to run in June 2015.

Edinburgh DataShare release 1.71 has been announced with new features including faceted browsing, SOLR usage statistics, and an increase in the size limit on assisted deposit of items from 5Gb to 10Gb.

DataSync (a Dropbox-like service in development) was themed and made available for beta testing to Information Services colleagues.

* IT Infrastructure input pending
** 1 PhD student who was forwarded the survey agreed to be interviewed

Stuart Macdonald
RDM Service Coordinator

Research Data Spring – blooming great ideas!

The University of Edinburgh have been busy putting ideas together for Jisc’s Research Data Spring project, part of the research at risk co-design challenge area, which aims to find new technical tools, software and service solutions, which will improve researchers’ workflows and the use and management of their data (see: http://researchdata.jiscinvolve.org/wp/2014/11/24/research-data-spring-let-your-ideas-bloom/).

Library and University Collections, in collaboration with colleagues from the University of Manchester, have submitted an idea to prototype and then develop an open source data archive application that is technology agnostic and can sit on top of various underlying storage or archive technologies (see: http://researchatrisk.ideascale.com/a/dtd/Develop-a-DataVault/102647-31525)

EDINA & Data Library have submitted two ideas, namely:

A ‘Cloud Work Bench’ to provide researchers in the geospatial domain (GI Scientists, Geomaticians, GIS experts) with the tools, storage and data persistence they require to conduct research without the need to manage the same in a local context that can be fraught with socio-technical barriers that impede the actual research (see: http://researchatrisk.ideascale.com/a/dtd/Cloud-Work-Bench/101899-31525)

An exploration of the use of Mozilla Open Badges as certification of completion of MANTRA (Research Data Management Training), a well-regarded open educational resource (see: http://researchatrisk.ideascale.com/a/dtd/Open-Badges-for-MANTRA-resource/102084-31525)

Please register with ideascale (http://researchatrisk.ideascale.com/) and VOTE for our blooming great ideas!!

Stuart Macdonald
RDM Service Coordinator

Thinking about a Data Vault

[Reposted from https://libraryblogs.is.ed.ac.uk/blog/2013/12/20/thinking-about-a-data-vault/]

In a recent blog post, we looked at the four quadrants of research data curation systems.  This categorised systems that manage or describe research data assets by whether their primary role is to store metadata or data, and whether the information is for private or public use.  Four systems were then put into these quadrants.  We then started to investigate further the requirements of a Data Asset Register in another blog post.

[Figure: the four quadrants of research data curation systems]

This blog post will look at the requirements and characteristics of a Data Vault, and how this component fits into the data curation system landscape.

What?

The first aspect to consider is what exactly a Data Vault is.  For the purposes of this blog post, we’ll simply consider it to be a safe, private store of data that is only accessible by the data creator or their representative.  For simplicity, it could be considered very similar to a safety deposit box within a bank vault.  However, beyond the basic concept, this analogy starts to break down quite quickly, as we’ll discuss later.

Why?

There are different use cases where a Data Vault would be useful.  A few are described here:

  • A paper has been published, and according to the research funder’s rules, the data underlying the paper must be made available upon request.  It is therefore important to store a date-stamped golden-copy of the data associated with the paper.  Even if the author’s own copy of the data is subsequently modified, the data at the point of publication is still available.
  • Data containing personal information, perhaps medical records, needs to be stored securely.  The data is ‘complete’ and unlikely to change, yet hasn’t reached the point where it should be deleted.
  • Data analysis of a data set has been completed, and the research finished.  The data may need to be accessed again, but is unlikely to change, so needn’t be stored in the researcher’s active data store.  An example might be a set of completed crystallography analyses, which whilst still useful, will not need to be re-analysed.
  • Data is subject to retention rules and must be kept securely for a given period of time.

How?

Clearly the storage characteristics for a Data Vault are different to an open data repository or active data filestore for working data.  The following is a list of some of the characteristics a Data Vault will need, or could use:

  • Write-once file system: The file system should only allow each file to be written once.  Deleting a file should require extra effort so that it is hard to remove files.
  • Versioning:  If the same file needs to be stored again, then rather than overwriting the existing file, it should be stored alongside the file as a new version.  This should be an automatic function.
  • File security: Only the data owner or their delegate can access the data.
  • Storage security: The Data Vault should only be accessible through the local university network, not the wider Internet.  This reduces the vectors of attack, which is important given the potential sensitivity of the data contained within the Data Vault.
  • Additional security: Encrypt the data, either via key management by the depositors, or within the storage system itself?
  • Upload and access:  Options include via a web interface (issues with very large files), special shared folders, dedicated upload facilities (e.g. GridFTP), or an API for integration with automated workflows.
  • Integration: How would the Data Vault integrate with the Data Asset Register?  Could the register be the main user interface for accessing the Data Vault?
  • Description:  What level of description, or metadata, is required for data sets stored in the Data Vault, to ensure that they can be found and understood in the future?
  • Assurance:  Facilities to ensure that the file uploaded by the researcher is intact and correct when it reaches the vault, and periodic checks to ensure that the file has not become corrupted.  What about more active preservation functions, including file format migration to keep files up to date (e.g. convert Word 95 documents to Word 2013 format)?
  • Speed:  Can the file system be much slower, perhaps a Hierarchical Storage Management (HSM) system that stores frequently accessed data on disk, but relegates older or less frequently accessed data to slower storage mechanisms such as tape?  Access might then be slow (it takes a few minutes for the data to be automatically retrieved from the tape) but the cost of the service is much lower.
  • Allocation: How much space should each person be given, or should it be unrestricted so as to encourage use?  What about costing for additional space?  Costings may be hard, because if the data is to be kept in perpetuity, then whole-life costing will be needed.  If allocation is free, how do we stop it being used for routine backups of data rather than golden-copy data?
  • Who:  Who is allowed access to the Data Vault to store data?
  • Review periods:  How to remind data owners what data they have in the Data Vault so that they can review their holdings, and remove unneeded data?
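
As a rough illustration of how the write-once, versioning and assurance characteristics above might fit together, here is a minimal Python sketch.  It is not a design for the Data Vault itself: the function names (deposit, verify), the v1/v2 directory layout and the checksum.sha256 sidecar file are all assumptions made for this example, and a real service would enforce write-once behaviour in the storage layer rather than through file permissions.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deposit(source: Path, vault: Path) -> Path:
    """Store a copy of `source` in `vault` as a new version.

    Hypothetical layout: vault/<file name>/v1/<file name>, then v2, and so on.
    An existing version is never overwritten.
    """
    item_dir = vault / source.name
    item_dir.mkdir(parents=True, exist_ok=True)

    # The next version number is one more than the versions already held.
    version = sum(1 for entry in item_dir.iterdir() if entry.is_dir()) + 1
    version_dir = item_dir / f"v{version}"
    version_dir.mkdir()  # raises FileExistsError instead of overwriting

    expected = sha256_of(source)
    stored = version_dir / source.name
    shutil.copy2(source, stored)

    # Assurance on arrival: the stored copy must match what was sent.
    if sha256_of(stored) != expected:
        raise IOError(f"checksum mismatch while depositing {source}")

    # Record the checksum alongside the file for later periodic checks.
    (version_dir / "checksum.sha256").write_text(expected + "\n")

    # Read-only permissions only approximate a write-once file system.
    stored.chmod(0o444)
    return stored


def verify(stored: Path) -> bool:
    """Periodic check: does the file still match its recorded checksum?"""
    recorded = (stored.parent / "checksum.sha256").read_text().strip()
    return sha256_of(stored) == recorded
```

Depositing the same file name twice yields v1 and v2 side by side rather than an overwrite, and verify() could be run on a schedule to flag copies that have become corrupted.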

[Figure: Data Vault]

Feedback on these issues and discussion points is very welcome!  We will post further updates on this blog as these services develop.

Image available from http://dx.doi.org/10.6084/m9.figshare.873617

Tony Weir, Head of Unix Section, IT Infrastructure
Stuart Lewis, Head of Research and Learning Services, Library & University Collections.