Green light for research data storage and management plans

As published in IS BITS Magazine, Issue 7, Summer 2013 –

We have received the go-ahead to proceed with our plans to establish a service for the secure storage, management, sharing and preservation of research data in the University. This will build on the enabling work that has been done over the past few years, which has included: introducing an institutional policy, agreeing an 18-month roadmap, running training and awareness-raising sessions, and establishing a governance structure. That structure comprises an RDSM Steering Group, with university-wide representation, which is chaired by Professor Peter Clarke (Physics), as well as an RDSM Implementation Group, which is chaired by Dr John Scally (Library and University Collections).

Due to the delay in receiving the go-ahead, the initial tasks over the coming weeks will be to revise the original timescales for service conception and delivery, commence procurement for the high-capacity storage and run comparisons with progress made in other institutions worldwide.

A fuller report will be provided for the next edition of BITS.

– John Scally, Director, Library & University Collections, and Chair, IS Research Data Management Implementation Group

Making research ‘really reproducible’

Listening to Victoria Stodden, Assistant Professor of Statistics at Columbia University, give the keynote speech at the recent Open Repositories conference in lovely Prince Edward Island, Canada, I realised we have some way to go on the path towards her idealistic vision of how to “perfect the scholarly record.”

Victoria Park, Charlottetown

Charlottetown, Prince Edward Island

As an institutional data repository manager (for Edinburgh DataShare) I often listen and talk to users about the reasons for sharing and not sharing research data. One reason for sharing, well known to users of the UK Data Archive (now known as the UK Data Service), is that a dataset is very rich and can be used for multiple purposes beyond those for which it was created. The British Social Attitudes Survey is one example.

Another reason for sharing data, increasingly being driven by funder and publisher policies, is to allow replication of published results or, in the case of negative results which are never published, to avoid duplicating effort and wasting public money.

It is the second reason on which Stodden focused, and not just for research data but also for the code that is run on the data to produce the scientific results. It is for this reason she and colleagues have set up the web service, Run My Code. These single-purpose datasets do not normally get added to collections within data archives and data centres, as their re-use value is very limited. Stodden’s message to the audience of institutional repository managers and developers was that the duty of preserving these artefacts of the scientific record should fall to us.

Why should underlying code and data be published and preserved along with articles as part of the scholarly record? Stodden argues that they should because computation is becoming central to scientific research. We’ve all heard the arguments behind the “data deluge”. But Stodden persuasively focuses on the evolution of the scientific record itself, arguing that Reproducible Research is not new. It has its roots in Skepticism, as developed by Robert Boyle and the Royal Society of the 1660s. Fundamentally, it’s about “the ubiquity of error: The central motivation of the scientific method is to root out error.”

In her keynote she developed this theme by expanding on the three branches of science.

  • Branch 1: Deductive. This was about maths and formal logic, and the proof as the main product of scientific endeavor.
  • Branch 2: Empirical. Statistical analysis of controlled experiments – hypothesis testing, structured communication of methods and protocols. Peer reviewed articles became the norm.
  • Branch 3: Computational. This is at an immature stage, in part because we have not developed the means to rigorously test assertions from this branch.

Stodden is scathing in her criticism of the way computational science is currently practiced, consisting of “breezy demos” at conferences that can’t be challenged or “poked at.” She argues passionately for the need to facilitate reproducibility – the ability to regenerate published results.

What is needed to achieve openness in science? Stodden argued for the deposit and curation of versioned data and code, with a link to the published article, and for the permanence of that link. This is indeed within the territory of the repository community.

Moreover, to have sharable products at the end of a research project, one needs to plan to share from the outset. It’s very difficult to reproduce the steps to create the results as an afterthought.

I couldn’t agree more with this last assertion. Since we set up Edinburgh DataShare we have spoken to a number of researchers about their ‘legacy’ datasets. Although they would like to make these publicly available, they cannot, whether because of the nature of the consent forms, the format of the material, or the lack of adequate documentation. The easiest way is to plan to share. Our Research Data Management pages have information on how to do this, including use of the Digital Curation Centre’s tool, DMPOnline.

– Robin Rice, Data Librarian

Research Data Management: good news fit to print!

Congratulations to Digital Curation Centre staffer Sarah Jones, who co-authored an article about UK University RDM initiatives with Jisc Programme Manager Simon Hodson in the Guardian Higher Education Network pages today:

“Seven rules of successful research data management in universities: Sound research rests on the ability to evidence, verify and reproduce results – managing your data enables all three.”

http://gu.com/p/3hbh8

Sarah is helping the University develop support for Data Management Planning towards its RDM Roadmap goals as part of the DCC’s 21 institutional engagements providing tailored support to increase research data management capability.

A number of direct and indirect pointers to the University of Edinburgh’s work appear in this concise but well-presented piece, including the Research Data MANTRA online course, the formation of an RDM Steering Group, a roadmap to address EPSRC data sharing requirements, online guidance for staff, and librarian training.

MANTRA shortlisted

Research Data MANTRA (http://datalib.edina.ac.uk/mantra/) is a free, non-credit course designed for postgraduate students and early career researchers which provides guidelines for good practice in research data management.

Image depicting a shortlist
(Flickr Image by Soilse – CC BY-NC-ND 2.0)

In recognition of the work done by Edinburgh University Data Library in developing this open educational resource, MANTRA has been evaluated and shortlisted in a report by the Research Information Literacy and Digital Scholarship (RILADS) project as one of 15 good practice examples designed to enhance information literacy skills of postgraduate students and early career researchers in UK Higher Education.

RILADS aims to investigate and report on support available to students, staff and researchers to enhance digital literacy. There are two strands to the project. One is co-ordinated by Research Information Network (RIN) on behalf of Research Information and Digital Literacies Coalition (RIDLs), the other by SCONUL under the JISC Developing Digital Literacies (DDL) programme.

The RIN strand focuses on the identification and promotion of good practice in information handling and data management training and development across the HE and research sectors. The SCONUL strand aims to identify, harvest, and use materials to progress the development of digital professional expertise.

The full report and shortlist are available on the RILADS website: http://rilads.wordpress.com/2013/06/10/rilads-report/

MANTRA, the sole resource on the shortlist that is dedicated to research data management skills, has also been upgraded to Version 2 of the Xerte Online Toolkit, the e-learning development environment used to create its learning materials. Version 2 can deliver content using HTML5 rather than the Flash Player, so the course can reach a much wider range of devices, including those that do not support Flash.

Watch out for further MANTRA enhancements!

Stuart Macdonald
EDINA & Data Library