University of Edinburgh’s new Research Data Management Policy

Following a year-long consultation with research committees and other stakeholders, a new RDM Policy (www.ed.ac.uk/is/research-data-policy) has replaced the landmark 2011 policy. Authored by former Digital Curation Centre Director Chris Rusbridge, the 2011 policy seemed to mark a first for UK universities at the time. The original policy (doi: 10.7488/era/1524) was so novel it was labelled ‘aspirational’ by those who passed it.

CC-BY-SA-2.0, Sustainable Economies Law Centre, flickr

RDM has come a long way since then, as has the University Research Data Service which supports the policy and the research community. The expectation that a data management plan will accompany a research proposal has become much more ordinary, and the importance of data sharing has also become more accepted in that time, with funders’ policies becoming more harmonised (witness UKRI’s 2016 Concordat on Open Research Data).

What has changed?

Although a bit longer (the first policy was ten bullet points and could fit on a single page!), the new policy adds clarity about the University’s expectations of researchers (both staff and students), adds important concepts such as making data FAIR (explanation below), and grounds these concepts in other key University commitments and policies such as research integrity, data protection, and information security (with references included at the end). Software code, so important for research reproducibility, is included explicitly.

CC BY 2.0, Big Data Prob, KamiPhuc on flickr

Definitions of research data and research data management are included, as well as specific references to some of the service components that can help – DMPOnline, DataShare, etc. A commitment to review the policy every 5 years, or sooner if needed, is stated, so another ten years doesn’t fly by unnoticed. Important policy references are provided with links. The policy has graduated from aspirational – the word “must” occurs twelve times, and “should” fifteen times. Yet academic freedom and researcher choice remain basic principles.

Key messages

In terms of responsibilities, there are 3 named entities:

  • The Principal Investigator retains accountability, and is responsible as data owner (and data controller when personal data are collected) on behalf of the University. Responsibility may be delegated to a member of a project team.
  • Students should adhere to the policy/good practice in collecting their own data. When not working with data on behalf of a PI, individual students are the data owner and data controller of their work.
  • The University is responsible for raising awareness of good practice, provision of useful platforms, guidance, and services in support of current and future access.

Data management plans are required:

  • Researchers must create a data management plan (DMP) if any research data are to be collected or used.
  • Plans should cover data types and volume, capture, storage, integrity, confidentiality, retention and destruction, sharing and deposit.
  • Research data management plans must specify how and when research data will be made available for access and reuse.
  • Additionally, a Data Protection Impact Assessment is required whenever data pertaining to individuals are used.
  • Costs such as extra storage, long-term retention, or data management effort must be addressed in research proposals (so as to be recovered from funders where eligible).
  • A University subscription to the DMPOnline tool guides researchers in creating plans, with funder and University templates and guidance; users may request assistance in writing or reviewing a plan from the Research Data Service.

FAIR data sharing is more nuanced than ‘open data’:

  • Publicly funded research data should be made openly available as soon as possible with as few restrictions as necessary.
  • Principal Investigators and research students should consider in their Data Management Plans how they can best make their data FAIR (findable, accessible, interoperable, reusable).
  • Links to relevant publications, people, projects, and other research products such as software or source code should be provided in metadata records, with persistent identifiers when available.
  • Discoverability and access by machines are considered as important as access by humans. Standard open licences should be applied to data and code deposits.

Use data repositories to achieve FAIR data:

  • Research data must be offered for deposit and retention in a national or international data service or domain repository, or a University repository (see next bullet).
  • PIs may deposit their data for open access for all (with or without a time-limited embargo) in Edinburgh DataShare, a University data repository; or DataVault, a restricted access long-term retention solution.
  • Research students may deposit a copy of their (anonymised) data in Edinburgh DataShare while retaining ownership.
  • Researchers should add a dataset metadata record in Pure to data archived elsewhere, and link it to other research outputs.
  • Software code relevant to research findings may be deposited in code repositories such as GitLab or GitHub (cloud).

Consider rights in research data:

  • Researchers should consider the rights of human subjects, as well as the rights of citizen scientists, the public, and external collaborators to have access to their data.
  • When open access to datasets is not legal or ethical (e.g. sensitive data), information governance and restrictions on access and use must be applied as necessary.
  • The University’s Research Office can assist with providing templates for both incoming and outgoing research data and the drafting and negotiation of data sharing agreements.
  • Exclusive rights to reuse or publish research data must not be passed to commercial publishers.

Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections

Data Carpentry & Software Carpentry workshops

The Research Data Service hosted back-to-back two-day workshops in the Main Library this week, run by the Software Sustainability Institute (SSI), to train University of Edinburgh researchers in basic data science and research computing skills.

Learners at Data Carpentry workshop

Software Carpentry (SC) is a popular global initiative originating in the US, aimed at training researchers in good practice in writing, storing and sharing code. Both SC and its newer offshoot, Data Carpentry, teach methods and tools that help researchers make their science reproducible. The SSI, based at Edinburgh Parallel Computing Centre (EPCC), organises workshops for both throughout the UK.

Martin Callaghan, University of Leeds, introduces goals of Data Carpentry workshop.

Each workshop is taught by trainers trained by the SC organisation, using proven methods of delivery, to learners using their own laptops, and with plenty of support from knowledgeable helpers. Instructors at our workshops were from Leeds and EPCC. Comments from the learners (staff and postgraduate students from a range of schools) included, ‘Variety of needs and academic activities/disciplines catered for. Useful exercises and explanations,’ and ‘Very powerful tools.’

Lessons can vary between different workshops, depending on the level of the learners and their requirements, as determined by a pre-workshop survey. The Data Carpentry workshop on Monday and Tuesday included:

  • Using spreadsheets effectively
  • OpenRefine
  • Introduction to R
  • R and visualisation
  • Databases and SQL
  • Using R with SQLite
  • Managing Research & Data Management Plans

The Software Carpentry workshop was aimed at researchers who write their own code, and covered the following topics:

  • Introduction to the Shell
  • Version Control
  • Introduction to Python
  • Using the Shell (scripts)
  • Version Control (with GitHub)
  • Open Science and Open Research

Software Carpentry learners

Clearly the workshops were valued by learners and very worthwhile. The team will consider how it can offer similar workshops in the future at a similarly low cost; your ideas are welcome!

Robin Rice
EDINA and Data Library

Sustainable software for research

In an earlier blog post (October 2013) Stuart Lewis discussed the 4 aspects of software preservation as detailed in a paper by Matthews et al, A Framework for Software Preservation, namely:

  1. Storage: is the software stored somewhere?
  2. Retrieval: can the software be retrieved from wherever it is stored?
  3. Reconstruction: can the software be reconstructed (executed)?
  4. Replay: when executed, does the software produce the same results as it did originally?
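
As a rough illustration of the fourth aspect, replay, the sketch below re-runs an analysis and compares a digest of its result with one recorded at the time of the original run. This is only one possible way to check replay and is not taken from the Matthews et al paper; the function names (`fingerprint`, `replays_identically`, `mean_analysis`) are hypothetical, and the sketch assumes the result can be serialised to JSON.

```python
import hashlib
import json


def fingerprint(result) -> str:
    """Return a SHA-256 digest of a JSON-serialisable result."""
    # sort_keys gives a stable serialisation regardless of dict ordering.
    canonical = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def replays_identically(analysis, inputs, recorded_digest: str) -> bool:
    """Re-run `analysis` on `inputs` and compare with the stored digest."""
    return fingerprint(analysis(inputs)) == recorded_digest


def mean_analysis(values):
    # Hypothetical stand-in for a preserved piece of research software.
    return {"n": len(values), "mean": sum(values) / len(values)}


# Digest recorded when the analysis was first run (assumed to be stored
# alongside the software and its input data).
original_digest = fingerprint(mean_analysis([1, 2, 3, 4]))

# Later: does the reconstructed software still give the same result?
print(replays_identically(mean_analysis, [1, 2, 3, 4], original_digest))
```

A check like this only covers replay; storage, retrieval and reconstruction still depend on the software and its inputs being kept somewhere they can be found and run again.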

It was with these thoughts in mind that, on 1 December 2014, colleagues from across IS (Applications Division, EDINA, Research and Learning Services, DCC, IT Infrastructure) met with Neil Chue Hong, Director of the Software Sustainability Institute (SSI), to discuss how the University of Edinburgh could move forward on the thorny issue of software preservation.

The take-home message agreed by all at the meeting was that it will be easier to look after software in the future if software is managed well just now.

In terms of progressing thinking in this regard there were more questions than answers.

Matters to investigate include:

  • defining what we mean by research software: a spectrum from single R analysis scripts through to large software platforms
  • capturing descriptions of locally created research software products in the Pure Data Asset Registry
  • understanding the number of local research projects that are creating software
  • creating high-level guidance around software development and licensing (with links to SSI and OSS Watch)
  • providing skills and training for early career researchers (such as through the Software Carpentry initiative)
  • tools to measure software uptake/usage in local research
  • institutional use of GitLab and other software development tools
  • ascertaining instances and spend on GitHub across the University

“It’s impossible to conduct research without software, say 7 out of 10 UK researchers”, or so says an SSI report surveying software generation as part of the research process in Russell Group institutions. Published in Times Higher Education (THE), the report and the data that underpin it are now available.

Much food for thought and further discussion!

Stuart Macdonald
RDM Service Coordinator