Knowledge Exchange with Japan

Two members of the Research Data Support team recently had an adventure visiting Japan in order to provide practical lessons to library students and librarians studying in a research data management (RDM) course.

When Professor Emi Ishita visited the team in 2023, she was preparing a new syllabus for research data management for her library students at the Kyushu University iSchool. Struck by the strong engagement our team members had with researchers through training, supporting data management plans, and moderating data deposits, she returned in August 2024 with a delegation of practicing librarians to learn from the Universities of Edinburgh, Leeds, and Oxford. The group spent a full day with various members of Library Research Support, going over a list of questions they had sent in advance about Open Access (OA) and RDM support and our approach to training researchers.

Staff and visitors eating lunch

At the end of the session, Emi revealed that a grant was available to pay for two RDM practitioners from UoE to come to Kyushu University in Fukuoka to contribute to an October 2024 public symposium and two day-long in-person training sessions for her students. The students would include Master's students enrolled in the iSchool at Kyushu, as well as practicing librarians looking to reskill themselves following government policy directives embracing immediate open access and data sharing for research publications.

bento box

The speakers for the hybrid symposium on OA and RDM in Japan were Dr Simon Smith and myself, along with long-time RDM service provider Jake Carlson from the University of Buffalo (New York). The Library Director from Chiba University and the Research Data Service Director at Kyushu University provided a Japanese context for OA and RDM. Dr Ishita introduced the symposium and chaired the panel session – her team also provided all the speakers with beautifully presented bento boxes for lunch, tailored to each person’s diet.

While the symposium was exciting, with about 60 people in attendance and about 130 more watching and listening online, it was the practical training that Simon and I found truly inspirational. The students overcame their customary reserve to answer Simon’s open-ended questions about supporting researchers with data management planning in a classroom setting. Later, they formed small groups to try out depositing data in DataShare and to evaluate each other’s metadata for quality. The technology worked, the students were curious and engaged, and the Kyushu instructors were pleased with the outcome.

library trainers at Kyushu University, Oct 2024

Japanese hospitality lunch

During the week of the event, Simon and I visited existing and new contacts at Tokyo University, Chiba University, Kyoto University, Nagoya University and NII. In addition to the excellent company, we were pleased to be visiting such a beautiful country and eating the wonderful food.

NII staff outside restaurant

Mount Aso volcano

Quicker, easier interface for DataVault

We are delighted to report that Edinburgh DataVault now has a quicker and easier process for users. The team have been working hard to overhaul the form for creating a new ‘vault’ to improve the user experience. The changes allow us to gather all the information we need from the user directly through a DataVault web page. Users no longer need to log in to Pure first and create a separate dataset record there. Instead, that bit will be automated for them.

I have explained the new process and walked users through the new form in an updated how-to video, which now combines the getting-started information with a demo of how to create a vault:

Get started and create your vault! (8 mins)

The new streamlined process for users is represented and compared to our open research data repository DataShare in this workflow diagram. DataVault is designed for restricted access, but can also handle far larger datasets than DataShare.

The diagram shows the steps users go through in DataShare and DataVault. Common steps are deposit and approval.

The DataVault process includes the gathering of funding information, review, and deletion, none of which are present in the DataShare workflow since they would not be relevant to that open research data repository.

The arrow showing DataVault metadata going to the internet represents the copying of selected metadata fields into Pure, where they are accessible as dataset records in the university’s Edinburgh Research Explorer online portal.

Our new course “Archiving Your Research Data”, featuring Sara Thomson, Digital Archivist, provides an introduction to digital preservation for researchers, combined with practical support on how to put digital preservation into practice using the support and systems available here at the University of Edinburgh, such as the DataVault. For future dates and registration information please see our Workshops page.

A recording of an earlier workshop (before the new interface was released) is also available: Archiving Your Research Data Part 1: Long-term Preservation.

If you are a University of Edinburgh principal investigator, academic, or support professional interested in using the Edinburgh DataVault, please get in touch by emailing data-support@ed.ac.uk.

Pauline Ward
Research Data Support Assistant
Library and University Collections
University of Edinburgh

University of Edinburgh’s new Research Data Management Policy

Following a year-long consultation with research committees and other stakeholders, a new RDM Policy (www.ed.ac.uk/is/research-data-policy) has replaced the landmark 2011 policy, authored by former Digital Curation Centre Director, Chris Rusbridge, which seemed to mark a first for UK universities at the time. The original policy (doi: 10.7488/era/1524) was so novel it was labeled ‘aspirational’ by those who passed it.

"Policy"

CC-BY-SA-2.0, Sustainable Economies Law Centre, flickr

RDM has come a long way since then, as has the University Research Data Service which supports the policy and the research community. Expectation of a data management plan to accompany a research proposal has become much more ordinary, and the importance of data sharing has also become more accepted in that time, with funders’ policies becoming more harmonised (witness UKRI’s 2016 Concordat on Open Research Data).

What has changed?

Although a bit longer (the first policy was ten bullet points and could fit on a single page!), the new policy adds clarity about the University’s expectations of researchers (both staff and students), adds important concepts such as making data FAIR (explanation below), and grounds these concepts in other key University commitments and policies such as research integrity, data protection, and information security (with references included at the end). Software code, so important for research reproducibility, is included explicitly.

CC BY 2.0, Big Data Prob, KamiPhuc on flickr

Definitions of research data and research data management are included, as well as specific references to some of the service components that can help – DMPOnline, DataShare, etc. A commitment to review the policy every 5 years, or sooner if needed, is stated, so another ten years doesn’t fly by unnoticed. Important policy references are provided with links. The policy has graduated from aspirational – the word “must” occurs twelve times, and “should” fifteen times. Yet academic freedom and researcher choice remain basic principles.

Key messages

In terms of responsibilities, there are 3 named entities:

  • The Principal Investigator retains accountability, and is responsible as data owner (and data controller when personal data are collected) on behalf of the University. Responsibility may be delegated to a member of a project team.
  • Students should adhere to the policy/good practice in collecting their own data. When not working with data on behalf of a PI, individual students are the data owners and data controllers of their work.
  • The University is responsible for raising awareness of good practice, provision of useful platforms, guidance, and services in support of current and future access.

Data management plans are required:

  • Researchers must create a data management plan (DMP) if any research data are to be collected or used.
  • Plans should cover data types and volume, capture, storage, integrity, confidentiality, retention and destruction, sharing and deposit.
  • Research data management plans must specify how and when research data will be made available for access and reuse.
  • Additionally, a Data Protection Impact Assessment is required whenever data pertaining to individuals is used.
  • Costs such as extra storage, long-term retention, or data management effort must be addressed in research proposals (so as to be recovered from funders where eligible).
  • A University subscription to the DMPOnline tool guides researchers in creating plans, with funder and University templates and guidance; users may request assistance in writing or reviewing a plan from the Research Data Service.

FAIR data sharing is more nuanced than ‘open data’:

  • Publicly funded research data should be made openly available as soon as possible with as few restrictions as necessary.
  • Principal Investigators and research students should consider in their Data Management Plans how they can best make their data FAIR (findable, accessible, interoperable, reusable).
  • Links to relevant publications, people, projects, and other research products such as software or source code should be provided in metadata records, with persistent identifiers when available.
  • Discoverability and access by machines is considered as important as access by humans. Standard open licences should be applied to data and code deposits.

Use data repositories to achieve FAIR data:

  • Research data must be offered for deposit and retention in a national or international data service or domain repository, or a University repository (see next bullet).
  • PIs may deposit their data for open access for all (with or without a time-limited embargo) in Edinburgh DataShare, a University data repository; or DataVault, a restricted access long-term retention solution.
  • Research students may deposit a copy of their (anonymised) data in Edinburgh DataShare while retaining ownership.
  • Researchers should add a dataset metadata record in Pure to data archived elsewhere, and link it to other research outputs.
  • Software code relevant to research findings may be deposited in code repositories such as GitLab or GitHub (cloud).

Consider rights in research data:

  • Researchers should consider the rights of human subjects, citizen scientists, the public, and external collaborators to have access to their data.
  • When open access to datasets is not legal or ethical (e.g. sensitive data), information governance and restrictions on access and use must be applied as necessary.
  • The University’s Research Office can assist with providing templates for both incoming and outgoing research data and the drafting and negotiation of data sharing agreements.
  • Exclusive rights to reuse or publish research data must not be passed to commercial publishers.

Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections

Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is that of reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions which can be relied upon to actually help humans. Other challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward/recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different to the status quo. Indeed Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not just lie in the hands of the funders, governments, and at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of positive or negative results, and we all have a part to play in seeking to counter this bias.

Photo-collage of several speakers at the event

A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring/processing inconsistent file formats between health boards, causing me to wonder if it was a question of open vs closed formats, and how such a situation might have been averted, e.g. via planning, training (and awareness raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else? Emily earlier noted a confusion about which tools are useful – and this is a role for those of us who provide tools, and for people like myself and my colleague Digital Research Services Lead Facilitator Lisa Otty who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance are iterative processes: we should always be fine-tuning these, and responding to new and changing needs.

Another theme of the first morning session was the question of achieving balances and trade-offs in protecting data and keeping it useful, and a question from the floor noted the importance of recording and justifying how these balance decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example, between usability/convenience and data security. Chris spoke about dual testing of data: is it anonymous? / is it useful? In many cases, ideally it will be both, but being both may not always be possible.

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction from the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings, which compared with transcription and phonetic renderings are much closer to the data source (what Kant would call the thing-in-itself (das Ding an sich) as opposed to the phenomenon, the thing as it appears to an observer) is hugely beneficial to linguistic scholarship. Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Don’t tell, show.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility, and scepticism. By providing access to our workings and the evidence behind publications, and by joining these things up (as Ewan McAndrew described, linked data is key: this is the fifth star in the aforementioned 5-star Open Data scheme), Open Science and all its various constituent parts support this goal, which is after all one of the goals of research and of scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – Open Data and Open Science can be transformative. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day are published on MediaHopper.

Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh