US Government Data: Lost and Found

Actions by the current US Trump administration (and others, including Trump’s first term) have spurred archivists, librarians and activists to archive, capture, collect, crawl, hoard, mirror, preserve, rescue, track and save datasets produced at taxpayer expense and until recently made available on government websites.

For example, just as US federal research into climate change, or even mentioning climate, has been paused and government agencies defunded, so the datasets produced from these activities have been removed from public reach or disappeared. The same is true for health data around vaccine research (National Institutes of Health, Centers for Disease Control and Prevention), human subject data deemed to be furthering EDI – equality, diversity, and inclusion – (USAID), and longitudinal educational data measuring attainment and social mobility (Department of Education). In some cases, as on this US government web page from the National Environmental Satellite, Data, and Information Service of the National Oceanic and Atmospheric Administration (NOAA), there are both items that are being decommissioned and archived, and others that are simply being decommissioned and deleted.

Particular challenges in archiving such data include capturing whole databases with scraping techniques, metadata loss, loss of provenance tracking (the audit trail of changes), and the inability to add records or collect further data without massive government investment. Also, because efforts are often isolated, the rescued data cannot easily be discovered.
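One way to limit the loss of metadata and provenance when copying a file is to store a small manifest alongside it. The following is a minimal sketch in Python, not a description of any rescue project's actual tooling; the function names, the field names, and the example URL are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_manifest(content: bytes, source_url: str, retrieved_at: datetime) -> dict:
    """Describe one rescued file so its origin and integrity can be checked later.

    The field names are illustrative assumptions, not a published standard.
    """
    return {
        "source_url": source_url,
        "retrieved_at": retrieved_at.astimezone(timezone.utc).isoformat(),
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def verify(content: bytes, manifest: dict) -> bool:
    """Return True if the bytes still match the checksum recorded at capture."""
    return hashlib.sha256(content).hexdigest() == manifest["sha256"]


if __name__ == "__main__":
    data = b"station,year,temp_c\nA1,2024,14.2\n"
    manifest = build_manifest(
        data,
        "https://example.gov/data/temps.csv",  # hypothetical URL
        datetime.now(timezone.utc),
    )
    print(json.dumps(manifest, indent=2))
    print(verify(data, manifest))
```

A checksum only shows that a copy is unchanged since capture; it says nothing about whether the source was complete, so a real workflow would also want to keep whatever descriptive metadata the original site offered.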

Fortunately, the Data Rescue Project is coming to the rescue (along with other initiatives). It is a coordinated effort among data organisations and individuals, including librarians and data professionals. It serves as “a clearinghouse for data rescue-related efforts and data access points for public US governmental data that are currently at risk.” The web page provides a host of pointers to current efforts, resources, a tracker tool, and press coverage – including the New Yorker and Le Monde.

Researchers at University of Edinburgh who find that data they require for their research is being removed from publicly available sites may contact the Research Data Support team to discuss potential actions to take.

Robin Rice
Data Librarian and Head of Research Data Support
Library and University Collections

University of Edinburgh’s new Research Data Management Policy

Following a year-long consultation with research committees and other stakeholders, a new RDM Policy (www.ed.ac.uk/is/research-data-policy) has replaced the landmark 2011 policy, authored by former Digital Curation Centre Director, Chris Rusbridge, which seemed to mark a first for UK universities at the time. The original policy (doi: 10.7488/era/1524) was so novel it was labelled ‘aspirational’ by those who passed it.

"Policy"

CC-BY-SA-2.0, Sustainable Economies Law Centre, flickr

RDM has come a long way since then, as has the University Research Data Service which supports the policy and the research community. Expectation of a data management plan to accompany a research proposal has become much more ordinary, and the importance of data sharing has also become more accepted in that time, with funders’ policies becoming more harmonised (witness UKRI’s 2016 Concordat on Open Research Data).

What has changed?

Although a bit longer (the first policy was ten bullet points and could fit on a single page!), the new policy adds clarity about the University’s expectations of researchers (both staff and students), adds important concepts such as making data FAIR (explanation below), and grounds these concepts in other key University commitments and policies such as research integrity, data protection, and information security (with references included at the end). Software code, so important for research reproducibility, is included explicitly.

CC BY 2.0, Big Data Prob, KamiPhuc on flickr

Definitions of research data and research data management are included, as well as specific references to some of the service components that can help – DMPOnline, DataShare, etc. A commitment to review the policy every 5 years, or sooner if needed, is stated, so another ten years doesn’t fly by unnoticed. Important policy references are provided with links. The policy has graduated from aspirational – the word “must” occurs twelve times, and “should” fifteen times. Yet academic freedom and researcher choice remain basic principles.

Key messages

In terms of responsibilities, there are 3 named entities:

  • The Principal Investigator retains accountability, and is responsible as data owner (and data controller when personal data are collected) on behalf of the University. Responsibility may be delegated to a member of a project team.
  • Students should adhere to the policy/good practice in collecting their own data. When not working with data on behalf of a PI, individual students are the data owners and data controllers of their work.
  • The University is responsible for raising awareness of good practice, provision of useful platforms, guidance, and services in support of current and future access.

Data management plans are required:

  • Researchers must create a data management plan (DMP) if any research data are to be collected or used.
  • Plans should cover data types and volume, capture, storage, integrity, confidentiality, retention and destruction, sharing and deposit.
  • Research data management plans must specify how and when research data will be made available for access and reuse.
  • Additionally, a Data Protection Impact Assessment is required whenever data pertaining to individuals are used.
  • Costs such as extra storage, long-term retention, or data management effort must be addressed in research proposals (so as to be recovered from funders where eligible).
  • A University subscription to the DMPOnline tool guides researchers in creating plans, with funder and University templates and guidance; users may request assistance in writing or reviewing a plan from the Research Data Service.

FAIR data sharing is more nuanced than ‘open data’:

  • Publicly funded research data should be made openly available as soon as possible with as few restrictions as necessary.
  • Principal Investigators and research students should consider in their Data Management Plans how they can best make their data FAIR (findable, accessible, interoperable, reusable).
  • Links to relevant publications, people, projects, and other research products such as software or source code should be provided in metadata records, with persistent identifiers when available.
  • Discoverability and access by machines are considered as important as access by humans. Standard open licences should be applied to data and code deposits.
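To make the points above concrete, here is a minimal sketch in Python of a machine-readable metadata record with a persistent identifier, an open licence, and links to related outputs, plus a check for missing fields. The field names, the `missing_fields` helper, and the identifiers are hypothetical illustrations; they are not taken from the policy or from any repository's actual schema.

```python
# Fields this sketch treats as required; an assumption, not a policy rule.
REQUIRED_FIELDS = ("title", "identifier", "licence", "related_outputs")


def missing_fields(record: dict) -> list:
    """List required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "title": "Example survey dataset",
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "licence": "CC-BY-4.0",
    "related_outputs": [
        {"type": "publication", "identifier": "doi:10.0000/example-paper"},
        {"type": "software", "identifier": "https://example.org/code"},
    ],
}

print(missing_fields(record))
print(missing_fields({"title": "Untitled"}))
```

A structured record like this can be read by software as easily as by a person, which is the sense in which machine access matters alongside human access.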

Use data repositories to achieve FAIR data:

  • Research data must be offered for deposit and retention in a national or international data service or domain repository, or a University repository (see next bullet).
  • PIs may deposit their data in Edinburgh DataShare, a University data repository, for open access for all (with or without a time-limited embargo), or in DataVault, a restricted-access long-term retention solution.
  • Research students may deposit a copy of their (anonymised) data in Edinburgh DataShare while retaining ownership.
  • Researchers should add a dataset metadata record in Pure for data archived elsewhere, and link it to other research outputs.
  • Software code relevant to research findings may be deposited in code repositories such as GitLab or GitHub (cloud).

Consider rights in research data:

  • Researchers should consider the rights of human subjects, citizen scientists, the public, and external collaborators to have access to their data.
  • When open access to datasets is not legal or ethical (e.g. sensitive data), information governance and restrictions on access and use must be applied as necessary.
  • The University’s Research Office can assist with providing templates for both incoming and outgoing research data and the drafting and negotiation of data sharing agreements.
  • Exclusive rights to reuse or publish research data must not be passed to commercial publishers.

Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections

New home for Edinburgh Research Data Blog!

Tempus fugit. This Data Blog, which has been going since 2013, is now moving to Edinburgh University Libraryblogs. This follows the 2018 organisational merger of the Data Library team at EDINA with Research Data Support in Library & University Collections.

We hope you will actively subscribe to the new blog at https://libraryblogs.is.ed.ac.uk/datablog/ now, by entering your email address in the right navigation panel so you don’t miss any future posts!

Meanwhile we will redirect the old URL and all the older posts to the new site so you won’t have to remember where to go to catch all the news about the Research Data Service and research data management at University of Edinburgh. Any cited posts or bookmarks will continue to resolve.

Otherwise it just remains to thank our former and future hosts – EDINA, and the Digital Library – for providing the platform.

Robin Rice
Data Librarian and Head of Research Data Support
Library and University Collections

First online meeting of the Research Outputs Forum (15th October 2020)

Chair Martin Donnelly opened with a brief introduction, recapping the rationale for merging the former Open Access and Research Data Management forums into a single channel for communicating progress across Library Research Support to research support colleagues within the Schools and Colleges. Nik Hussin from the Research Information Systems team then kicked off with a status report on the future upgrade schedule for the University’s Pure CRIS system, which records research outputs and underpins the University’s all-important REF submission.

Head of Library Research Support Dominic Tate began by talking about the new Research Publications Policy, which is in the process of being scrutinised and approved by various committees and other stakeholders. Dominic explained that the overall goals of the new policy are to make compliance with REF and funders’ Open Access policies easier for all concerned, and to empower researchers to make their own decisions about how and where to publish. Fiona Wright from the Scholarly Communications team then provided updates on publications Block Grants in light of Plan S. These come from Wellcome Trust, British Heart Foundation and Cancer Research UK. Dominic wrapped up with a brief REF update, noting that as much of the submission as possible will be done electronically, although some hard copies will still be required.

Moving on to Research Data Support, we heard from Pauline Ward about new features and improvements to the DataVault, including a larger maximum deposit size of ten terabytes, a refined and more streamlined review process, and the new ‘roles’ feature which enables, for example, a research administrator to access information about all of the deposits from their School.

Robin Rice gave a recap of progress on the Data Safe Haven, including its ISO 27001 accreditation and an elucidation of the charges and cost recovery mechanism for using the Safe Haven. Finally, Martin Donnelly talked about the pros and cons of moving our previously face-to-face training online – in short, less interaction with attendees but higher levels of attendance and uptake – and gave a quick overview of the guidance resources produced by the Research Data Service, including the Quick Guides on topics such as data storage options, the FAIR principles for data, and much more.

We received positive feedback from the meeting, and will be organising the next one for early in 2021.

(Update: the Research Data Support slides from this meeting are now available here.)

Martin Donnelly
Research Data Support Manager
Library and University Collections