Quicker, easier interface for DataVault

We are delighted to report that Edinburgh DataVault now has a quicker and easier process for users. The team have been working hard to overhaul the form for creating a new ‘vault’ to improve the user experience. The changes allow us to gather all the information we need from the user directly through a DataVault web page. Users no longer need to first log in to Pure and create a separate dataset record there. Instead, that step is automated for them.

I explain the new process and walk users through the new form in an updated how-to video, which now combines the getting-started information with a demo of how to create a vault:

Get started and create your vault! (8 mins)

This workflow diagram shows the new streamlined process for users and compares it with that of DataShare, our open research data repository. DataVault is designed for restricted access, and can also handle far larger datasets than DataShare.

The diagram shows the steps users go through in DataShare and DataVault. Common steps are deposit and approval.

The DataVault process includes the gathering of funding information, review, and deletion, none of which is present in the DataShare workflow since they would not be relevant to that open research data repository.

The arrow showing DataVault metadata going to the internet represents the copying of selected metadata fields into Pure, where they are accessible as dataset records in the university’s Edinburgh Research Explorer online portal.

Our new course “Archiving Your Research Data”, featuring Sara Thomson, Digital Archivist, provides an introduction to digital preservation for researchers, combined with practical support on how to put digital preservation into practice using the support and systems available here at the University of Edinburgh, such as the DataVault. For future dates and registration information please see our Workshops page.

A recording of an earlier workshop (before the new interface was released) is also available: Archiving Your Research Data Part 1: Long-term Preservation.

If you are a University of Edinburgh principal investigator, academic, or support professional interested in using the Edinburgh DataVault, please get in touch by emailing data-support@ed.ac.uk.

Pauline Ward
Research Data Support Assistant
Library and University Collections
University of Edinburgh

The big 3-0-0-0: DataShare reaches three thousand datasets

Confetti banner says "3,000th deposit!!!" 2021-08-02

Timestamp showing the accession of the deposit on the 2nd of August.

We’re thrilled Edinburgh DataShare has just ingested its 3,000th deposit:

Davey, Thomas; Draycott, Samuel; Pillai, Ajit; Gabl, Roman; Jordan, Laura-Beth. (2021). Wave buoy in current – experimental data, [dataset]. University of Edinburgh. School of Engineering. Institute for Energy Systems. FloWave Ocean Energy Research Facility. https://doi.org/10.7488/ds/3105.

The depositor was Dr Tom Davey, Senior Experimental Officer in the School of Engineering, who said:

“It is a pleasure for us all in FloWave to see one of our datasets achieve this milestone for Edinburgh DataShare. This is also the tenth DataShare upload making use of experimental outputs from the FloWave Ocean Energy Research Facility. Providing a reliable and accessible repository of our project outputs is not only important for our funders, but also promotes new research collaborations and builds lasting impact for our experimental programmes. This particular project will aid in the understanding measuring wave and currents at deployment sites for offshore renewable energy technologies, and adds to the existing FloWave portfolio of datasets in the field of wave energy, tidal energy, advanced measurement, and remote operated vehicles.”

You can explore more data generated at FloWave in the IES DataShare Collection:

Collection – The Institute for Energy Systems (IES) (ed.ac.uk)

Although the ‘wave buoy in current’ dataset is under temporary embargo, currently set to expire on the 5th of September, it is possible to request the data using DataShare’s request-a-copy feature in the meantime. Embargoes may be extended, or lifted early, usually reflecting publication dates.

You might also enjoy this hilarious and very popular video about FloWave:

 

Pauline Ward

Research Data Support Assistant

Library & University Collections

University of Edinburgh

DataVault – larger deposits and new review process notifications

New deposit size limit: 10TB

Great news for DataVault users: you can now deposit up to a whopping ten terabytes in a single deposit in the Edinburgh DataVault! That’s five times greater than the previous deposit limit, saving you time that might have been wasted splitting your data artificially and making multiple deposits.

It’s still a good idea to divide up your data into deposits that correspond well to whatever subsets of the dataset you and your colleagues are likely to want to retrieve at any one time. That’s because you can only retrieve a single deposit in its entirety; you cannot select individual files in the deposit to retrieve. Smaller deposits are quicker to retrieve. And remember you’ll need enough space for the retrieved data to arrive in.
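As a rough illustration of the advice above, the sketch below checks a planned split of a dataset against the 10 TB limit before depositing. It is not part of DataVault: the folder names and sizes are made up, and it assumes decimal terabytes (1 TB = 10^12 bytes), which may not match how the service measures deposits.

```python
# Hypothetical planning aid, not a DataVault tool.
DEPOSIT_LIMIT_BYTES = 10 * 10**12  # 10 TB limit, assuming decimal terabytes


def oversized_deposits(planned, limit=DEPOSIT_LIMIT_BYTES):
    """Return the names of planned deposits whose size exceeds the limit."""
    return [name for name, size in planned.items() if size > limit]


# Made-up plan: one deposit per subset that colleagues would retrieve together.
planned_deposits = {
    "raw_images_2020": 8 * 10**12,
    "raw_images_2021": 12 * 10**12,  # over the limit, so it needs splitting
    "analysis_outputs": 3 * 10**11,
}

print(oversized_deposits(planned_deposits))
```

Any deposit the check reports would need dividing further, ideally along the lines of what people will want to retrieve together.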

We’ve made some performance improvements thanks to our brilliant technical team, so depositing now goes significantly faster. Nonetheless, please bear in mind that any deposit of multiple terabytes will probably take several days to complete (depending on how many deposits are queueing and some characteristics of the fileset), because the DataVault needs time to encrypt the data and store it in the tape archives and in the cloud. Remember not to delete your original copy from your working area on DataStore until you receive our email confirming that the deposit has completed!

And you can archive as many deposits as you like into a vault, as long as you have the resources to pay the bill when we send you the eIT!

A reminder on how to structure your data:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/prepare-datavault/structure

Ensuring good stewardship of your data through the review process

Another great feature that’s now up and running is the review process notification system, and the accompanying dashboard which allows the curators to implement decisions about retaining or deleting data.

As a vault owner, you should receive an email when the chosen review date is six months away, seeking your involvement in the review process. The email will provide you with the information you need about when the funder’s minimum retention period (if there is one) expires, and how to access the vault. Don’t worry if you think you might have moved on by then; the system is designed to allow the University to implement good stewardship of all the data vaults, even when the Principal Investigator (PI) is no longer contactable. Our curators use a review dashboard to see all vaults whose review dates are approaching, and who the Nominated Data Managers (NDMs) are. In the absence of the Owner, the system notifies the NDMs instead. We will consult with the NDMs or the School about the vault, to ensure all deposits that should be deleted are deleted in good time, and all deposits that should be kept longer are kept safe and sound and still accessible to all authorised users.

DataVault Review Process:
https://www.ed.ac.uk/information-services/research-support/research-data-service/after/datavault/review-process 
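If you would like to estimate when the six-month notice might arrive, the sketch below works back from a vault’s review date. It is only an illustration: the review date is made up, the helper `months_before` is hypothetical, and how DataVault itself calculates the notification date is not described here.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`.

    If the target month is shorter, the day is clamped to its last day.
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


review_date = date(2031, 8, 31)  # made-up review date
print(months_before(review_date, 6))  # 2031-02-28
```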

The new maximum deposit size of 10 TB is equivalent to over five million images of around 2 MB each – that’s one selfie for every person in Scotland. Image: A selfie on the cliffs at Bell Hill, St Abbs
cc-by-sa/2.0 – © Walter Baxter – geograph.org.uk/p/5967905
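The “over five million” figure in the caption can be checked with a little arithmetic. The sketch below assumes binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes); with decimal units the answer is exactly five million.

```python
def images_per_deposit(limit_tb, image_mb, binary=True):
    """Number of whole images of `image_mb` MB that fit in `limit_tb` TB."""
    base = 1024 if binary else 1000
    tb = base**4  # bytes per terabyte
    mb = base**2  # bytes per megabyte
    return (limit_tb * tb) // (image_mb * mb)


print(f"{images_per_deposit(10, 2):,}")                # 5,242,880
print(f"{images_per_deposit(10, 2, binary=False):,}")  # 5,000,000
```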

Pauline Ward
Research Data Support Assistant
Library & University Collections

Research Data Workshops: DataVault Summary

Having soft-launched the DataVault facility in early 2019, the Research Data Support team, with the support of the project board, held five workshops in different colleges and locations to find out what the user community thought about it. This post summarises what we learned from participants, who were made up roughly equally of researchers (mainly staff) and support professionals (mainly computing officers based in the Schools and Colleges).

Each workshop began with presentations and a demonstration by Research Data Service staff, explaining the rationale of the DataVault, what it should and should not be used for, how it works, how the University will handle long-term management of data assets deposited in the DataVault, and practicalities such as how to recover costs through grant proposals or get assistance to deposit.

After a networking lunch we held discussion groups, covering topics such as prioritisation of features and functionality, roles such as the university as data asset owner, and the nature of the costs (price).

The team was relieved to learn that the majority (albeit from a somewhat self-selecting sample) agreed that the service fulfilled a real need; some data does need to be kept securely for a named period to comply with research funders’ rules, and participants welcomed a centralised platform to do this. The levels of usability and functionality we have managed to reach so far were met with somewhat less approval: clearly the development team has more work to do, and we are glad to have won further funding from the Digital Research Services programme in 2019-2020 in order to do it.

Attitudes toward university ownership of data assets were also a mixed bag; some were sceptical and wondered if researchers would participate in such a scheme, but others found it a realistic option for dealing with staff turnover and the inevitability of data outlasting data owners. Attitudes toward cost were largely accepting (the DataVault provides a cheaper alternative than our baseline DataStore disk storage), but concerns about the safekeeping of legacy and unfunded research data were raised at each workshop.

A sample of points raised follows:

  • Utility? “Everyone I know has everything on OneDrive.”
  • Regarding prioritisation of features – security first; file integrity first; depositing data from sources other than DataStore; facilitating larger deposit sizes; ease of use.
  • Quickness of deposit and retrieval? A quick deposit was deemed more important than a quick retrieval.
  • University as data asset owner?
    • Under GDPR the data are already university assets (because the Uni is the data controller).
    • People who manage the data should be close to the research; IT people can manage users but shouldn’t be making decisions about data. Danger that because it’s related to IT it gets dumped on IT officers. The formal review process helps to ensure decisions will be made properly. Include flexibility into the review hierarchy to allow for variation in school infrastructure.
    • When I heard that I was – not shocked – but concerned. If I move to another university how do I get access? This might be a problem. Researchers might prefer to retain three copies themselves.
  • Is the cost recovery mechanism valid?
    • Vault costs are legitimate costs.
    • Ideally should come from grant overheads, until then need to charge.
    • Possible to charge for small/medium/large project at start rather than per TB?
  • Is the 100 GB threshold sufficient for unfunded research? How else could unfunded or legacy data be covered (who pays)?
    • An alumni ‘sponsor a dataset’ scheme?
    • There will be people with a ‘whole bunch of data somewhere’ that would be more appropriately stored in DataVault.

The team is grateful to all of the workshop participants for their time and thoughts; the report will be considered further by the project board and the Research Data Service Steering Group members. The full set of workshop notes is colour-coded to show comments from different venues and is available to read on the RDM wiki, for anyone with a University log-in (EASE).


Robin Rice
Data Librarian and Head, Research Data Support
Library & University Collections