Announcing our new “Quick Guides” series

Earlier this week we bid farewell to our intern for the past four weeks, Dr Tamar Israeli from the Western Galilee College Library. Tamar spent her time with us carrying out a small-scale study on the collaborative tools that are available to researchers, which ones they use in their work, and what support they feel they need from the University. One of Tamar’s interviewees expressed a view that “[the University’s tools and services] all start with ‘Data-something’, and I need to close my eyes and think which is for what,” a remark which resonated with my own experience upon first starting this job.

When I joined the University’s Library and University Collections as Research Data Support Manager in Summer 2018, I was initially baffled by the seemingly vast range of different data storage and sharing options available to our researchers. By that point I had already worked at Edinburgh for more than a decade, and in my previous role I had little need or obligation to use institutionally-supported services. Consequently, since I rarely if ever dealt with personal or sensitive information, I tended to rely on freely-available commercial solutions: Dropbox, Google Docs, Evernote – that sort of thing. Finding myself now in a position where I and my colleagues were required to advise researchers on the most appropriate systems for safely storing and sharing their (often sensitive) research data, I set about producing a rough aide memoire for myself, briefly detailing the various options available and highlighting the key differences between them. The goal was to provide a quick means of identifying – or ruling out – particular systems for a given purpose. Researchers might ask questions like: is this system intended for live or archived data? Does it support collaboration (increasingly expected within an ever more interconnected and international research ecosystem)? Is it suitable for storing sensitive data in a way that assures research participants or commercial partners that prying eyes won’t be able to access their personal information without authorisation? (A word to the wise: cloud-based services like Dropbox may not be!)
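For readers who think in code, the "identify or rule out" idea can be sketched as a simple filter over a table of options. This is only an illustrative sketch: the three entries below are hypothetical placeholders, not the contents of the actual Quick Guide, and the attribute names (`live_data`, `collaboration`, `sensitive_ok`) are assumptions chosen to mirror the three questions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    live_data: bool      # intended for live (active) data rather than archived data
    collaboration: bool  # supports working with collaborators
    sensitive_ok: bool   # judged suitable for sensitive data


# Placeholder rows only: hypothetical, NOT the real Quick Guide table.
OPTIONS = [
    StorageOption("Option A (hypothetical)", live_data=True, collaboration=True, sensitive_ok=False),
    StorageOption("Option B (hypothetical)", live_data=True, collaboration=False, sensitive_ok=True),
    StorageOption("Option C (hypothetical)", live_data=False, collaboration=False, sensitive_ok=True),
]


def shortlist(options, **required):
    """Return the names of options whose attributes match every required answer."""
    return [
        option.name
        for option in options
        if all(getattr(option, key) == value for key, value in required.items())
    ]


# Which options are meant for live data AND suitable for sensitive data?
print(shortlist(OPTIONS, live_data=True, sensitive_ok=True))
```

Each keyword argument rules options out, so adding a further question narrows the shortlist in the same way that reading across another column of the table would.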



Upon showing early versions to colleagues, I was pleasantly surprised that they often expressed an interest in getting a copy of the document, and thought that it might have a wider potential audience within the University. In the months since then, this document has gone through several iterations, and I’m grateful to colleagues with specific expertise in the systems that we in the Research Data Service don’t directly support (such as the Wiki and the Microsoft Office suite of applications) for helping me understand some of the finer details. The intention is for this to be a living document, and if there are any inaccuracies in this (or indeed subsequent) versions, or wording that could be made clearer, just let us know and we’ll update it. It’s probably not perfect (yet!), but my hope is that it will provide enough information for researchers, and those who support them, to narrow down potential options and explore these in greater depth than the single-page table format allows.

With Tamar’s internship finishing up this week, it feels like a timely moment to release the first of our series of “Quick Guides” into the world. Others will follow shortly, on topics including Research Data Protection, FAIR Data and Open Research, and we will create a dedicated Guidance page on the Research Data Service website to provide a more permanent home for these and other useful documents. We will continue to listen to our researchers’ needs and strive to keep our provision aligned with them, so that we are always lowering the barriers to uptake and serving our primary purpose: to enable Edinburgh’s research community to do the best possible job, to the highest possible standards, with the least amount of hassle.

And if there are other Guides that you think might be useful, let us know!

Martin Donnelly
Research Data Support Manager
Library and University Collections

University of Edinburgh Data Safe Haven: a new facility for sensitive data

Information Services has implemented a remote-access “Safe Haven” environment to protect data confidentiality, satisfy concerns about data loss and reassure Data Controllers about the University’s secure management and processing of their data in compliance with Data Protection Legislation.

The Data Safe Haven (DSH) provides a secure storage space and a secure analytic environment that is appropriate for research projects working with different kinds of sensitive data. It has its own firewall and is isolated from the University network. It is located in a secure facility with controlled access. All traffic between the DSH and the user’s computer is encrypted and no internet access is available. Access to the DSH is restricted to authorised users via an assigned ‘Yubikey’ and the secure VMware Horizon Client, and is only available from managed desktops that are whitelisted for access to the DSH.

A range of analytic and supporting applications (e.g., SPSS, STATA, SAS, MATLAB, and R) is available. These are delivered dynamically and assigned to the project. The applications available to users will depend on the arrangement made with the DSH technical team prior to project registration and on the licensing arrangements with the software provider.

The DSH initial security review (penetration test) was carried out by a CREST accredited organisation in August 2018. The DSH exhibited an overall good security stance and demonstrated resilience against the various types of tests performed by the consultants. This was the initial review that formed part of our ongoing drive towards ISO 27001 certification. We expect to complete this phase of the project and obtain the certificate by November 2019.

We successfully closed the pilot phase of the DSH, with five projects, in October 2018, and soft-launched the service at our “Dealing with Data” conference in November 2018. At present, the DSH Technical team is migrating Centre for Clinical Brain Sciences – National CJD Research and Surveillance project data from the walled garden into the DSH.

The DSH operates on a cost recovery basis and this cost should be included in grant applications. We welcome enquiries from researchers as early as possible in their project planning. Costing is based on bespoke project requirements (see DSH Overview for users at https://www.ed.ac.uk/is/data-safe-haven).

The DSH Operations team also provides:

  • advice and input for funding and permissions applications;
  • guidance on meeting Approved Researcher requirements;
  • advice about meeting data sharing requirements and archiving of data.

We can set up a demo environment for researchers on request to explore the use of the DSH for their projects. If you need further information, please contact the RDS Team via data-support@ed.ac.uk.

Cuna Ekmekcioglu, Data Safe Haven Manager, Research Data Service

Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is that of reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions which can be relied upon to actually help humans. Other challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward/recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different to the status quo. Indeed Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not just lie in the hands of the funders, governments, and at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of positive or negative results, and we all have a part to play in seeking to counter this bias.


A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring/processing inconsistent file formats between health boards, causing me to wonder if it was a question of open vs closed formats, and how such a situation might have been averted, e.g. via planning, training (and awareness raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else? Emily earlier noted a confusion about which tools are useful – and this is a role for those of us who provide tools, and for people like myself and my colleague Digital Research Services Lead Facilitator Lisa Otty who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance are iterative processes: we should always be fine-tuning these, and responding to new and changing needs.

Another theme of the first morning session was the question of achieving balances and trade-offs in protecting data and keeping it useful. A question from the floor noted the importance of recording and justifying how these balancing decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example, between usability/convenience and data security. Chris spoke about dual testing of data: is it anonymous? / is it useful? In many cases, ideally it will be both, but being both may not always be possible.

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction from the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings, which compared with transcription and phonetic renderings are much closer to the data source (what Kant would call the thing-in-itself (das Ding an sich) as opposed to the phenomenon, the thing as it appears to an observer) is hugely beneficial to linguistic scholarship. Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Don’t tell, show.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility, and scepticism. By providing access to our workings and the evidence behind publications, and by joining these things up – as Ewan McAndrew described, linked data is key (this is the fifth star in the aforementioned 5-star Open Data scheme) – Open Science, and all its various constituent parts, support this goal, which is after all one of the goals of research and of scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – Open Data and Open Science can be transformative. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day are published on MediaHopper.


Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh

New research data storage

The latest BITS magazine for University of Edinburgh staff (Issue 8, Autumn/ Winter 2013) contains a lead article on new data storage facilities that Information Services have recently procured and will be making available to researchers for their research data management.

“The arrival of the RDM storage and its imminent roll out is an exciting step in the development of our new set of services under the Research Data Management banner. Ensuring that the service we deploy is fair, useful and transparent are key principles for the IS team.” John Scally

 

Information Services is very pleased to announce that our new Research Data Storage hardware has been safely delivered.

Following a competitive procurement process, a range of suppliers were selected to provide the various parts of the infrastructure, incl. Dell, NetApp, Brocade and Cisco. The bulk of the order was assembled over the summer in China and shipped to the King’s Buildings campus at the end of August. Since then IT Infrastructure staff have been installing, testing and preparing the storage for roll-out.

How good is the storage?
Information Services recognises the importance of the University’s research data and has procured enterprise-class storage infrastructure to underpin the programme of Research Data services. The infrastructure ranges from the highest class of flash-storage (delivering 375,000 IO operations per second) to 1.6PB (1 Petabyte = 1,024 Terabytes) of bulk storage arrays. The data in the Research Data Management (RDM) file-store is automatically replicated to an off-site disaster facility and also backed up with a 60-day retention period, with 10 days of file history visible online.

Who qualifies for an allocation?
Every active researcher in the University! This is an agreement between the University and the researcher to provide quality active data storage, service support and long term curation for researchers. This is for all researchers, not just Principal Investigators or those in receipt of external grants to fund research.

When do I get my allocation?
We are planning to roll out to early adopter Schools and institutes in late November this year. This is dependent on all of the quality checks and performance testing on the system being completed successfully; however, confidence is high that the deadline will be met.
The early adopters for the initial service roll-out are: School of GeoSciences, School of Philosophy, Psychology and Language Sciences, and the Centre for Population Health Sciences. Phased roll-out to all areas of the University will follow.

How much free allocation will I receive?
The University has committed 0.5TB (500GB) of high quality storage with guaranteed backup and resilience to every active researcher. The important principle at work is that the 0.5TB is for the individual researcher to use primarily to store their active research data. This ensures that they can work in a high quality and resilient environment and, hopefully, move valuable data from potentially unstable local drives. Research groups and Schools will be encouraged to pool their allocations in order to facilitate shared data management and collaboration.

This formula was developed in close consultation with College and School representatives; however, there will be discipline differences in how much storage is required and individual need will not be uniform. A degree of flexibility will be built into the allocation model and roll-out, though if researchers go over their 0.5TB free allocation they will have to pay.
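To give a rough sense of scale, the figures quoted in this article can be combined in a short back-of-the-envelope calculation. This is only a sketch: it assumes, purely for illustration, that the whole 1.6PB of bulk storage were divided into 0.5TB free allocations, which the article does not claim, and it ignores replication, backup and any other overheads. The variable names are our own.

```python
# Figures taken from the article.
TB_PER_PB = 1024          # "1 Petabyte = 1,024 Terabytes"
bulk_storage_pb = 1.6     # bulk storage arrays
free_allocation_tb = 0.5  # free allocation per active researcher

# Convert the bulk storage figure into terabytes.
bulk_storage_tb = bulk_storage_pb * TB_PER_PB

# Illustrative upper bound only: assumes ALL bulk storage is handed out as
# free allocations, with no space reserved for anything else.
max_allocations = int(bulk_storage_tb // free_allocation_tb)

print(f"Bulk storage: {bulk_storage_tb:.1f} TB")
print(f"Upper bound on 0.5TB allocations: {max_allocations}")
```

Pooling works the same way in reverse: a research group of ten researchers combining their allocations would, under the same assumptions, have 10 × 0.5TB = 5TB of shared space.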

Why is the University doing this?
The storage roll-out is one component of a suite of existing and planned services known as our Research Data Management Initiative. An awareness raising campaign accompanies the storage allocation to Schools, units and individuals to encourage best practice in research data management planning and sharing.

Research Data Management support services:
www.ed.ac.uk/is/data-management

University’s Research Data Management Policy:
www.ed.ac.uk/is/research-data-policy

BITS magazine (Issue 8, Autumn/ Winter 2013)
http://www.ed.ac.uk/schools-departments/information-services/about/news/edinburgh-bits