Collaborating on data in a modern way

Between mid-September and mid-October, the Research Data Support team hosted an international visitor. Dr Tamar Israeli, a librarian from Western Galilee College in Israel, spent four weeks in Edinburgh to deepen her experience and understanding of research data management. As part of this visit, Tamar conducted a study into our researchers’ collaborative requirements, and how well our existing tools and services meet their needs. Tamar’s PhD thesis was on the topic of file sharing, and she has recently published another study on information loss in Behaviour & Information Technology: “Losing information is like losing an arm: employee reactions to data loss” (2019). Tamar is also a representative of the Israeli colleges on the University Libraries’ Research Support Committee.

Tamar carried out a small-scale study in order to gain a better understanding of the tools that researchers use to collaborate around data, and to explore the barriers and difficulties that prevent researchers from using institutional tools and services. Six semi-structured interviews were conducted with researchers from the University of Edinburgh, representing different schools, all of whom regularly collaborate with other researchers on either small- or large-scale projects. She found that participants use many different tools, both institutional and commercial, to collaborate, share, analyse and transfer documents and data files. Decisions about which tools to use are based on data types, data size, usability, network effects, and whether their collaborators are in the same institution and country. Researchers tend to use institutional tools only if they are very simple and user-friendly, if there is a specific requirement from funders or principal investigators (PIs), or if doing so directly benefits their data analysis; sharing beyond the immediate collaboration is only a secondary concern. Researchers are generally well aware of the need to keep their data where it will be safe and backed up, and are not concerned about the risk of data loss. A major issue was the need for tools that answer a project’s particular needs; customisability and scope for interlinking with other systems are therefore very important.

We’d like to thank Tamar for the great work she did, and for the beautiful olive oil and pistachios that she brought with her! Tamar’s findings will feed into our ongoing plans for the next phase of the Research Data Service’s continual development, helping us assist researchers to share and work on their data collaboratively, within and beyond the University’s walls.

Martin Donnelly
Research Data Support Manager
Library and University Collections

Announcing our new “Quick Guides” series

Earlier this week we bid farewell to our intern for the past four weeks, Dr Tamar Israeli from the Western Galilee College Library. Tamar spent her time with us carrying out a small-scale study on the collaborative tools that are available to researchers, which ones they use in their work, and what support they feel they need from the University. One of Tamar’s interviewees expressed a view that “[the University’s tools and services] all start with ‘Data-something’, and I need to close my eyes and think which is for what,” a remark which resonated with my own experience upon first starting this job.

When I joined the University’s Library and University Collections as Research Data Support Manager in Summer 2018, I was initially baffled by the seemingly vast range of different data storage and sharing options available to our researchers. By that point I had already worked at Edinburgh for more than a decade, and in my previous role I had little need or obligation to use institutionally supported services. Consequently, since I rarely if ever dealt with personal or sensitive information, I tended to rely on freely available commercial solutions: Dropbox, Google Docs, Evernote – that sort of thing. Finding myself now in a position where I and my colleagues were required to advise researchers on the most appropriate systems for safely storing and sharing their (often sensitive) research data, I set about producing a rough aide memoire for myself, briefly detailing the various options available and highlighting the key differences between them. The goal was to provide a quick means of identifying – or ruling out – particular systems for a given purpose. Researchers might ask questions like: is this system intended for live or archived data? Does it support collaboration (increasingly expected within an ever more interconnected and international research ecosystem)? Is it suitable for storing sensitive data in a way that assures research participants or commercial partners that prying eyes won’t be able to access their personal information without authorisation? (A word to the wise: cloud-based services like Dropbox may not be!)

Upon showing early versions to colleagues, I was pleasantly surprised that they often expressed an interest in getting a copy of the document, and thought that it might have a wider potential audience within the University. In the months since then, this document has gone through several iterations, and I’m grateful to colleagues with specific expertise in the systems that we in the Research Data Service don’t directly support (such as the Wiki and the Microsoft Office suite of applications) for helping me understand some of the finer details. The intention is for this to be a living document, and if there are any inaccuracies in this (or indeed subsequent) versions, or wording that could be made clearer, just let us know and we’ll update it. It’s probably not perfect (yet!), but my hope is that it will provide enough information for researchers, and those who support them, to narrow down potential options and explore these in greater depth than the single-page table format allows.

With Tamar’s internship finishing up this week, it feels like a timely moment to release the first of our series of “Quick Guides” into the world. Others will follow shortly, on topics including Research Data Protection, FAIR Data and Open Research, and we will create a dedicated Guidance page on the Research Data Service website to provide a more permanent home for these and other useful documents. We will continue to listen to our researchers’ needs and strive to keep our provision aligned with them, so that we are always lowering the barriers to uptake and serving our primary purpose: to enable Edinburgh’s research community to do the best possible job, to the highest possible standards, with the least amount of hassle.

And if there are other Guides that you think might be useful, let us know!

Martin Donnelly
Research Data Support Manager
Library and University Collections

University of Edinburgh Data Safe Haven: a new facility for sensitive data

Information Services has implemented a remote-access “Safe Haven” environment to protect data confidentiality, address concerns about data loss, and reassure Data Controllers that the University manages and processes their data securely, in compliance with data protection legislation.

The Data Safe Haven (DSH) provides secure storage and a secure analytic environment appropriate for research projects working with all kinds of sensitive data. It has its own firewall and is isolated from the University network. It is located in a secure facility with controlled access. All traffic between the DSH and the user’s computer is encrypted, and no internet access is available. Access to the DSH is restricted to authorised users via an assigned YubiKey and the secure VMware Horizon Client, and is only available from managed desktops that have been whitelisted for access to the DSH.

A range of analytic and supporting applications (e.g. SPSS, Stata, SAS, MATLAB and R) is available. These are delivered dynamically and assigned to the project. The applications available to users will depend on the arrangement made with the DSH technical team prior to project registration, and on the licensing arrangements with the software providers.

The DSH’s initial security review (penetration test) was carried out by a CREST-accredited organisation in August 2018. The DSH exhibited a good overall security stance and demonstrated resilience against the various types of tests performed by the consultants. This initial review formed part of our ongoing drive towards ISO 27001 certification; we expect to complete this phase of the project and obtain the certificate by November 2019.

We successfully closed the pilot phase of the DSH, comprising five projects, in October 2018, and soft-launched the service at our “Dealing with Data” conference in November 2018. At present, the DSH technical team is migrating Centre for Clinical Brain Sciences – National CJD Research and Surveillance project data from the walled garden into the DSH.

The DSH operates on a cost-recovery basis, and these costs should be included in grant applications. We welcome enquiries from researchers as early as possible in their project planning. Costing is based on bespoke project requirements (see the DSH Overview for users at https://www.ed.ac.uk/is/data-safe-haven).

The DSH Operations team also provides:

  • advice and input for funding and permissions applications;
  • guidance on meeting Approved Researcher requirements;
  • advice about meeting data sharing requirements and archiving of data.

We can set up a demo environment for researchers on request to explore the use of the DSH for their projects. If you need further information, please contact the RDS Team via data-support@ed.ac.uk.

Cuna Ekmekcioglu, Data Safe Haven Manager, Research Data Service

Dealing With Data 2018: Summary reflections

The annual Dealing With Data conference has become a staple of the University’s data-interest calendar. In this post, Martin Donnelly of the Research Data Service gives his reflections on this year’s event, which was held in the Playfair Library last week.

One of the main goals of open data and Open Science is reproducibility, and our excellent keynote speaker, Dr Emily Sena, highlighted the problem of translating research findings into real-world clinical interventions that can be relied upon to actually help humans. Further challenges were echoed by other participants over the course of the day, including the relative scarcity of negative results being reported. This is an effect of policy, and of well-established and probably outdated reward and recognition structures. Emily also gave us a useful slide on obstacles, which I will certainly want to revisit: examples cited included a lack of rigour in grant awards, and a lack of incentives for doing anything different to the status quo. Indeed, Emily described some of what she called the “perverse incentives” associated with scholarship, such as publication, funding and promotion, which can draw researchers’ attention away from the quality of their work and its benefits to society.

However, Emily reminded us that the power to effect change does not lie solely in the hands of funders, governments and those at the highest levels. The journal of which she is Editor-in-Chief (BMJ Open Science) has a policy commitment to publish sound science regardless of whether the results are positive or negative, and we all have a part to play in seeking to counter this bias.

A collage of the event speakers, courtesy Robin Rice (CC-BY)

In terms of other challenges, Catriona Keerie talked about the problem of transferring and processing inconsistent file formats between health boards, which made me wonder whether it was a question of open vs closed formats, and how such a situation might have been averted – e.g. via planning, training (and awareness-raising, as Roxanne Guildford noted), adherence to the 5-star Open Data scheme (where the third star is awarded for using open formats), or something else. Emily had earlier noted confusion about which tools are useful – this is a role for those of us who provide tools, and for people like me and my colleague Lisa Otty, Digital Research Services Lead Facilitator, who seek to match researchers with the best tools for their needs. Catriona also reminded us that data workflow and governance are iterative processes: we should always be fine-tuning these, and responding to new and changing needs.
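
As a purely illustrative aside (not something presented at the event), moving from a proprietary spreadsheet format to an open, plain-text one can be a very small step. The sketch below uses Python and pandas, with hypothetical file names, in the spirit of that third “open formats” star:

    # Illustrative sketch only: convert a proprietary spreadsheet export to an
    # open CSV file. File names are hypothetical.
    import pandas as pd

    def export_to_open_format(xlsx_path, csv_path):
        """Read an Excel workbook and write it back out as UTF-8 CSV."""
        df = pd.read_excel(xlsx_path)  # reading .xlsx requires the openpyxl package
        df.to_csv(csv_path, index=False, encoding="utf-8")

    export_to_open_format("board_returns_2018.xlsx", "board_returns_2018.csv")

CSV is no panacea, but an openly documented, plain-text format is far easier to exchange between organisations than a proprietary one.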

Another theme of the first morning session was the question of achieving balances and trade-offs between protecting data and keeping it useful, and a question from the floor noted the importance of recording and justifying how these balancing decisions are made. David Perry and Chris Tuck both highlighted the need to strike a balance, for example, between usability/convenience and data security. Chris spoke about testing data on two fronts: is it anonymous, and is it useful? Ideally it will be both, but that may not always be possible.
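
As another illustrative aside, one crude way of probing the “is it anonymous?” question is a k-anonymity check: count how many records share each combination of quasi-identifiers. The sketch below uses Python and pandas with invented column names and toy data; it is not drawn from the projects discussed at the event:

    # Illustrative sketch only: a crude k-anonymity check. Columns and data are invented.
    import pandas as pd

    def min_group_size(df, quasi_identifiers):
        """Smallest number of records sharing any one combination of quasi-identifiers."""
        return int(df.groupby(quasi_identifiers).size().min())

    records = pd.DataFrame({
        "age_band":  ["30-39", "30-39", "40-49", "40-49", "40-49"],
        "postcode":  ["EH8",   "EH8",   "EH3",   "EH3",   "EH3"],
        "diagnosis": ["A",     "B",     "A",     "A",     "C"],
    })

    k = min_group_size(records, ["age_band", "postcode"])
    print(f"k = {k}: every age-band/postcode combination is shared by at least {k} records")
    # A small k suggests individuals may be re-identifiable; coarser groupings raise k
    # but reduce usefulness -- exactly the trade-off discussed above.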

This theme of data privacy balanced against openness was taken up in Simon Chapple’s presentation on the Internet of Things. I particularly liked the section on office temperature profiles, which was very relevant to those of us who spend a lot of time in Argyle House where – as in the Playfair Library – ambient conditions can leave something to be desired. I think Simon’s slides used the phrase “Unusual extremes of temperatures in micro-locations.” Many of us know from bitter experience what he meant!

There is of course a spectrum of openness, just as there are grades of abstraction between the thing we are observing or measuring and the data that represents it. Bert Remijsen’s demonstration showed that access to sound recordings – which, compared with transcriptions and phonetic renderings, are much closer to the data source (what Kant would call the thing-in-itself, das Ding an sich, as opposed to the phenomenon, the thing as it appears to an observer) – is hugely beneficial to linguistic scholarship. Reducing such layers of separation or removal is both a subsidiary benefit of, and a rationale for, openness.

What it boils down to is the old storytelling adage: “Don’t tell, show.” And as Ros Attenborough pointed out, openness in science isn’t new – it’s just a new term, and a formalisation of something intrinsic to Science: transparency, reproducibility and scepticism. We support these by providing access to our workings and the evidence behind publications, and by joining those things up – as Ewan McAndrew described, linked data is key here (the fifth star in the aforementioned 5-star Open Data scheme). Open Science, and all its various constituent parts, supports this goal, which is after all one of the goals of research and scholarship. The presentations showed that openness is good for Science; our shared challenge now is to make it good for scientists and other kinds of researchers. Because, as Peter Bankhead says, Open Source can be transformative – and so can Open Data and Open Science. I fear that we don’t emphasise these opportunities enough, and we should seek to provide compelling evidence for them via real-world examples. Opportunities like the annual Dealing With Data event make a very welcome contribution in this regard.
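
For readers less familiar with linked data, the sketch below shows, in Python with the rdflib library, what “joining things up” can look like at its simplest: an article record pointing at the dataset behind it. Every identifier and title here is invented for illustration:

    # Illustrative sketch only: a tiny linked-data graph with rdflib; all URIs and titles are made up.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/")

    g = Graph()
    article = URIRef("http://example.org/article/123")
    dataset = URIRef("http://example.org/dataset/456")

    g.add((article, RDF.type, EX.Article))
    g.add((article, DCTERMS.title, Literal("A hypothetical open-access article")))
    g.add((article, DCTERMS.references, dataset))  # the link that joins things up
    g.add((dataset, RDF.type, EX.Dataset))
    g.add((dataset, DCTERMS.title, Literal("The (equally hypothetical) underlying dataset")))

    print(g.serialize(format="turtle"))

Serialised like this, anyone – human or machine – who follows the article’s identifier can discover the dataset behind it, which is the point of that fifth star.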

PDFs of the presentations are now available in the Edinburgh Research Archive (ERA). Videos from the day are published on MediaHopper.

Other resources

Martin Donnelly
Research Data Support Manager
Library and University Collections
University of Edinburgh