The RECODE project is looking at open data policy for EU-funded research. I attended a workshop in Sheffield yesterday for a diverse stakeholder group of researchers, funders and data providers. Along with a nice lunch, they delivered their first draft report, in which they synthesised current literature on open research data and presented five case studies of research practice in different disciplines. The format was very interactive with several break-out groups and discussions.
The usual barriers to data sharing were trotted out in different forms. (Forgive my ho-hum tone if this is a newish topic for you – our DISC-UK DataShare project summarised these in its 2007 ‘State-of-the-Art-Review’ and the reasons haven’t really changed since.) The RECODE team ably boiled these down to technical, cultural and economic issues.
The morning’s activity included a small-group discussion about disciplinary differences in motivations for data sharing. One gadfly (not me) questioned the premise of the whole topic. While differences in practice around the treatment of data are undeniable, are the motivations for sharing or not sharing data really different amongst groups of researchers?
This seemed a fair point. Whatever the obstacle – be it commercial viability, fear of being scooped, errors being found or data being misinterpreted, desire to keep one’s ‘working capital’ for future publication, lack of time to properly prepare the data and documentation required for re-use coupled with lack of perceived academic rewards, lack of infrastructure, or disappearance of key personnel (including postgrads) – it is a disincentive for data sharing wherever it crops up.
On the flip-side, motivations to share – making data easily available to one’s colleagues and students, adding to the scholarly record, backing up one’s reported results, desire for others to add value to a treasured dataset, increasing one’s impact and potential citations, passing off the custodianship of a completed dataset to a trusted archive, or mere compliance with a funder’s or publisher’s policy – are reasons that transcend disciplinary boundaries.
“Reciprocal altruism” was a new one to me. I’m not sure I believe it exists. I’ve seen more than one study showing that researchers (also teachers, where open educational resources are concerned) crave open access to other people’s ‘stuff’ whether or not they feel obliged to share their own (and more don’t than do).
An afternoon discussion focused on how open data needed to be, to be considered open. This was an amusing diversion from the topic we were given by the organisers. The UK Data Archive funded by ESRC, while a bulwark in the patchy architecture of data preservation and dissemination, does not make any of its collections available without a registration procedure that not only asks you who you are, but what you intend to do with the data. If the data are non-sensitive in nature, how necessary is this? Does the fact that the data owner would like to collect this information warrant collecting it?
A recent consensus on a new jiscmail list, data-publication, was that this sort of ‘red tape’ routinely placed in the way of data access was an affront to academic freedom. Would you agree? Would your answer depend on whether you were the user or the owner?
Edinburgh DataShare has so far resisted the temptation to require user registration for any data deposited with us, because the service was established to be an open data repository for the use of University depositors and for re-use by other researchers as well as the public (which, in most cases, paid for the research). We offer our depositors normal website download statistics, and provide a suggested citation to each dataset to encourage proper attribution. We encourage use of an open data licence which requires attribution of the data creator. Depositors who do not wish to use an open licence are free to provide their own rights statement.
The ODC-attribution licence that we offer by default is compatible with the Budapest Open Access Initiative (BOAI), but is one step less open than “CC0” (pronounced CC-zero) where rights to the data are waived in the interest of complete freedom for data re-users. Some argue that data – as opposed to publications – should be made completely open in this way to allow pooling of numerous datasets for analysis and machine-processing.
For example, Professor Carol Goble has just written in her blog that “BioMed Central’s adoption of the Creative Commons CC0 waiver opens up the way that data published in their journals can be used, so that it can be freely mined, analysed, and reused.”
While I agree BioMed Central’s decision is good news and that CC0 licences may be the state of the art for open data, as a repository manager I have yet to meet an academic who does not wish to be attributed for data collected by the ‘sweat of the brow’, to use a phrase from copyright case law. It is slightly easier for me to persuade researchers to share their data openly with the reassurance that an open-attribution licence brings than to persuade them to waive their rights to be attributed.
The University Research Data Management Policy asserts, “Research data of future historical interest, and all research data that represent records of the University, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a University repository.”
In practice, it has been acknowledged that this would be difficult to enforce for ‘legacy’ research data, but from now on researchers embarking on a new research project are expected to create a data management plan in which the short and long term management of the data are considered before they are collected: “All new research proposals… must include research data management plans or protocols that explicitly address data capture, management, integrity, confidentiality, retention, sharing and publication.”
How open will you make your next dataset?