Highlights from the RDM Programme Progress Report: Nov – Dec 2014

These are key results from the regular RDM Programme Progress Reports presented to both the RDM Steering Committee and the Action Group. Full RDM Programme Progress Reports can be viewed on the RDM Programme wiki (University of Edinburgh only).

  • Equality Impact Assessment (EqIA) for the RDM programme was completed and published on the Estates and Buildings website (see: http://www.docs.csg.ed.ac.uk/EqualityDiversity/EIA/Research_Data_Management_Programme_%28RDM%29_%28IS%29.pdf)
  • Stuart Macdonald and co-author Rory Macneil (RSpace) had a paper (Service Integration to Enhance RDM: Electronic Laboratory Notebook Case Study) accepted for presentation at the International Conference on Digital Curation (London, Feb. 2015).
  • Work has commenced to update deadlines and deliverables in the RDM Project Plan.
  • Successful meeting held with the Software Sustainability Institute to discuss the preservation of software used/generated in the research process, resulting in a number of areas of investigation (see: https://libraryblogs.is.ed.ac.uk/2014/12/
  • FOSTER EU proposal funded for a training event based on MANTRA to be given to the Scottish Social Science Graduate Summer School programme.
  • All items in DataShare now have DataCite DOIs.
  • Data Library have established a new online statistical analysis and visualisation service (SDA) which can provide an add-on service for DataShare.
  • 39 RDM training courses scheduled for Jan-June 2015. Training materials for two new courses are also being prepared.
  • Migration of data from College file servers is nearing completion, with all data expected to have been migrated to DataStore by the end of January 2015.
  • Positive review and report received on the DataStore infrastructure from an external consultant.
  • Edinburgh DMP template revised based on Action Group feedback.
  • Visitors from Germany (Goettingen, November), Switzerland (École Polytechnique Fédérale de Lausanne, December), and France (Sciences Po, December) met with IS colleagues to learn more about the RDM programme.
  • “Research Data MANTRA: A Labour of Love” (by Robin Rice) was published in the Journal of eScience Librarianship following invitation and a peer review process: http://escholarship.umassmed.edu/jeslib/vol3/iss1/4
  • Two submissions to the Jisc Digital Festival event in March (RDM Training with MANTRA & RDM Programme @ Univ. of Edinburgh) have been accepted.

Stuart Macdonald
RDM Service Coordinator

Reflections on IDCC 2015: How the 80/20 rule applies to Research Data Management tools

RDM Tools need to be usable by a defined and significant set of researchers

A key question for universities considering which RDM tools to adopt is how broadly useful each tool is and how widely it will be taken up by researchers across the institution. I was reminded of this when Robin Rice posed the question to Ian MacArdle and Torsten Reimer of Imperial College after listening to their presentation, ‘Green Shoots:  RDM Pilot at Imperial College London’.

This is equally an issue for builders of tools.  Some tools are designed to be used primarily by the people who designed them, to add value to their own research.  Most of the six projects described in the Green Shoots presentation fall into that category.  Other tools are designed to be used by a broader but still limited community, e.g. tools for researchers in a particular discipline.  RDM tools, in contrast, need to be applicable to an even broader range of researchers, certainly across more than one discipline.  I think the 80/20 rule is a good benchmark.  For an RDM tool to be of interest at the institutional level it needs to be relevant to the needs of at least 80% of a defined and significant set of researchers at the institution.

Electronic lab notebook example

To illustrate this point let’s take the example of electronic lab notebooks (ELNs).  The target user group here includes biologists, chemists, biomedical researchers, and researchers in associated disciplines such as veterinary medicine, plant science, and food science.  In short, these are disciplines where the paper lab notebook is widely used.  This is clearly a defined and significant set of researchers at major research institutions, so the target user group meets the first part of the benchmark.

What does an electronic lab notebook need to have and to do in order to meet the requirements of 80% of the users in this defined target group?  One way of approaching this issue is to look at the reactions of ‘outliers’, i.e. researchers whose focus is so specific that a ‘generic’ offering, even one tailored to this particular set of researchers, may not be attractive.  A large set of potential outliers is chemists.  If there is no support for chemistry in an ELN, chemists are likely to find it unsuitable for their needs.  So one challenge is to build in sufficient support for chemists without having the ELN become so chemistry-focussed that other segments of the target user group find it unworkable.  When this is done, in most cases the majority of chemists go from being potential outliers to proponents of adoption.

A second set of outliers are bioinformaticians, who overlap with another set of outliers, researchers who build their own software as part of their research.  Both of these groups tend to prefer ‘home-built’ solutions, and for this reason may not be inclined to adopt ELNs that in fact work well for ‘the 80%’.  One way of increasing the likelihood that some members of these groups will make use of the ELN and hence become proponents is to build a good API.  This enables these groups to integrate the ELN with the software and solutions they are developing, in which case they are more likely to (a) appreciate the ELN’s benefits, and (b) find that it adds value to their workflow.

A third set of outliers are those who are reluctant to adopt an ELN because for whatever reason they do not see it fitting into their workflow.  In many cases this is a temporary barrier which can be overcome as the doubters come to understand the benefits of using an ELN.  In other cases there is a specific issue relating to the use of a paper notebook.  For example, some researchers work in labs where chemical spillage is likely.  They are reluctant to bring their own tablet computer into the controlled area because they are afraid it will be damaged — they will only adopt the ELN, for use on a tablet, if the lab provides a dedicated tablet for them to use in the lab.

Applying the 80/20 rule

Sometimes the doubts raised by outliers like the ones identified above take on a higher profile than the views of the silent majority of researchers who may be willing or even keen to try out and eventually adopt an ELN, but lack effective or well-connected advocates.  In this case the 80/20 rule may come in handy for those involved in RDM – take some soundings from the ‘average’ labs which don’t fall into any of the three categories of outliers noted above.  If there is significant interest among this group, that could be an indication that a product trial is justified, and that, notwithstanding doubts expressed from some quarters, the 80/20 rule is in fact in operation.

Making the right RDM tools available is a joint responsibility

Reflecting on the above, it’s clear that neither developers of RDM tools nor those involved in RDM management can, working in isolation, provide the RDM tools that researchers need.  It’s up to tool providers to, first, design tools for the 80%, and then to be sensitive to those with specialist needs and attempt to build in capabilities that will make the tool useful to as many of them as possible, too.  And it’s up to those involved in RDM management to be attentive to the needs of the ‘silent majority’ of researchers and to work with them, and with interested tool providers, to offer researchers tools that meet their needs, even when those needs may not initially be systematically or even very coherently expressed.

 

Leading a Digital Curation ‘Lifestyle’: First day reflections on IDCC15

[First published on the DCC Blog, republished here with permission.]

Okay, that title is a joke, but an apt one for a brief reflection on this year’s International Digital Curation Conference, held in London this week, whose theme was looking ten years back, to when the UK Digital Curation Centre was founded, and ten years forward.

The joke references an alleged written or spoken mistake someone made in referring to the Digital Curation lifecycle model, gleefully repeated on the conference tweetstream (#idcc15). The model itself, as with all great reference works, both builds on prior work and was a product of its time – helping to add to the DCC’s authority within and beyond the UK where people were casting about for common language and understanding in this new terrain of digital preservation, data curation, and – a perplexing combination of terms which perhaps still hasn’t quite taken off, ‘digital curation’ (at least not to the same extent as ‘research data management’). I still have my mouse-mat of the model and live with regrets it was never made into a frisbee.

The Digital Curation Lifecycle Model, Sarah Higgins & DCC, 2008

They say about Woodstock that ‘if you remember it you weren’t really there’, so I don’t feel too bad that it took Tony Hey’s coherent opening plenary talk to remind me of where we started way back in 2004 when a small band under the directorship of Peter Burnhill (services) and Peter Buneman (research) set up the DCC with generous funding from Jisc and EPSRC. Former director Chris Rusbridge likes to talk about ‘standing on the shoulders of giants’ when describing long-term preservation, and Tony reminded us of the important, immediate predecessors of the UK e-Science Programme and the ground-breaking government investment in the Australian National Data Service (ANDS) that was already changing a lot of people’s lifestyles, behaviours and outlooks.

Traditionally the conference has a unique format that focuses on invited panels and talks on the first day, with peer-reviewed research and practice papers on the second, interspersed with demos and posters of cutting edge projects, followed by workshops in the same week. So whilst I always welcome the erudite words of the first day’s contributors, at times there can be a sense of, ‘Wait – haven’t things moved on from there already?’ So it was with the protracted focus on academic libraries and the rallying cries of the need for them to rise to the ‘new’ challenges during the first panel session chaired by Edinburgh’s Geoffrey Boulton, focused ostensibly on international comparisons. Librarians – making up only part of the diverse audience – were asking each other during the break and on Twitter: isn’t that exactly what they have been doing in recent years, since, for example, the NSF requirements in the States and the RCUK and especially EPSRC rules in the UK for data management planning and data sharing? Certainly the education and skills of data curators as taught in iSchools (formerly Library Schools) have been a mainstay of IDCC topics in recent years, this one being no exception.

But has anything really changed significantly, either in libraries or more importantly across academia, since digital curation entered the namespace a decade ago? This was the focus of a panel led by the proudly impatient Carly Strasser, who has no time for ‘slow’ culture change, and provocatively assumes ‘we’ must be doing something wrong. She may be right, but the panel was divided. Tim DiLauro observed that some disciplines are going fast and some are going slow – depending on whether technology is helping them get the business of research done. And even within disciplines there are vast differences – perhaps proving the adage that ‘the future is here, it’s just not distributed yet’.

Carly Strasser’s Panel Session, IDCC15

Geoffrey Bilder spoke of tipping points by looking at how recently DOIs (Digital Object Identifiers, used in journal publishing) meant nothing to researchers and how they have since caught on like wildfire. He also laid blame on the funding system, which focuses on short-term projects and forces researchers to disguise their research bids as infrastructure bids – something they rightly don’t care that much about in itself. My own view is that we’re lacking a killer app, probably because it’s not easy to make sustainable and robust digital curation activity affordable and time-rewarding, never mind profitable. (Tim almost said this with his comparison of smartphone adoption.) Only time will tell if one of the conference sponsors proves me wrong with its preservation product for institutions, Rosetta.

It took long-time friend of the DCC Clifford Lynch to remind us in the closing summary (day 1) of exactly where it was we wanted to get to, a world of useful, accessible and reproducible research that is efficiently solving humanity’s problems (not his words). Echoing Carly’s question, he admitted bafflement that big changes in scholarly communication always seem to be another five years away, deducing that perhaps the changes won’t be coming from the publishers after all. As ever, he shone a light on sticking points, such as the orthogonal push for human subject data protection, calling for ‘nuanced conversations at scale’ to resolve issues of data availability and access to such datasets.

Perhaps the UK and Scotland in particular are ahead in driving such conversations forward; researchers at the University of Edinburgh co-authored a report two years ago for the government on “Public Acceptability of Data Sharing Between the Public, Private and Third Sectors for Research Purposes,” as a precursor to innovations in providing researchers with secure access to individual National Health Service records linked to other forms of administrative data when informed consent is not possible to achieve.

Given the weight of this societal and moral barrier to data sharing, and the spread of topics over the last 10 years of conferences, I quite agree with Laurence Horton, one of the panelists, who said that the DCC should give a particular focus to the Social Sciences at next year’s conference.

Robin Rice
Data Librarian (and former Project Coordinator, DCC)
University of Edinburgh

Managing data: photographs in research

In collaboration with Scholarly Communications, the Data Library participated in the workshop “Data: photographs in research” as part of a series of workshops organised by Dr Tom Allbeson and Dr Ella Chmielewska for the pilot project “Fostering Photographic Research at CHSS” supported by the College of Humanities and Social Science (CHSS) Challenge Investment Fund.

In our research support roles, Theo Andrew and I addressed issues associated with finding and using photographs from repositories, archives and collections, and the challenges of re-using photographs in research publications. Workshop attendees came from a wide range of disciplines, and were at different stages in their research careers.

First, I gave a brief intro on terminology and research data basics, and navigated through media platforms and digital repositories like Jisc Media Hub, VADS, Wellcome Trust, Europeana, Live Art Archive, Flickr Commons, Library of Congress Prints & Photographs Online Catalog (Muybridge http://hdl.loc.gov/loc.pnp/cph.3a45870) – links below.

Eadweard Muybridge. 1878. The Horse in motion. Photograph.

From the Library of Congress Prints and Photographs Online Catalog

Then, Theo presented key concepts of copyright and licensing, which opened up an extensive discussion on what things researchers have to consider when re-using photographs and what institutional support researchers expect to have. Some workshop attendees shared their experience of reusing photographs from collections and archives, and discussed the challenges they face with online publications.

The last presentation, tackling the basics of managing photographic research data, was not delivered due to time constraints. The presentation was for researchers who produce photographic materials; however, advice on best RDM practice is relevant to any researcher, regardless of whether they are producing primary data or reusing secondary data. There may be another opportunity to present the remaining slides to CHSS researchers at a future workshop.

ONLINE RESOURCES

LICENSING