Electronic Laboratory Notebooks – help or hindrance to academic research?

On 30 October 2013 the University of Edinburgh (UoE) organised what I believe to be the first University-wide meeting on Electronic Lab Notebooks (ELNs), and gave a number of Principal Investigators (PIs) and others the opportunity to provide feedback on their user experiences. This provided an excellent opportunity to discuss and inform what the UoE can do to help its researchers, and whether there is likely to be one ‘solution’ which could be implemented across the UoE or whether a more bespoke, discipline-specific approach would be required.

Lab Notes by S.S.K. – Flickr

Good research and good research data management (RDM) stem from the ability of researchers to accurately record, find, retrieve and store the information from their research endeavours. For many, but by no means all, this will initially be done by recording their outputs on the humble piece of paper, albeit one contained within a hardbound notebook (to ensure an accurate chronological record of the work) and supplemented liberally with printouts, photographs, x-rays, etc., and (ideally at least) with reminders of where to look for the electronic data relevant to the day’s work.

Presentations from University researchers

Slides from these presentations are available to UoE members via the wiki.

The event kicked off with a live demonstration from a member of the School of Physics & Astronomy, who described his positive experiences with the Livescribe system. This demonstration impressively articulated the functions of the electronic pen, which allows its user to record their writing stroke by stroke, pass this information on to others as either a movie or a document, and store the output electronically. Some disadvantages were noted, such as the physical size of the pen, the reliance on WiFi for certain features, and the fact that, to date, only certain iOS 7 devices are supported (although this list will grow in 2014). Clearly, this device has had a positive effect on both the presenter’s research and teaching duties. However, the Livescribe pen does not in itself address how to store these digital files.

The remainder of the presentations from the academic researchers were from the fields of life science, although their experiences were quite diverse.  This helpfully provided a good set-up for a healthy discussion, on both ELNs and indeed the wider aspects of RDM at the UoE.

Of the active researchers who presented, two were PIs from the School of Molecular, Genetic and Population Health Sciences and one was a postdoctoral researcher from the School of Biological Sciences.  All three had prior experience in using previous versions of ELNs, and had sought an ELN to address a range of similar issues with paper laboratory notebooks.

Merits and pitfalls of electronic notebooks

I have chosen not to provide feedback here on the specific ELNs trialled, but the software discussed was Evernote, eCAT and Accelrys. As the UoE does not, to date, recommend or discourage the use of any particular ELN, I won’t either.

In all cases these electronic systems were purchased for help with key areas:

Motivation/Benefits

  • Searchable data resource
  • Safe archive
  • Sharing data
  • Copy and paste functions
  • Functionality for reviewing lab members’ progress
  • Ability to organise by experiments (not just chronologically)
  • One system to store reagents/freezer contents with experimental data

And in general, key problematic issues raised with these systems were:

Barriers/Problems

  • Need for reliable internet access
  • Hardware integration into lab environment
  • Required more time to document and import data
  • Poor user interface/experience
  • Copy and paste functions (although time-saving, may increase errors as data are not reviewed)
  • Administration time by PI is required
  • PhD students and postdocs (when given the choice*) preferred to use paper notebooks

*It was mooted that no choice should be given.

Infrastructure

A common theme with the use of ELNs was that of the hardware, and the reliance on WiFi. Clearly, when working at the bench with reagents that are potentially hazardous (chemicals, radiation, etc.) or with biologicals that you don’t wish to contaminate (primary cell cultures, for instance), the hardware used is not supposed to be moved between such locations and ‘dry areas’ such as your office. A number of groups have attempted to solve this problem by using tablets that sync to both the “cloud” and their office computers, and this is of course dependent on WiFi. Without WiFi, you might unexpectedly find yourself with no access to any of your data or protocols, which leads to real problems if you are in the middle of an experiment. Additionally, this requires the outlay of monies for the purchase of the tablets, and provides a tempting means of distraction to group members (both of which may be frowned upon by many PIs). This monetary concern was identified as a potential problem for the larger groups, where multiple tablets would be required.

Research Data Management & Electronic Laboratory Notebooks

From an RDM perspective the subsequent discussions raised a number of interesting issues.  Firstly, as a number of these ELN services utilise the “cloud” for storage, it was clear that many researchers, PIs included, were unaware of what was expected from them by both their funding councils and the UoE.

Secure Cloud Computing by FutUndBeidl – Flickr

The Data Protection Act 1998 sets out how organisations may use personal data, and the Records Management Section’s guidance on ‘Taking sensitive information and personal data outside the University’s secure computing environment’ details the UoE position on this matter. Essentially, all sensitive or personal information leaving the UoE should be encrypted. This guidance would seem not to have reached a significant proportion of the researchers yet.

ELN? – not for academic research!

Whilst the first two presentations were broadly supportive of ELNs, the third researcher’s presentation was distinctly negative, and he provided his interpretation of the use of an ELN in an academic setting. Although broadly speaking this presentation was on one product, it was made clear that his opinions were not based on one ‘software product’ alone. In this case the PI has since abandoned the ELN (after four years of use and requiring his lab members to use it), citing reasons of practicality: it took too long to document the results (paper is always quicker), there is no standard for writing up documentation online**, and the data have effectively been stored twice.

He was also of the strong opinion that the use of ELNs:

“were not going to improve your research quality – it’s for those who want to spend time making their data look pretty.”

And –

“it is not for academic research, but more suited for service labs and industry.”

These would seem to be viewpoints that cannot easily be addressed.

The role of the PI

**Of course this is also true for paper versions, with the National Postdoctoral Association (USA) noting in their toolkit section on ‘Data Acquisition, Management, Sharing and Ownership’ that, with the multinational approach to research:

“many [postdocs] may prefer to keep their notes in their native language instead of English. Postdoc supervisors need to take this into consideration and establish guidelines for the extent to which record keeping must be generally accessible.”

The role of the PI cannot be overlooked in this process and, to date, even if a paper notebook is utilised, there is often no standard to observe.

The next generation of ELNs

Despite these concerns, ResearchSpace Ltd are poised to release the next generation of ELN, an enterprise release of their popular eCAT ELN, to be called RSpace. The RSpace team seem confident that they are both aware of and capable of addressing these various user requirements, and it will be interesting to see how they get on. Certainly they provided clear evidence of improved user interfaces, enhanced tools and knowledge of University policy, with the prospect of integration into the existing UoE digital infrastructure, such as the data repository, Edinburgh DataShare.

Researcher engagement

Importantly whilst this programme identified concerns and benefits with the various software systems available, it also highlighted issues with the UoE dissemination of RDM knowledge to the research community, and so perhaps fittingly the last word will be from the chair:

“The University has a lot of useful information on this area of data management; please look at the research support pages!”

So the fundamental question remains: what is the best way to engage researchers in RDM, and how can we best address this need at all levels?

David Girdwood
EDINA & Data Library

RDM & Cornell University

I’ve been fortunate to have been given the opportunity to take up a secondment at the Cornell Institute for Social and Economic Research (CISER) as Data Services Librarian, the primary tasks of which are to:

  • Modernise the CISER data archive and, if possible, begin the implementation. Tasks include: introducing persistent identifiers (DOIs) for all archival datasets (via EZID); investigating metadata mapping of archival datasets (DDI, DC, MARCXML); streamlining data catalogue functionality (by introducing result sorting, relevance searches and subject classification); and assisting in scoping a data repository solution for social science data assets generated by Cornell researchers.
  • Actively participate in the Research Data Management Services Group at Cornell, assisting researchers with their RDM plans and contributing to the advancement of the work of the group.
  • Actively consult with researchers about social science datasets and other data outreach activities.
  • Co-ordinate and collate assessment statements in order to gain the Data Seal of Approval for the CISER data archive.

Last Friday I gave my first presentation on the CISER data archive, along with other CISER colleagues (they talked about datasets used under restriction at the Cornell Restricted Access Data Centre, and the CISER Statistical Consultancy Service & ICPSR), at a Policy Analysis and Management (PAM) workshop for graduate students. This was held at the Survey Research Institute (https://www.sri.cornell.edu/sri/), where much discussion centred around survey non-response and mechanisms to counter this increasingly common phenomenon.

On Tuesday of this week I presented on the University of Edinburgh RDM Roadmap at the monthly meeting of the Research Data Management Service Group (RDMSG – http://data.research.cornell.edu). This was followed by two presentations yesterday: one at a Demography Pro-seminar (for graduate students) on campus, and later one at a Cornell University Library Data Discussion Group meeting in the Mann Library, set up to introduce the CISER Data Services Librarian to a range of subject librarians, principally in the social sciences. In each case the Edinburgh RDM Roadmap was received with great enthusiasm and engendered much discussion, in particular about the centralised and inclusive approach adopted by Edinburgh. Follow-up discussions and meetings are being planned, including on the potential use of MANTRA and the RDM Toolkit for Librarians as materials to raise the profile of RDM at Cornell.

As an aside, at a CISER team meeting the subject of password protection was raised (in some instances passwords to CISER resources are changed on a very regular basis for security purposes), along with issues surrounding inappropriate recording of passwords. A site licence for a password protection software package was seen as a possible solution to both user disgruntlement and possible security breaches. As a thought, this might be worth considering as part of the Active Data Infrastructure tool suite.

Stuart Macdonald
Associate Data Librarian, UoE / Visiting CISER Data Services Librarian

How open should your data be?

The RECODE project is looking at open data policy for EU-funded research. I attended a workshop in Sheffield yesterday for a diverse stakeholder group of researchers, funders and data providers. Along with a nice lunch, they delivered their first draft report, in which they synthesised current literature on open research data and presented five case studies of research practice in different disciplines. The format was very interactive with several break-out groups and discussions.

The usual barriers to data sharing were trotted out in different forms. (Forgive my ho-hum tone if this is a newish topic for you – our DISC-UK DataShare project summarised these in its 2007 ‘State-of-the-Art-Review’ and the reasons haven’t really changed since.) The RECODE team ably boiled these down to technical, cultural and economic issues.

The morning’s activity included a small-group discussion about disciplinary differences in motivations for data sharing. One gadfly (not me) questioned the premise of the whole topic. While differences in practice around the treatment of data are undeniable, are the motivations for sharing or not sharing data really different amongst groups of researchers?

This seemed a fair point. Any given obstacle – be it commercial viability, fear of being scooped, errors being found or data being misinterpreted, desire to keep one’s ‘working capital’ for future publication, lack of time to properly prepare the data and documentation required for re-use coupled with lack of perceived academic rewards, lack of infrastructure, or disappearance of key personnel (including postgrads) – is a disincentive for data sharing wherever it crops up.

On the flip-side, motivations to share – making data easily available to one’s colleagues and students, adding to the scholarly record, backing up one’s reported results, desire for others to add value to a treasured dataset, increasing one’s impact and potential citations, passing off the custodianship of a completed dataset to a trusted archive, or mere compliance with a funder’s or publisher’s policy – are reasons that transcend disciplinary boundaries.

“Reciprocal altruism” was a new one to me. I’m not sure I believe it exists. I’ve seen more than one study showing that researchers (also teachers, where open educational resources are concerned) crave open access to other people’s ‘stuff’ whether or not they feel obliged to share their own (and more don’t than do).

An afternoon discussion focused on how open data needed to be, to be considered open. This was an amusing diversion from the topic we were given by the organisers. The UK Data Archive funded by ESRC, while a bulwark in the patchy architecture of data preservation and dissemination, does not make any of its collections available without a registration procedure that not only asks you who you are, but what you intend to do with the data. If the data are non-sensitive in nature, how necessary is this? Does the fact that the data owner would like to collect this information warrant collecting it?

A recent consensus on a new jiscmail list, data-publication, was that this sort of ‘red tape’ routinely placed in the way of data access was an affront to academic freedom. Would you agree? Would your answer depend on whether you were the user or the owner?

Edinburgh DataShare has so far resisted the temptation to require user registration for any data deposited with us, because the service was established to be an open data repository for the use of University depositors and for re-use by other researchers as well as the public (which, in most cases, paid for the research). We offer our depositors normal website download statistics, and provide a suggested citation for each dataset to encourage proper attribution. We encourage use of an open data licence which requires attribution of the data creator. Depositors who do not wish to use an open licence are free to provide their own rights statement.

The ODC-attribution licence that we offer by default is compatible with the Budapest Open Access Initiative (BOAI), but is one step less open than “CC0” (pronounced CC-zero) where rights to the data are waived in the interest of complete freedom for data re-users. Some argue that data – as opposed to publications – should be made completely open in this way to allow pooling of numerous datasets for analysis and machine-processing.

For example, Professor Carol Goble has just written in her blog that “BioMed Central’s adoption of the Creative Commons CC0 waiver opens up the way that data published in their journals can be used, so that it can be freely mined, analysed, and reused.”

While I agree BioMed Central’s decision is good news and that CC0 licences may be the state of the art for open data, as a repository manager I have yet to meet an academic who does not wish to be attributed for data collected by the ‘sweat of the brow’, to use a phrase from copyright case law. It is slightly easier for me to persuade researchers to share their data openly with the reassurance that an open-attribution licence brings than to persuade them to waive their rights to be attributed.

The University Research Data Management Policy asserts, “Research data of future historical interest, and all research data that represent records of the University, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a University repository.”

In practice, it has been acknowledged that this would be difficult to enforce for ‘legacy’ research data, but from now on researchers embarking on a new research project are expected to create a data management plan in which the short- and long-term management of the data are considered before they are collected: “All new research proposals… must include research data management plans or protocols that explicitly address data capture, management, integrity, confidentiality, retention, sharing and publication.”

How open will you make your next dataset?

An insight into institutional RDM

I attended the Jisc Managing Research Data Programme Workshop in Birmingham on 25-26 March on behalf of the University of Edinburgh and gained a real insight into how other institutions are addressing Research Data Management (RDM) and how well our work has been received. It had participants from all areas of RDM, with good presentations sharing progress made on the subject at their institutions.

What clearly stood out were the compliments on the work we have done so far … this was mentioned numerous times over the weekend, where presenters commented on using our work (such as RDM policy and training) as a starting point for their projects. The ‘Business Cases’ session was particularly interesting, highlighting all the important non-technical issues (funding, stakeholders, politics, local culture, etc.) that need to be handled sensitively in planning and implementing RDM.

Sarah Jones presented our new DIY toolkit for librarians in the ‘RDM Training’ session. The toolkit is a self-directed training course, intended to be used by a group of librarians to build confidence in supporting researchers with RDM. MANTRA modules are used as pre-reading, and reflective questions and exercises are used to guide discussion at each face-to-face session. The training materials were well received and are already being reused by other Universities.

It was interesting to discover there was a lack of training for IT folk in RDM and a desire to have this addressed … I reported that we were in the process of producing this at our University. One institution sent all their RDM staff (IT, librarian, research services, etc.) to the workshop so ‘all’ get a real feel for what is required and appreciate best practices at other institutions. It was somewhat comforting, but not entirely surprising, to learn that other institutions have similar challenges to us with RDM.

While the sessions over the two days were informative, the opportunity to network with peers at other places and discuss issues/challenges at the round table sessions and evenings was invaluable and perhaps the biggest plus in attending the workshop. I enjoyed the experience and learnt a lot from it.

You can find out more about the event and access all the presentations and event reports on the event web page.

Abdul Majothi
Head of IS Consultancy for CHSS
User Services Division
Information Services