Dealing with Data 2015 – Call For Papers


Date:                     Monday 31 August 2015, 9:30 – 16:00 (lunch provided)

Location:             Informatics Forum, University of Edinburgh

Themes:

Data creation, including non-traditional data types
Data analysis
Data visualisation
Data security
Working with sensitive data
Archiving and sharing data, including preservation, re-use, and licensing
Infrastructure and tools, for example Electronic Lab Notebooks
Research software development and preservation
Linked open data for research, working with government data
Big Data and data mining
Meeting funder requirements for research data management

Format:           

Presentations will be 15 minutes long, with 5 minutes for questions. Depending on numbers, thematic parallel strands may be used. Presentations will be aimed at an academic audience, but one drawn from a wide range of disciplines. Opening and closing keynote presentations will be given.

Call for proposals:

Leading edge research relies upon data that are produced or collected during the research process, or upon existing data that are analysed and re-used to address new research questions. It is important to manage research data effectively throughout the lifecycle, from data management planning through to archiving and sharing. Requirements for managing data are increasingly being adopted by institutions and funders in order to foster good research data management practices.

Following on from the successful ‘Dealing with Data 2014’ half-day conference, Information Services are pleased to announce that we will be hosting a one-day conference covering a broad range of research matters from all disciplines on the subject of ‘Dealing with Data’.

The aim of the conference is for researchers of all levels at the University of Edinburgh to share good practice, emerging techniques and technologies, and practical examples in working with data across the research lifecycle.

We welcome proposals for presentations on any aspect of the challenges and advances in working with data, particularly research with novel methods of creating, using, storing, visualising or sharing data.  A list of themes is given above, although proposals that cover any aspect of working with research data will be considered.

Please send abstracts (maximum 500 words) to dealing-with-data-conference@mlist.is.ed.ac.uk before Monday 6th July 2015. Proposals will be reviewed and the programme compiled by Friday 31st July 2015.

Cuna Ekmekcioglu
Library and University Collections

open.ed report


Lorna M. Campbell, a Digital Education Manager with EDINA and the University of Edinburgh, writes about the ideas shared and discussed at the open.ed event this week.

 

Earlier this week I was invited by Ewan Klein and Melissa Highton to speak at Open.Ed, an event focused on Open Knowledge at the University of Edinburgh.  A storify of the event is available here: Open.Ed – Open Knowledge at the University of Edinburgh.

“Open Knowledge encompasses a range of concepts and activities, including open educational resources, open science, open access, open data, open design, open governance and open development.”

 – Ewan Klein

Ewan set the benchmark for the day by reminding us that open data is only open by virtue of having an open licence such as CC0, CC BY, CC SA. CC Non Commercial should not be regarded as an open licence as it restricts use.  Melissa expanded on this theme, suggesting that there must be an element of rigour around definitions of openness and the use of open licences. There is a reputational risk to the institution if we’re vague about copyright and not clear about what we mean by open. Melissa also reminded us not to forget open education in discussions about open knowledge, open data and open access. Edinburgh has a long tradition of openness, as evidenced by the Edinburgh Settlement, but we need a strong institutional vision for OER, backed up by developments such as the Scottish Open Education Declaration.


I followed Melissa, providing a very brief introduction to Open Scotland and the Scottish Open Education Declaration, before changing tack to talk about open access to cultural heritage data and its value to open education. This isn’t a topic I usually talk about, but with a background in archaeology and an active interest in digital humanities and historical research, it’s an area that’s very close to my heart. As a short case study I used the example of Edinburgh University’s excavations at Loch na Berie broch on the Isle of Lewis, which I worked on in the late 1980s. Although the site has been extensively published, it’s not immediately obvious how to access the excavation archive. I’m sure it’s preserved somewhere, possibly within the university, perhaps at RCAHMS, or maybe at the National Museum of Scotland. Wherever it is, it’s not openly available, which is a shame, because if I were teaching a course on the North Atlantic Iron Age there is some data from the excavation that I might want to share with students. This is no reflection on the directors of the fieldwork project; it’s just one small example of how greater access to cultural heritage data would benefit open education. I also flagged up a rather frightening blog post, Dennis the Paywall Menace Stalks the Archives, by Andrew Prescott, which highlights what can happen if we do not openly licence archival and cultural heritage data – it becomes locked behind commercial paywalls. There are, however, some excellent examples of open practice in the cultural heritage sector, such as the National Portrait Gallery’s clearly licensed digital collections and the work of the British Library Labs. Yet openness comes at a cost, and we need to make greater efforts to explore new business and funding models to ensure that our digital cultural heritage is openly available to us all.

Ally Crockford, Wikimedian in Residence at the National Library of Scotland, spoke about the hugely successful Women, Science and Scottish History editathon recently held at the university. However, she noted that as members of the university we are in a privileged position that enables us to use non-open resources (books, journal articles, databases, artefacts) to create open knowledge. Furthermore, with Wikipedia’s push to cite published references, there is a danger of replicating existing knowledge hierarchies. Ally reminded us that as part of the educated elite, we have a responsibility to open our mindsets to all modes of knowledge creation. Publishing in Wikipedia also provides an opportunity to reimagine feedback in teaching and learning. Feedback should be an open, participatory process, and what better way for students to learn this than by editing Wikipedia?

Robin Rice, of EDINA & Data Library, asked the question: what do Open Access and Open Data sharing look like? Open Access publications are increasingly becoming the norm, but we’re not quite there yet with open data. It’s not clear whether researchers will be cited if they make their data openly available, and career rewards are uncertain. However, there are huge benefits to opening access to data and to citizen science initiatives: public engagement, crowd funding, data gathering and cleaning, and an informed citizenry. In addition, social media can play an important role in working openly and transparently.

Robin Rice

James Bednar, talking about computational neuroscience and the problem of reproducibility, picked up this theme, adding that accountability is a big attraction of open data sharing. James recommended using IPython Notebook for recording and sharing data and computational results, and for helping to make them reproducible. This prompted Anne-Marie Scott to comment on Twitter:

@ammienoot: "Imagine students creating iPython notebooks... and then sharing them as OER #openEd"

Very cool indeed.
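The appeal of the notebook approach is easy to demonstrate: a single cell can keep the data, the parameters and the analysis together, so anyone re-running it reproduces exactly the same result. A minimal sketch in plain Python (the measurements here are simulated purely for illustration, not from any real experiment):

```python
import random
import statistics

# Fix the random seed so anyone re-running the cell
# reproduces exactly the same numbers.
random.seed(42)

# Simulated measurements standing in for real lab data.
measurements = [random.gauss(5.0, 0.5) for _ in range(100)]

# The analysis lives alongside the data it describes.
summary = {
    "n": len(measurements),
    "mean": round(statistics.mean(measurements), 3),
    "stdev": round(statistics.stdev(measurements), 3),
}
print(summary)
```

Shared as an OER, a notebook like this lets a student inspect, re-run and modify every step rather than just read a reported result.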

James Stewart spoke about the benefits of crowdsourcing and citizen science. Despite the buzzwords, this is not a new idea: there is a long tradition of citizens engaging in science, and Darwin regularly received reports and data from amateur scientists. Maintaining transparency and openness is currently a big problem for science, but openness and citizen science can help to build trust and quality. James also cited OpenStreetMap as a good example of building community around crowdsourced data and citizen science. Crowdsourcing initiatives create a deep sense of community – it’s not just about the science, it’s also about engagement.


After coffee (accompanied by Tunnock’s caramel wafers – I approve!) we had a series of presentations on the student experience and student engagement with open knowledge.

Paul Johnson and Greg Tyler, from the Web, Graphics and Interaction section of IS, spoke about the necessity of being more open and transparent with institutional data and the importance of providing more open data to encourage students to innovate. Hayden Bell highlighted the importance of having institutional open data directories and urged us to spend less time gathering data and more time making something useful from it. Students are the source of authentic experience about being a student – we should use this! Student data hacks are great, but participants often have to spend longer getting and parsing the data than doing interesting things with it. Steph Hay also spoke about the potential of opening up student data. VLEs inform the student experience; how can we open up this data and engage with students using their own data? Anonymised data from Learn was provided at Smart Data Hack 2015, but students chose not to use it, though it is not clear why. Finally, Hans Christian Gregersen brought the day to a close with a presentation of Book.ed, one of the winning entries of the Smart Data Hack. Book.ed is an app that uses open data to allow students to book rooms and facilities around the university.

What really struck me about Open.Ed was the breadth of vision and the wide range of open knowledge initiatives scattered across the university. The value of events like this is that they help to share that vision with colleagues – and that’s when the cross-fertilisation of ideas really starts to take place.

This report first appeared on Lorna M. Campbell’s blog, Open World:  lornamcampbell.wordpress.com/2015/03/11/open-ed

P.S. Another interesting talk came from Bert Remijsen, who spoke of the benefits he has found in publishing his linguistics research data using DataShare, particularly the ability to let others hear recordings of the sounds, words and songs described in his research papers, spoken and sung by the native speakers of Shilluk with whom he works during his field research in South Sudan.

Reflections on IDCC 2015: How the 80/20 rule applies to Research Data Management tools

RDM Tools need to be usable by a defined and significant set of researchers

A key question for universities considering which RDM tools to adopt is how broadly useful each tool is and how widely it will be taken up by researchers across the institution. I was reminded of this when Robin Rice posed the question to Ian MacArdle and Torsten Reimer of Imperial College after listening to their presentation, ‘Green Shoots:  RDM Pilot at Imperial College London’.

This is equally an issue for builders of tools.  Some tools are designed to be used primarily by the people who designed them, to add value to their own research.  Most of the six projects described in the Green Shoots presentation fall into that category.  Other tools are designed to be used by a broader but still limited community, e.g. tools for researchers in a particular discipline.  RDM tools, in contrast, need to be applicable to an even broader range of researchers, certainly across more than one discipline.  I think the 80/20 rule is a good benchmark.  For an RDM tool to be of interest at the institutional level it needs to be relevant to the needs of at least 80% of a defined and significant set of researchers at the institution.

Electronic lab notebook example

To illustrate this point let’s take the example of electronic lab notebooks. The target user group here includes biologists, chemists, biomedical researchers, and researchers in associated disciplines such as veterinary medicine, plant science, and food science – in short, disciplines where the paper lab notebook is widely used. This is clearly a defined and significant set of researchers at major research institutions, so the target user group meets the minimum viable community definition.

What does an electronic lab notebook need to have and to do in order to meet the requirements of 80% of the users in this defined target group? One way of approaching this issue is to look at the reactions of ‘outliers’, i.e. researchers whose focus is so specific that a ‘generic’ offering, even one tailored to this particular set of researchers, may not be attractive. A large set of potential outliers is chemists. If there is no support for chemistry in an ELN, chemists are likely to find it unsuitable for their needs. So one challenge is to build in sufficient support for chemists without having the ELN become so chemistry-focussed that other segments of the target user group find it unworkable. When this is done, in most cases the majority of chemists go from being potential outliers to proponents of adoption.

A second set of outliers are bioinformaticians, who overlap with another set of outliers: researchers who build their own software as part of their research. Both of these groups tend to prefer ‘home-built’ solutions, and for this reason may not be inclined to adopt ELNs that in fact work well for ‘the 80%’. One way of increasing the likelihood that some members of these groups will make use of the ELN, and hence become proponents, is to build a good API. This enables these groups to integrate the ELN with the software and solutions they are developing, in which case they are more likely to (a) appreciate the ELN’s benefits, and (b) find that it adds value to their workflow.
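To make the API argument concrete, here is a minimal sketch of what such an integration might look like from the researcher’s side. The `ElnClient` class, its methods and the data fields are entirely hypothetical (mocked in memory, with no real ELN product’s interface assumed); a real client would issue HTTP requests to the vendor’s REST endpoints:

```python
import json
from datetime import datetime, timezone

class ElnClient:
    """Hypothetical ELN API client, mocked in memory for illustration.
    A real client would authenticate and talk to the ELN's REST API."""

    def __init__(self):
        self._entries = []

    def create_entry(self, title, body):
        # Create a new notebook entry and return its identifier.
        entry = {
            "id": len(self._entries) + 1,
            "title": title,
            "body": body,
            "created": datetime.now(timezone.utc).isoformat(),
            "attachments": [],
        }
        self._entries.append(entry)
        return entry["id"]

    def attach_result(self, entry_id, name, payload):
        # Record a structured analysis result against an existing entry.
        self._entries[entry_id - 1]["attachments"].append(
            {"name": name, "data": json.dumps(payload)}
        )

# A home-built analysis pipeline logs its output to the ELN
# instead of leaving it in a loose results file.
eln = ElnClient()
entry_id = eln.create_entry("Sequence alignment run 7", "Nightly pipeline output")
eln.attach_result(entry_id, "alignment_stats", {"reads": 120000, "mapped": 0.97})
```

The point is the shape of the workflow, not the particular calls: once a pipeline can write into the notebook programmatically, the ELN stops competing with the home-built tooling and starts adding value to it.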

A third set of outliers are those who are reluctant to adopt an ELN because for whatever reason they do not see it fitting into their workflow.  In many cases this is a temporary barrier which can be overcome as the doubters come to understand the benefits of using an ELN.  In other cases there is a specific issue relating to the use of a paper notebook.  For example, some researchers work in labs where chemical spillage is likely.  They are reluctant to bring their own tablet computer into the controlled area because they are afraid it will be damaged — they will only adopt the ELN, for use on a tablet, if the lab provides a dedicated tablet for them to use in the lab.

Applying the 80/20 rule

Sometimes the doubts raised by outliers like the ones identified above take on a higher profile than the views of the silent majority of researchers who may be willing or even keen to try out and eventually adopt an ELN, but lack effective or well-connected advocates.  In this case the 80/20 rule may come in handy for those involved in RDM – take some soundings from the ‘average’ labs which don’t fall into any of the three categories of outliers noted above.  If there is significant interest among this group, that could be an indication that a product trial is justified, and that, notwithstanding doubts expressed from some quarters, the 80/20 rule is in fact in operation.

Making the right RDM tools available is a joint responsibility

Reflecting on the above, it’s clear that neither developers of RDM tools nor those involved in RDM management can, working in isolation, provide the RDM tools that researchers need. It’s up to tool providers, first, to design tools for the 80%, and then to be sensitive to those with specialist needs and attempt to build in capabilities that will make the tool useful to as many of them as possible, too. And it’s up to those involved in RDM management to be attentive to the needs of the ‘silent majority’ of researchers and to work with them, and with interested tool providers, to offer researchers tools that meet their needs, even when those needs may not initially be systematically or even very coherently expressed.

 

Leading a Digital Curation ‘Lifestyle’: First day reflections on IDCC15

[First published on the DCC Blog, republished here with permission.]

Okay, that title is a joke, but an apt one for a brief reflection on this year’s International Digital Curation Conference in London this week, with its theme of looking ten years back and ten years forward since the UK Digital Curation Centre was founded.

The joke references an alleged written or spoken mistake someone made in referring to the Digital Curation lifecycle model, gleefully repeated on the conference tweetstream (#idcc15). The model itself, as with all great reference works, both builds on prior work and was a product of its time – helping to add to the DCC’s authority within and beyond the UK at a moment when people were casting about for common language and understanding in the new terrain of digital preservation, data curation, and that perplexing combination of terms which perhaps still hasn’t quite taken off, ‘digital curation’ (at least not to the same extent as ‘research data management’). I still have my mouse-mat of the model and still regret that it was never made into a frisbee.


The Digital Curation Lifecycle Model, Sarah Higgins & DCC, 2008

They say about Woodstock that ‘if you remember it you weren’t really there’, so I don’t feel too bad that it took Tony Hey’s coherent opening plenary talk to remind me of where we started way back in 2004 when a small band under the directorship of Peter Burnhill (services) and Peter Buneman (research) set up the DCC with generous funding from Jisc and EPSRC. Former director Chris Rusbridge likes to talk about ‘standing on the shoulders of giants’ when describing long-term preservation, and Tony reminded us of the important, immediate predecessors of the UK e-Science Programme and the ground-breaking government investment in the Australian National Data Service (ANDS) that was already changing a lot of people’s lifestyles, behaviours and outlooks.

Traditionally the conference has a unique format that focuses on invited panels and talks on the first day, with peer-reviewed research and practice papers on the second, interspersed with demos and posters of cutting-edge projects, followed by workshops in the same week. So whilst I always welcome the erudite words of the first day’s contributors, at times there can be a sense of, ‘Wait – haven’t things moved on from there already?’ So it was with the protracted focus on academic libraries and the rallying cries of the need for them to rise to the ‘new’ challenges during the first panel session, chaired by Edinburgh’s Geoffrey Boulton and focused ostensibly on international comparisons. Librarians – making up only part of the diverse audience – were asking each other during the break and on Twitter: isn’t that exactly what they have been doing in recent years, since, for example, the NSF requirements in the States and the RCUK and especially EPSRC rules in the UK for data management planning and data sharing? Certainly the education and skills of data curators as taught in iSchools (formerly Library Schools) have been a mainstay of IDCC topics in recent years, this one being no exception.

But has anything really changed significantly, either in libraries or, more importantly, across academia, since digital curation entered the namespace a decade ago? This was the focus of a panel led by the proudly impatient Carly Strasser, who has no time for ‘slow’ culture change and provocatively assumes ‘we’ must be doing something wrong. She may be right, but the panel was divided. Tim DiLauro observed that some disciplines are going fast and some are going slow, depending on whether technology is helping them get the business of research done. And even within disciplines there are vast differences – perhaps proving the adage that ‘the future is here, it’s just not evenly distributed yet’.


Carly Strasser’s Panel Session, IDCC15

Geoffrey Bilder spoke of tipping points by looking at how recently DOIs (Digital Object Identifiers, used in journal publishing) meant nothing to researchers and how they have since caught on like wildfire. He also pointed blame at the funding system which focuses on short-term projects and forces researchers to disguise their research bids as infrastructure bids – something they rightly don’t care that much about in itself. My own view is that we’re lacking a killer app, probably because it’s not easy to make sustainable and robust digital curation activity affordable and time-rewarding, never mind profitable. (Tim almost said this with his comparison of smartphone adoption). Only time will tell if one of the conference sponsors proves me wrong with its preservation product for institutions, Rosetta.

It took long-time friend of the DCC Clifford Lynch to remind us in the closing summary (day 1) of exactly where it was we wanted to get to, a world of useful, accessible and reproducible research that is efficiently solving humanity’s problems (not his words). Echoing Carly’s question, he admitted bafflement that big changes in scholarly communication always seem to be another five years away, deducing that perhaps the changes won’t be coming from the publishers after all. As ever, he shone a light on sticking points, such as the orthogonal push for human subject data protection, calling for ‘nuanced conversations at scale’ to resolve issues of data availability and access to such datasets.

Perhaps the UK and Scotland in particular are ahead in driving such conversations forward; researchers at the University of Edinburgh co-authored a report two years ago for the government on “Public Acceptability of Data Sharing Between the Public, Private and Third Sectors for Research Purposes,” as a pre-cursor to innovations in providing researchers with secure access to individual National Health Service records linked to other forms of administrative data when informed consent is not possible to achieve.

Given the weight of this societal and moral barrier to data sharing, and the spread of topics over the last 10 years of conferences, I quite agree with Laurence Horton, one of the panelists, who said that the DCC should give a particular focus to the Social Sciences at next year’s conference.

Robin Rice
Data Librarian (and former Project Coordinator, DCC)
University of Edinburgh