Welcome to the new Research Data Management Service Coordinator: Stuart Macdonald

We welcome Stuart Macdonald to the position of Research Data Management Service Coordinator, on a one-year secondment covering for the current post-holder. Stuart will continue the work of developing the research data services provided by Information Services at the University of Edinburgh. He will spend three quarters of his time on the programme, and the remaining quarter in his current role as Associate Data Librarian for EDINA and the Data Library.


Stuart has recently returned from a six-month secondment as Data Services Librarian at the Cornell Institute for Social and Economic Research (CISER), where he co-ordinated the successful Data Seal of Approval trusted repository application for the CISER Data Archive and modernized its archival processes and practice.

When not working as service coordinator, Stuart will be working towards gaining the Data Seal of Approval for DataShare, the University’s open data repository.

On the role of service coordinator, Stuart says: “This is a marvellous opportunity to be at the heart of research data management activities here at the University and to continue the great work that has already been put in place.”

Dealing with Data – Call For Papers


Dealing with Data Conference 2014
Call for Papers

Date:

Tuesday 26th August 2014, 9am – 1pm
Including a formal launch of the University Research Data Management services by the Principal at 11:30am

Location:

University of Edinburgh (room to be confirmed)

Themes:

Data creation
Data management planning
Data visualisation
Data archiving and sharing
Open Data
Data re-use
Electronic lab books
Data preservation
Software preservation
Non-traditional data types
Data analysis
New requirements for Research Data Management
Data infrastructure
Linked Data

Format:

Presentations will be 20 minutes long, with 10 minutes for questions. Depending on numbers, thematic parallel strands may be used. Presentations should be aimed at an academic audience drawn from a wide range of disciplines.

Call for papers:

A half-day conference on the subject of ‘Dealing with Data’ is being run to coincide with the launch of the University of Edinburgh’s Research Data Management services, which consist of tools and support for the whole lifecycle of research data, from planning and storage to sharing and archiving.

The conference invites proposals for presentations from University of Edinburgh researchers on any aspect of the challenges and advances in working with data, particularly novel methods of creating, using, storing, visualising or sharing research data. A list of themes is given above, although proposals that cover any aspect of working with research data are welcome.

Please send proposals (2 sides of A4 max) to Stuart Lewis (stuart.lewis@ed.ac.uk) before Friday 25th July 2014.  Papers will be reviewed and the programme compiled by the 8th August. A PDF version of this Call for Papers is available for printing: DealingwithDataConference2014-cfp

Open data repository – file size analysis

The University of Edinburgh’s open data sharing repository, DataShare, has been running since 2009. During this time, over 125 items of research data have been published online. This blog post provides a quick overview of the number, extent, and distribution of file sizes and file types held in the repository.

First, some high level statistics (as at March 2014):

  • Number of items: 125
  • Total number of files: 1946
  • Mean number of files per item: 16
  • Total disk space used by files: 76GB (0.074TB)
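As a quick check, the derived figures above can be reproduced from the raw counts. This is only a sketch: the function name is hypothetical, and it assumes binary units (1TB = 1024GB), which is what the 0.074TB figure implies.

```python
def summarise(item_count, file_count, total_gb):
    """Return (mean files per item, total size in TB).

    Assumes 1 TB = 1024 GB; the function itself is illustrative only.
    """
    mean_files = file_count / item_count
    total_tb = total_gb / 1024
    return mean_files, total_tb


# Figures quoted in the list above (as at March 2014).
mean_files, total_tb = summarise(item_count=125, file_count=1946, total_gb=76)
print(f"Mean files per item: {mean_files:.1f} (rounds to {round(mean_files)})")
print(f"Total size: {total_tb:.3f} TB")
```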

DataShare uses the open source DSpace repository platform. As well as storing the raw data files that are uploaded, it creates derivative files such as thumbnails of images and plain text versions of text documents such as PDF or Word files, which are then used for full-text indexing. Of the files held within DataShare, about 80% are the original files and 20% are derived files (including, for example, licence attachments).

[Figure: file types]

When considering capacity planning for repositories, it is useful to look at the likely size of files that may be uploaded. Often with research data, the assumption is that files will be quite large. Sometimes this is true, but the next graph shows the actual distribution of files by file size. The largest proportion of files are under 1/10th of a megabyte (100KB). Ignoring these small files, the distribution is roughly normal, peaking at about 100MB. The largest files are nearer to 2GB, but there are very few of these.

[Figure: file sizes]
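A distribution like this can be produced by grouping file sizes into order-of-magnitude buckets. The sketch below is illustrative only: the bucket boundaries, the use of decimal units (1KB = 1000 bytes), and the sample sizes are all assumptions, not the DataShare data or the method used to draw the graph.

```python
from collections import Counter

# Upper bounds in bytes for each bucket (decimal units, an assumption).
BUCKETS = [
    (100_000, "under 100KB"),
    (1_000_000, "100KB to 1MB"),
    (10_000_000, "1MB to 10MB"),
    (100_000_000, "10MB to 100MB"),
    (1_000_000_000, "100MB to 1GB"),
]


def bucket_label(size_bytes):
    """Return the label of the first bucket whose upper bound exceeds the size."""
    for upper, label in BUCKETS:
        if size_bytes < upper:
            return label
    return "1GB and over"


def size_histogram(sizes):
    """Count how many file sizes fall into each bucket."""
    return Counter(bucket_label(size) for size in sizes)


# Made-up sizes in bytes, purely for demonstration.
sample_sizes = [2_048, 55_000, 750_000, 95_000_000, 120_000_000, 1_900_000_000]
for label, count in size_histogram(sample_sizes).items():
    print(f"{label}: {count}")
```

Buckets that grow by a factor of ten keep the many very small files and the few very large ones visible on the same chart.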

Finally, it is interesting to look at the file formats stored in the repository. Unsurprisingly, the largest number of files are plain text, followed by a number of Wave Audio files (from the Dinka Songs collection). Other common file formats include XML files, ZIP files, and JPEG images.

[Figure: file formats]
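A rough tally of formats can be made by counting file extensions. This is a minimal sketch with made-up file names; it assumes the extension is a fair proxy for the format, which is not always the case, and it is not how the figures above were produced.

```python
from collections import Counter
from pathlib import PurePath


def count_extensions(filenames):
    """Count files by lower-cased extension; files without one go under '(none)'."""
    counts = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower() or "(none)"
        counts[ext] += 1
    return counts


# Hypothetical file names, for demonstration only.
sample = ["notes.txt", "song01.wav", "song02.WAV", "metadata.xml", "images.zip", "README"]
for ext, count in count_extensions(sample).most_common():
    print(f"{ext}: {count}")
```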

Stuart Lewis
Head of Research and Learning Services, Library & University Collections

Data provided by the DataShare team.

IDCC 2014 – take home thoughts

A few weeks ago I attended the 9th International Digital Curation Conference in San Francisco. The conference was spread over four days, with two days for workshops and two for the main conference, and was jointly run by the Digital Curation Centre and the California Digital Library. Unsurprisingly, it was an excellent conference with much debate and discussion about the evolving needs for digital curation of research data.


The main points I took home from the conference were:

Science is changing: Atul Butte gave an inspiring keynote that contained an overview of the ways in which his own work is changing. In particular, he explained how it is now possible to ‘outsource’ parts of the scientific process. The first is the ability to visit a web site to buy tissue samples for specific diseases. These samples were previously used for medical tests, but have now been anonymised and collected rather than being discarded. Secondly, it is also now possible to order mouse trials to be undertaken, again via a web site. These allow routine activities to be performed more quickly and cheaply.

Big Data: This phrase is often used and means different things to different people. A nice definition given by Jane Hunter was that curation of big data is hard because of its volume, velocity, variety and veracity. She followed this up with some good examples where data have been used effectively.

Skills need to be taught: There were several sessions about the role of Information Schools in educating a new breed of information professionals with the skills required to effectively handle the growing requirements of analysing and curating data.  This growth was demonstrated by how we are seeing many more job titles such as data engineer / analyst / steward / journalist.  It was proposed that library degrees should include more technical skills such as programming and data formats.

The Data paper: There was much discussion about the concept of a ‘Data Paper’ – a short journal paper that describes a data set.  It was seen as an important element in raising the profile of the creation of re-usable data sets.  Such papers would be citable and trackable in the same ways as journal papers, and could therefore contribute to esteem indicators.  There was a mix of traditional and new publishers with varying business models for achieving this.  One point that stood out for me was that publishers were not proposing to archive the data, only the associated data paper.  The archiving would need to take place elsewhere.

Tools are improving: I attended a workshop about Data Management in the Cloud, facilitated by Microsoft Research. They gave a demo of some of the latest features of Excel. Many of the new features seem to serve two camps, business intelligence and research data analysis, and are equally useful and powerful for both. Whichever perspective you are looking at data from, tools such as Excel are now much more than a spreadsheet for adding up numbers. They can import, manipulate, and display data in many new and powerful ways.

I was also able to present a poster that contains some of the evolving thoughts about data curation systems at the University of Edinburgh: http://dx.doi.org/10.6084/m9.figshare.902835

In his closing reflection on the conference, Clifford Lynch said that we need to understand how much progress we are making with data curation. It will be interesting to see the progress made and what new issues are being discussed at next year’s conference, which will be held much closer to home in London.

Stuart Lewis
Head of Research and Learning Services
Library & University Collections, Information Services