Highlights from the RDM Programme Progress Report: Jan – Feb 2015

The Library and University Collections (L&UC) in association with project partner Manchester University received funding from the Jisc “Research Data Spring” programme to define and develop an open source Data Vault application which will allow data creators to describe and store data safely in one of the growing number of archival storage options. Phase 1 of the project started in March 2015.

The University of Edinburgh (UoE) were invited to contribute to a series of EPSRC (Engineering and Physical Sciences Research Council) Compliance Case Studies. Stuart MacDonald, RDM Service Coordinator, was interviewed by Jisc and the DCC in relation to the RDM programme and institutional compliance with forthcoming EPSRC research data expectations. The case study will be published on the Jisc website in May 2015.

RDM Service Coordinator Stuart MacDonald co-presented with Rory Macneil (RSpace) their practice paper “Service Integration to Enhance RDM: RSpace electronic laboratory notebook (ELN) case study” at the International Conference on Digital Curation (IDCC) in London (Feb 2015). The paper has been published in the International Journal of Digital Curation (http://www.ijdc.net/index.php/ijdc/article/view/10.1.163), open access.

The RDM Service Coordinator also presented on ‘RDM Training Initiatives @ Edinburgh’ at the “Comparing Notes: Training Librarians for Research Data Management and Open Science Support” workshop at IDCC.

An EPSRC Expectations Awareness Survey was sent out to 98 EPSRC grant holders, of whom 38 responded. Nine** grant holders agreed to participate in a follow-up interview. The findings of the interviews will follow shortly. Dr Evamaria Krause (Marburg University, Germany) completed a six-week internship with L&UC, where she assisted with the EPSRC Expectations Awareness Survey and EPSRC grant holder interview exercises.

All Schools in the College of Humanities and Social Science (CHSS) have now added links to the RDM Programme website and other RDM pages via their intranets. RDM Project Plan deadlines and deliverables which underpin the RDM Roadmap have been updated.* For more details visit the RDM Programme wiki (some content only available to UoE staff).

Four tailored Data Management Plans sessions have been organised with research groups in the College of Medicine and Veterinary Medicine and CHSS, and two workshops for the European Association for Health Information and Libraries (EAHIL) conference in Edinburgh are scheduled to run in June 2015.

Edinburgh DataShare release 1.71 has been announced with new features including faceted browsing, SOLR usage statistics, and an increase in the size limit on assisted deposit of items from 5GB to 10GB.

DataSync (a Dropbox-like service in development) was themed and made available for beta testing to Information Services colleagues.

* IT Infrastructure input pending
** 1 PhD student who was forwarded the survey agreed to be interviewed

Stuart Macdonald
RDM Service Coordinator

Data Vault project kickoff meeting

Last week, members of the Data Vault project got together for the kickoff meeting. Hosted at the University of Manchester Library, the meeting allowed us to discuss the project plan and the milestones for the three-month project, agree terminology for parts of the system, and start assigning tasks to project members for the first month.

Being only three months long, the project is being run in three one-month chunks. These are defined as follows:

  1. Month 1: Define and Investigate: This phase will allow us to agree what the Data Vault should do, and how it does it. Specifically, it will look at:
    1. What are the use cases for the Data Vault
    2. How do we describe the system (create overview diagrams)
    3. How should the data be packed (metadata + data) for long term archival storage
    4. Develop example workflows for how the Data Vault could be used in the research process
    5. Examine the capabilities of archival storage systems to ensure they can support the proposed Data Vault
  2. Month 2: Requirements and Design: This phase will create the requirements specification and initial design of the system:
    1. Define the requirements specification
    2. Use the requirement specification to design the Data Vault system
  3. Month 3: Develop a Proof of Concept: This phase will seek to develop a minimal proof of concept that demonstrates the concept of the Data Vault:
    1. Deliver a working proof of concept that can describe and archive some data, and then retrieve it
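
The "describe, archive, retrieve" cycle named in the plan above can be illustrated with a small sketch. This is not the Data Vault design, which had not been written at this stage; the function names, the `metadata.json` and `data.bin` member names, and the choice of a tar package with a SHA-256 checksum are all assumptions made purely for illustration.

```python
import hashlib
import io
import json
import tarfile
from typing import Tuple


def archive(data: bytes, description: dict) -> bytes:
    """Pack data and a metadata record into one in-memory tar package.

    Hypothetical helper: the package layout is an assumption, not a
    Data Vault specification.
    """
    # Record a checksum alongside the description so the data can be verified later.
    metadata = dict(description, sha256=hashlib.sha256(data).hexdigest())
    members = (
        ("metadata.json", json.dumps(metadata, sort_keys=True).encode("utf-8")),
        ("data.bin", data),
    )
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def retrieve(package: bytes) -> Tuple[dict, bytes]:
    """Unpack a package and check the data against its recorded checksum."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r") as tar:
        metadata = json.load(tar.extractfile("metadata.json"))
        data = tar.extractfile("data.bin").read()
    if hashlib.sha256(data).hexdigest() != metadata["sha256"]:
        raise ValueError("checksum mismatch: archived data is corrupt")
    return metadata, data


package = archive(b"1,2,3\n", {"title": "Example dataset", "creator": "A. Researcher"})
metadata, data = retrieve(package)
print(metadata["title"], len(data))
```

Keeping the description and the checksum inside the same package as the data means the package can be verified on retrieval without consulting any external catalogue, which is one possible answer to the "metadata + data" packing question in Month 1.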

At the end of month three, we will prepare for the second Jisc Data Spring sandpit workshop where we will seek to extend the project to take the prototype and develop it into a full system.

All of this is being documented in the project plan, which is a ‘living document’ that is constantly evolving as the project progresses.  The plan is online as a Google Document:

Look out for further blog posts during the month as we undertake the definitions and investigations!

Originally posted on the Jisc Data Vault blog, April 7, 2015 by Stuart Lewis, Deputy Director, Library & University Collections.

New data analysis and visualisation service

Statistical Analysis without Statistical Software

The Data Library now has an SDA server (Survey Documentation and Analysis), and is ready to load numeric data files for access by either University of Edinburgh users only, or ‘the world’. The University of Edinburgh SDA server is available at: http://stats.datalib.edina.ac.uk/sda/

SDA provides an interactive interface, allowing extensive data analysis with significance tests. It also offers the ability to download user-defined subsets with syntax files for further analysis on your platform of choice.

SDA can be used to teach statistics, in the classroom or via distance-learning, without having to teach syntax. It will support most statistical techniques taught in the first year or two of applied statistics. There is no need for expensive statistical packages, or long learning curves. SDA has been awarded the American Political Science Association Best Instructional Software award.

For data producers concerned about disclosure control, SDA provides the capability of defining usage restrictions on a variable-by-variable basis. Examples include restrictions on minimum cell sizes (weighted or unweighted), on the use of particular variables without being collapsed (recoded), and on particular bi- or multivariate combinations.
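
To make the idea of a minimum cell size concrete, here is a minimal sketch of the general technique: a cross-tabulation in which any cell with fewer cases than a threshold is suppressed. It does not use or reproduce SDA's own configuration; the function name, the variable names and the threshold are hypothetical, and only unweighted counts are handled.

```python
from collections import Counter


def suppressed_crosstab(records, row_var, col_var, min_cell_size):
    """Count cases per (row, column) cell, hiding cells below the threshold.

    Hypothetical illustration of cell suppression; suppressed cells are
    returned as None instead of a count.
    """
    counts = Counter((record[row_var], record[col_var]) for record in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Made-up example records: five cases in one cell, two in another.
records = (
    [{"sex": "F", "region": "North"}] * 5
    + [{"sex": "M", "region": "North"}] * 2
)
table = suppressed_crosstab(records, "sex", "region", min_cell_size=3)
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```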

For data managers and those concerned about data preservation, SDA can be used to store data files in a generic, software-independent format (fixed-field format ASCII), and includes the capability of producing the accompanying metadata in the emerging DDI-standard XML format.
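
For readers unfamiliar with fixed-field ASCII, the sketch below shows the principle: each variable occupies a fixed range of character positions on every line, so the file can be read by any software given only the column layout. The layout and variable names here are invented for illustration and are not taken from any SDA dataset.

```python
# Hypothetical column layout: (variable name, start position, end position).
LAYOUT = [("id", 0, 4), ("age", 4, 7), ("sex", 7, 8)]


def format_record(values, layout=LAYOUT):
    """Write one case as a fixed-width ASCII line, right-justifying each field."""
    fields = []
    for name, start, end in layout:
        width = end - start
        text = str(values[name])
        if len(text) > width:
            raise ValueError(f"{name!r} does not fit in {width} columns")
        fields.append(text.rjust(width))
    return "".join(fields)


def parse_record(line, layout=LAYOUT):
    """Read one fixed-width line back into a dictionary of strings."""
    return {name: line[start:end].strip() for name, start, end in layout}


line = format_record({"id": 12, "age": 34, "sex": "F"})
print(repr(line))
print(parse_record(line))
```

The layout table plays the role that a syntax file or DDI metadata would play in practice: without it the line is just a string of characters, which is why the data and its documentation need to be preserved together.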

Data Library staff can mount data files very quickly if they are well documented with appropriate metadata formats (e.g. SAS or SPSS), depending on access restrictions appertaining to the datafile. To request that a datafile be made available in SDA, contact datalib@ed.ac.uk.

Laine Ruus
EDINA and Data Library

open.ed report

Lorna M. Campbell, a Digital Education Manager with EDINA and the University of Edinburgh, writes about the ideas shared and discussed at the open.ed event this week.

 

Earlier this week I was invited by Ewan Klein and Melissa Highton to speak at Open.Ed, an event focused on Open Knowledge at the University of Edinburgh.  A storify of the event is available here: Open.Ed – Open Knowledge at the University of Edinburgh.

“Open Knowledge encompasses a range of concepts and activities, including open educational resources, open science, open access, open data, open design, open governance and open development.”

 – Ewan Klein

Ewan set the benchmark for the day by reminding us that open data is only open by virtue of having an open licence such as CC0, CC BY or CC BY-SA. CC Non Commercial should not be regarded as an open licence as it restricts use. Melissa expanded on this theme, suggesting that there must be an element of rigour around definitions of openness and the use of open licences. There is a reputational risk to the institution if we’re vague about copyright and not clear about what we mean by open. Melissa also reminded us not to forget open education in discussions about open knowledge, open data and open access. Edinburgh has a long tradition of openness, as evidenced by the Edinburgh Settlement, but we need a strong institutional vision for OER, backed up by developments such as the Scottish Open Education Declaration.

I followed Melissa, providing a very brief introduction to Open Scotland and the Scottish Open Education Declaration, before changing tack to talk about open access to cultural heritage data and its value to open education. This isn’t a topic I usually talk about, but with a background in archaeology and an active interest in digital humanities and historical research, it’s an area that’s very close to my heart. As a short case study I used the example of Edinburgh University’s excavations at Loch na Berie broch on the Isle of Lewis, which I worked on in the late 1980s. Although the site has been extensively published, it’s not immediately obvious how to access the excavation archive. I’m sure it’s preserved somewhere, possibly within the university, perhaps at RCAHMS, or maybe at the National Museum of Scotland. Wherever it is, it’s not openly available, which is a shame, because if I were teaching a course on the North Atlantic Iron Age there is some data from the excavation that I might want to share with students. This is no reflection on the directors of the fieldwork project; it’s just one small example of how greater access to cultural heritage data would benefit open education. I also flagged up a rather frightening blog post, Dennis the Paywall Menace Stalks the Archives, by Andrew Prescott, which highlights the dangers of what can happen if we do not openly license archival and cultural heritage data – it becomes locked behind commercial paywalls. However, there are some excellent examples of open practice in the cultural heritage sector, such as the National Portrait Gallery’s clearly licensed digital collections and the work of the British Library Labs. That said, openness comes at a cost and we need to make greater efforts to explore new business and funding models to ensure that our digital cultural heritage is openly available to us all.

Ally Crockford, Wikimedian in Residence at the National Library of Scotland, spoke about the hugely successful Women, Science and Scottish History editathon recently held at the university. However, she noted that as members of the university we are in a privileged position that enables us to use non-open resources (books, journal articles, databases, artefacts) to create open knowledge. Furthermore, with Wikipedia’s push to cite published references, there is a danger of replicating existing knowledge hierarchies. Ally reminded us that as part of the educated elite, we have a responsibility to open our mindsets to all modes of knowledge creation. Publishing in Wikipedia also provides an opportunity to reimagine feedback in teaching and learning. Feedback should be an open participatory process, and what better way for students to learn this than by editing Wikipedia?

Robin Rice, of EDINA & Data Library, asked what Open Access and Open Data sharing look like. Open Access publications are increasingly becoming the norm, but we’re not quite there yet with open data. It’s not clear if researchers will be cited if they make their data openly available, and career rewards are uncertain. However, there are huge benefits to opening access to data and citizen science initiatives: public engagement, crowd funding, data gathering and cleaning, and informed citizenry. In addition, social media can play an important role in working openly and transparently.

James Bednar, talking about computational neuroscience and the problem of reproducibility, picked up this theme, adding that accountability is a big attraction of open data sharing. James recommended using IPython Notebook for recording and sharing data and computational results and helping to make them reproducible. This prompted Anne-Marie Scott to comment on Twitter:

@ammienoot: "Imagine students creating iPython notebooks... and then sharing them as OER #openEd"

Very cool indeed.

James Stewart spoke about the benefits of crowdsourcing and citizen science. Despite the buzz words, this is not a new idea; there’s a long tradition of citizens engaging in science. Darwin regularly received reports and data from amateur scientists. Maintaining transparency and openness is currently a big problem for science, but openness and citizen science can help to build trust and quality. James also cited OpenStreetMap as a good example of building community around crowdsourcing data and citizen science. Crowdsourcing initiatives create a deep sense of community – it’s not just about the science, it’s also about engagement.

After coffee (accompanied by Tunnocks caramel wafers – I approve!) we had a series of presentations on the student experience and students’ engagement with open knowledge.

Paul Johnson and Greg Tyler, from the Web, Graphics and Interaction section of IS, spoke about the necessity of being more open and transparent with institutional data and the importance of providing more open data to encourage students to innovate. Hayden Bell highlighted the importance of having institutional open data directories and urged us to spend less time gathering data and more making something useful from it. Students are the source of authentic experience about being a student – we should use this! Student data hacks are great, but students often have to spend longer getting and parsing the data than doing interesting stuff with it. Steph Hay also spoke about the potential of opening up student data. VLEs inform the student experience; how can we open up this data and engage with students using their own data? Anonymised data from Learn was provided at Smart Data Hack 2015 but students chose not to use it, though it is not clear why. Finally, Hans Christian Gregersen brought the day to a close with a presentation of Book.ed, one of the winning entries of the Smart Data Hack. Book.ed is an app that uses open data to allow students to book rooms and facilities around the university.

What really struck me about Open.Ed was the breadth of vision and the wide range of open knowledge initiatives scattered across the university. The value of events like this is that they help to share this vision with colleagues, as that’s when the cross-fertilisation of ideas really starts to take place.

This report first appeared on Lorna M. Campbell’s blog, Open World:  lornamcampbell.wordpress.com/2015/03/11/open-ed

P.S. Another interesting talk came from Bert Remijsen, who spoke of the benefits he has found from publishing his linguistics research data using DataShare, particularly the ability to enable others to hear recordings of the sounds, words and songs described in his research papers, spoken and sung by the native speakers of Shilluk with whom he works during his field research in South Sudan.