Data Vault project kickoff meeting

Last week, members of the Data Vault project got together for the kickoff meeting. At the meeting, hosted at the University of Manchester Library, we discussed the project plan and the milestones for the three-month project, agreed terminology for parts of the system, and started to assign tasks to project members for the first month.

Being only three months long, the project is being run in three one-month chunks. These are defined as follows:

  1. Month 1: Define and Investigate: This phase will allow us to agree what the Data Vault should do, and how it does it. Specifically it will look at:
    1. What are the use cases for the Data Vault
    2. How do we describe the system (create overview diagrams)
    3. How should the data be packed (metadata + data) for long term archival storage
    4. Develop example workflows for how the Data Vault could be used in the research process
    5. Examine the capabilities of archival storage systems to ensure they can support the proposed Data Vault
  2. Month 2: Requirements and Design: This phase will create the requirements specification and initial design of the system:
    1. Define the requirements specification
    2. Use the requirement specification to design the Data Vault system
  3. Month 3: Develop a Proof of Concept: This phase will seek to develop a minimal proof of concept that demonstrates the concept of the Data Vault:
    1. Deliver a working proof of concept that can describe and archive some data, and then retrieve it

At the end of month three, we will prepare for the second Jisc Data Spring sandpit workshop where we will seek to extend the project to take the prototype and develop it into a full system.

All of this is being documented in the project plan, which is a ‘living document’ that is constantly evolving as the project progresses.  The plan is online as a Google Document:

Look out for further blog posts during the month as we undertake the definitions and investigations!

Originally posted on the Jisc Data Vault blog, April 7, 2015 by Stuart Lewis, Deputy Director, Library & University Collections.

New data analysis and visualisation service

Statistical Analysis without Statistical Software

The Data Library now has an SDA (Survey Documentation and Analysis) server, and is ready to load numeric data files for access either by University of Edinburgh users only or by ‘the world’. The University of Edinburgh SDA server is available at: http://stats.datalib.edina.ac.uk/sda/

SDA provides an interactive interface, allowing extensive data analysis with significance tests. It also offers the ability to download user-defined subsets with syntax files for further analysis on your platform of choice.

SDA can be used to teach statistics, in the classroom or via distance-learning, without having to teach syntax. It will support most statistical techniques taught in the first year or two of applied statistics. There is no need for expensive statistical packages, or long learning curves. SDA has been awarded the American Political Science Association Best Instructional Software award.

For data producers concerned about disclosure control, SDA provides the capability of defining usage restrictions on a variable-by-variable basis. Examples include restrictions on minimum cell sizes (weighted or unweighted), on the use of particular variables without being collapsed (recoded), or on particular bi- or multivariate combinations.
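To illustrate the idea of a minimum cell size restriction, here is a minimal Python sketch. It is not how SDA implements or configures the rule; the records, the variable names and the helper functions `crosstab` and `suppress_small_cells` are all hypothetical, and the threshold of 3 is chosen only for the example.

```python
from collections import Counter

# Hypothetical survey records; the variable names are illustrative only.
ROWS = [
    {"sex": "F", "smoker": "yes"},
    {"sex": "F", "smoker": "yes"},
    {"sex": "F", "smoker": "yes"},
    {"sex": "F", "smoker": "no"},
    {"sex": "M", "smoker": "yes"},
    {"sex": "M", "smoker": "no"},
    {"sex": "M", "smoker": "no"},
    {"sex": "M", "smoker": "no"},
]

def crosstab(rows, row_var, col_var):
    """Count unweighted cases in each (row, column) cell."""
    return Counter((r[row_var], r[col_var]) for r in rows)

def suppress_small_cells(table, min_cell_size):
    """Replace counts below the threshold with None so they are not disclosed."""
    return {cell: (n if n >= min_cell_size else None) for cell, n in table.items()}

table = crosstab(ROWS, "sex", "smoker")
safe_table = suppress_small_cells(table, min_cell_size=3)
for cell, count in sorted(safe_table.items()):
    print(cell, "suppressed" if count is None else count)
```

Cells with fewer cases than the threshold are reported as suppressed rather than shown, which is the general effect a minimum cell size rule is meant to have.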

For data managers and those concerned about data preservation, SDA can be used to store data files in a generic, software-independent format (fixed-field format ASCII), and includes the capability of producing the accompanying metadata in the emerging DDI-standard XML format.
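As a rough illustration of why fixed-field ASCII is software independent, the Python sketch below reads such a file using nothing but column positions. The layout, field names and sample records are invented for the example; a real file would take its column positions from the accompanying metadata.

```python
# Hypothetical column layout: (name, start, end), 0-based and end-exclusive.
LAYOUT = [("id", 0, 4), ("age", 4, 7), ("region", 7, 9)]

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-field line into a dict of stripped string values."""
    return {name: line[start:end].strip() for name, start, end in layout}

def parse_file(text, layout=LAYOUT):
    """Parse every non-blank line of a fixed-field ASCII file."""
    return [parse_record(line, layout) for line in text.splitlines() if line.strip()]

# Two made-up records, each nine characters wide.
sample = "0001 34NE\n0002 51SW\n"
records = parse_file(sample)
for record in records:
    print(record)
```

Because each value is located purely by its position in the line, any language that can slice strings can read the file, provided the layout is documented.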

Data Library staff can mount data files very quickly if they are well documented with appropriate metadata formats (e.g. SAS or SPSS), depending on any access restrictions that apply to the data file. To request that a data file be made available in SDA, contact datalib@ed.ac.uk.

Laine Ruus
EDINA and Data Library

open.ed report

Lorna M. Campbell, a Digital Education Manager with EDINA and the University of Edinburgh, writes about the ideas shared and discussed at the open.ed event this week.

Earlier this week I was invited by Ewan Klein and Melissa Highton to speak at Open.Ed, an event focused on Open Knowledge at the University of Edinburgh.  A storify of the event is available here: Open.Ed – Open Knowledge at the University of Edinburgh.

“Open Knowledge encompasses a range of concepts and activities, including open educational resources, open science, open access, open data, open design, open governance and open development.”

 – Ewan Klein

Ewan set the benchmark for the day by reminding us that open data is only open by virtue of having an open licence such as CC0, CC BY, CC BY-SA. CC Non Commercial should not be regarded as an open licence as it restricts use.  Melissa expanded on this theme, suggesting that there must be an element of rigour around definitions of openness and the use of open licences. There is a reputational risk to the institution if we’re vague about copyright and not clear about what we mean by open. Melissa also reminded us not to forget open education in discussions about open knowledge, open data and open access. Edinburgh has a long tradition of openness, as evidenced by the Edinburgh Settlement, but we need a strong institutional vision for OER, backed up by developments such as the Scottish Open Education Declaration.

I followed Melissa, providing a very brief introduction to Open Scotland and the Scottish Open Education Declaration, before changing tack to talk about open access to cultural heritage data and its value to open education. This isn’t a topic I usually talk about, but with a background in archaeology and an active interest in digital humanities and historical research, it’s an area that’s very close to my heart. As a short case study I used the example of Edinburgh University’s excavations at Loch na Berie broch on the Isle of Lewis, which I worked on in the late 1980s. Although the site has been extensively published, it’s not immediately obvious how to access the excavation archive. I’m sure it’s preserved somewhere, possibly within the university, perhaps at RCAHMS, or maybe at the National Museum of Scotland. Wherever it is, it’s not openly available, which is a shame, because if I were teaching a course on the North Atlantic Iron Age there is some data from the excavation that I might want to share with students. This is no reflection on the directors of the fieldwork project, it’s just one small example of how greater access to cultural heritage data would benefit open education. I also flagged up a rather frightening blog post, Dennis the Paywall Menace Stalks the Archives, by Andrew Prescott, which highlights the dangers of what can happen if we do not openly license archival and cultural heritage data – it becomes locked behind commercial paywalls. There are, however, some excellent examples of open practice in the cultural heritage sector, such as the National Portrait Gallery’s clearly licensed digital collections and the work of the British Library Labs. Openness comes at a cost, though, and we need to make greater efforts to explore new business and funding models to ensure that our digital cultural heritage is openly available to us all.

Ally Crockford, Wikimedian in Residence at the National Library of Scotland, spoke about the hugely successful Women, Science and Scottish History editathon recently held at the university. However she noted that as members of the university we are in a privileged position that enables us to use non-open resources (books, journal articles, databases, artefacts) to create open knowledge. Furthermore, with Wikipedia’s push to cite published references, there is a danger of replicating existing knowledge hierarchies. Ally reminded us that as part of the educated elite, we have a responsibility to open our mindsets to all modes of knowledge creation. Publishing in Wikipedia also provides an opportunity to reimagine feedback in teaching and learning. Feedback should be an open participatory process, and what better way for students to learn this than from editing Wikipedia?

Robin Rice, of EDINA & Data Library, asked what Open Access and Open Data sharing look like. Open Access publications are increasingly becoming the norm, but we’re not quite there yet with open data. It’s not clear if researchers will be cited if they make their data openly available and career rewards are uncertain. However there are huge benefits to opening access to data and citizen science initiatives: public engagement, crowd funding, data gathering and cleaning, and informed citizenry. In addition, social media can play an important role in working openly and transparently.

James Bednar, talking about computational neuroscience and the problem of reproducibility, picked up this theme, adding that accountability is a big attraction of open data sharing. James recommended using IPython Notebook for recording and sharing data and computational results and helping to make them reproducible. This prompted Anne-Marie Scott to comment on Twitter:

@ammienoot: "Imagine students creating iPython notebooks... and then sharing them as OER #openEd"

Very cool indeed.

James Stewart spoke about the benefits of crowdsourcing and citizen science. Despite the buzzwords, this is not a new idea; there’s a long tradition of citizens engaging in science. Darwin regularly received reports and data from amateur scientists. Maintaining transparency and openness is currently a big problem for science, but openness and citizen science can help to build trust and quality. James also cited OpenStreetMap as a good example of building community around crowdsourcing data and citizen science. Crowdsourcing initiatives create a deep sense of community – it’s not just about the science, it’s also about engagement.

After coffee (accompanied by Tunnock’s caramel wafers – I approve!) we had a series of presentations on the student experience and student engagement with open knowledge.

Paul Johnson and Greg Tyler, from the Web, Graphics and Interaction section of IS, spoke about the necessity of being more open and transparent with institutional data and the importance of providing more open data to encourage students to innovate. Hayden Bell highlighted the importance of having institutional open data directories and urged us to spend less time gathering data and more making something useful from it. Students are the source of authentic experience about being a student – we should use this! Student data hacks are great, but students often have to spend longer getting and parsing the data than doing interesting stuff with it. Steph Hay also spoke about the potential of opening up student data. VLEs inform the student experience; how can we open up this data and engage with students using their own data? Anonymised data from Learn was provided at Smart Data Hack 2015 but students chose not to use it, though it is not clear why. Finally, Hans Christian Gregersen brought the day to a close with a presentation of Book.ed, one of the winning entries of the Smart Data Hack. Book.ed is an app that uses open data to allow students to book rooms and facilities around the university.

What really struck me about Open.Ed was the breadth of vision and the wide range of open knowledge initiatives scattered across the university. The value of events like this is that they help to share this vision with colleagues, as that’s when the cross-fertilisation of ideas really starts to take place.

This report first appeared on Lorna M. Campbell’s blog, Open World:  lornamcampbell.wordpress.com/2015/03/11/open-ed

P.S. another interesting talk came from Bert Remijsen, who spoke of the benefits he has found from publishing his linguistics research data using DataShare, particularly the ability to enable others to hear recordings of the sounds, words and songs described in his research papers, spoken and sung by the native speakers of Shilluk, with whom he works during his field research in South Sudan.

EPSRC Expectations Awareness Survey

As many of you will already know, EPSRC set out its research data management (RDM) expectations for institutions in receipt of EPSRC grant funding in May 2011; these included the development of an institutional ‘Roadmap’. EPSRC assessment of compliance with these expectations will begin on 1 May 2015 for research outputs published on or after that date.

In order to comply with EPSRC expectations and to implement the University’s RDM Policy, the University of Edinburgh has invested significantly in RDM services, infrastructure (incl. storage and security) and support as detailed in the University of Edinburgh’s RDM Roadmap.

In an effort to gauge the University of Edinburgh’s ‘readiness’ in relation to EPSRC’s RDM expectations, we are conducting a short survey of EPSRC grant holders.

The survey aims to find out more about researcher awareness of those expectations concerning the management and provision of access to EPSRC-funded research data as detailed in the EPSRC Policy Framework on Research Data.

We aim to conduct follow-up interviews with EPSRC grant holders who are willing to talk through these issues in a bit more detail to help shape the development of the RDM services at the University of Edinburgh.

We will endeavour to make available some of our findings shortly. In the meantime, if you want to use or refer to our survey we have posted a ‘demo’ version below:
https://edinburgh.onlinesurveys.ac.uk/epsrc-expectations-awareness-demo

Should you decide to make use of our survey, let us know, as we can potentially share our data with each other to benchmark our progress.

(As an aside, Oxford University have crafted a useful data decision tree for EPSRC-funded researchers at Oxford.)

Regards
Stuart Macdonald
RDM Services Coordinator
stuart.macdonald@ed.ac.uk

Update: A link to the findings can be found at: https://libraryblogs.is.ed.ac.uk/datablog/files/2016/07/EPSRC-RDM-Expectations-Awareness-Survey-Findings.pdf