Dealing with Data 2015 – Presentations now available

On 31 August, over one hundred researchers from across the University of Edinburgh met in the Informatics Forum to discuss the challenges of dealing with research data. Following on from the 2014 conference of the same title, the event consisted of twenty presentations on this subject.

The event opened with a keynote by Prof Jonathan Silvertown, who talked about his experiences of using crowd-sourced data from citizen scientists and how to build mechanisms to ensure the quality of the data.

The rest of the day was filled with presentations addressing a wide range of data challenges, including topics such as data from the Large Hadron Collider, working with large data sets from China, and data derived from social media. At the end of the event, the topics were pulled together in a closing talk by Kevin Ashley, Director of the Digital Curation Centre.

If you attended Dealing with Data 2015, and have not already done so, could you please complete our brief survey at DwD2015 Feedback. It should only take 5 minutes and will help us to improve future events.

Programme with presentations

10:00 Welcome. Download PDF
10:05 Opening keynote: The Alchemy of Volunteered Data: turning base metal into gold, Prof Jonathan Silvertown, Institute of Evolutionary Biology. Download PDF

Session 1 – Informatics Forum
10:45 – 11:05: University data, open data and the Smart Data Hack, Ewan Klein, Informatics.
11:05 – 11:25: Edinburgh Data Science and Managing National Data Services at Edinburgh. Mark Parsons, EPCC. Download PDF
11:25 – 11:45: Channel shift – using data analysis to improve service delivery at the City of Edinburgh Council. Michal Wasilewski, Informatics. Download PDF

Session 2 – Informatics Forum
12:00 – 12:20: What are the challenges of collecting and analysing data in primary care? Lessons learned from a feasibility study in six general practices in Lothian, Scotland. Natalia Calanzani, Debbie Cavers, Gaby Vojt, David Weller, Christine Campbell, Population Health Sciences and Informatics. Download PDF
12:20 – 12:40: Facilitating the reuse of brain imaging and clinical data from completed studies across the life course: the Brain Images of Normal Subjects (BRAINS) Imagebank. Samuel Danso, Dominic E. Job, David Alexander Dickie, David Rodriguez, Andrew Robson, Cyril Pernet, Susan D. Shenkin, Joanna M. Wardlaw, Brain Sciences. Download PDF
12:40 – 13:00: Scottish Neighbourhood Statistics and R: Adding value to a public data resource with the ‘tidy data’ paradigm. Jon Minton, AQMeN. Download PDF

Session 3 – Appleton Tower
12:00 – 12:20: Data ecosystems and wicked problems; supporting “students as researchers” in complex data environments. Arno Verhoeven, ECA; James Stewart, SPS; Ewan Klein, Informatics. Download PDF
12:20 – 12:40: Networked learning analytics: Studying the association between learner generated discourse and learning. Srećko Joksimović, Dragan Gašević, Education. Download PDF
12:40 – 13:00: Automated Content Analysis of Discussion Transcripts. Vitomir Kovanovic, Dragan Gašević, Informatics and Education. Download PDF

Session 4 – Informatics Forum
13:45 – 14:05: Exploring Digital Divides in China, Ashley Lloyd, Business School; Mario A. Antonioletti, Terence M. Sloan, EPCC.
14:05 – 14:25: Gone Fishing: The Creation of the Comparative Agendas Project Master Codebook, Shaun Bevan, SSPS. Download PDF
14:25 – 14:45: Electronic lab notebooks and research data management at Edinburgh: Experience to date and challenges and opportunities going forward. Rory Macneil, RSpace. Download PDF

Session 5 – Appleton Tower
13:45 – 14:05: Tweeting Jonson’s “Foot Voyage”: deeply mapped data, Anna Groundwater, HCA. Download PDF
14:05 – 14:25: University of Edinburgh Reid Concerts Database Project, Fiona Donaldson, Music. Download PDF
14:25 – 14:45: Encountering feminism on Twitter, Prof Viviene Cree, and Dr Steve Kirkwood, Social and Political Science, with Dr Daniel Winterstein, Sodash. Download PDF

Session 6 – Informatics Forum
15:00 – 15:20: The VELaSSCo framework: a software platform for end user analytics and visualization of large simulation datasets, G. Filippone, A. Janda, K.J. Hanley, S. Papanicolopulos and J.Y. Ooi, IIE, Engineering.
15:20 – 15:40: From raw data to new fundamental particles: The data management lifecycle at the Large Hadron Collider, Andrew Washbrook, Physics. Download PDF
15:40 – 16:00: Tipping the balance – introducing data management on a centre-wide level, Tomasz Zieliński, Eilidh Troup, Andrew Millar, Biology. Download PDF

16:00 Closing talk: Kevin Ashley, Director, Digital Curation Centre
Kerry Miller
RDM Service Coordinator

Dealing with Data 2015 – Programme

Date:                     Monday 31 August 2015, 9:30 – 16:30
Location:             Informatics Forum and Appleton Tower, University of Edinburgh

Bookings for the Dealing with Data conference are now closed.  The event is full!

Draft Programme

09:30 Refreshments
10:00 Welcome
10:05 Opening keynote: The Alchemy of Volunteered Data: turning base metal into gold, Prof Jonathan Silvertown, Institute of Evolutionary Biology.


10:45 Session 1 – Informatics Forum
10:45 – 11:05: University data, open data and the Smart Data Hack, Ewan Klein, Informatics.
11:05 – 11:25: Edinburgh Data Science and Managing National Data Services at Edinburgh. Mark Parsons, EPCC.
11:25 – 11:45: Channel shift – using data analysis to improve service delivery at the City of Edinburgh Council. Michael Wasilewski, Informatics.


11:45 Break


12:00 – 13:00 Session 2 – Informatics Forum
12:00 – 12:20: What are the challenges of collecting and analysing data in primary care? Lessons learned from a feasibility study in six general practices in Lothian, Scotland. Natalia Calanzani, Debbie Cavers, Gaby Vojt, David Weller, Christine Campbell, Population Health Sciences and Informatics.
12:20 – 12:40: Facilitating the reuse of brain imaging and clinical data from completed studies across the life course: the Brain Images of Normal Subjects (BRAINS) Imagebank. Samuel Danso, Dominic E. Job, David Alexander Dickie, David Rodriguez, Andrew Robson, Cyril Pernet, Susan D. Shenkin, Joanna M. Wardlaw, Brain Sciences.
12:40 – 13:00: Scottish Neighbourhood Statistics and R: Adding value to a public data resource with the ‘tidy data’ paradigm. Jon Minton, AQMeN.


12:00 – 13:00 Session 3 – Appleton Tower
12:00 – 12:20: Data ecosystems and wicked problems; supporting “students as researchers” in complex data environments. Arno Verhoeven, ECA; James Stewart, SPS; Ewan Klein, Informatics.
12:20 – 12:40: Networked learning analytics: Studying the association between learner generated discourse and learning. Srećko Joksimović, Dragan Gašević, Education.
12:40 – 13:00: Automated Content Analysis of Discussion Transcripts. Vitomir Kovanovic, Dragan Gašević, Informatics and Education.


13:00 Lunch


13:45 – 14:45 Session 4 – Informatics Forum
13:45 – 14:05: Exploring Digital Divides in China, Ashley Lloyd, Business School; Mario A. Antonioletti, Terence M. Sloan, EPCC.
14:05 – 14:25: Gone Fishing: The Creation of the Comparative Agendas Project Master Codebook, Shaun Bevan, SSPS.
14:25 – 14:45: Electronic lab notebooks and research data management at Edinburgh: Experience to date and challenges and opportunities going forward. Rory Macneil, RSpace.


13:45 – 14:45 Session 5 – Appleton Tower
13:45 – 14:05: Tweeting Jonson’s “Foot Voyage”: deeply mapped data, Anna Groundwater, HCA.
14:05 – 14:25: University of Edinburgh Reid Concerts Database Project, Fiona Donaldson, Music.
14:25 – 14:45: Encountering feminism on Twitter, Prof Viviene Cree, and Dr Steve Kirkwood, Social and Political Science, with Dr Daniel Winterstein, Sodash.


14:45 Break


15:00 – 16:00 Session 6 – Informatics Forum
15:00 – 15:20: The VELaSSCo framework: a software platform for end user analytics and visualization of large simulation datasets, G. Filippone, A. Janda, K.J. Hanley, S. Papanicolopulos and J.Y. Ooi, IIE, Engineering.
15:20 – 15:40: From raw data to new fundamental particles: The data management lifecycle at the Large Hadron Collider, Andrew Washbrook, Physics.
15:40 – 16:00: Tipping the balance – introducing data management on a centre-wide level, Tomasz Zieliński, Eilidh Troup, Andrew Millar, Biology.


16:00 Closing talk: Kevin Ashley, Director, Digital Curation Centre
16:30 End

Fostering open science in social science

On 10 June, the Data Library team ran two workshops in association with the EU Horizon 2020 project, FOSTER (Facilitate Open Science Training for European Research), and the Scottish Graduate School of Social Science.

The aim of the morning workshop, “Good practice in data management & data sharing with social research,” was to provide new entrants into the Scottish Graduate School of Social Science with a grounding in research data management using our online interactive training resource MANTRA, which covers good practice in data management and issues associated with data sharing.

The morning started with a brief presentation by Robin Rice on ‘open science’ and its meaning for the social sciences. Pauline Ward then demonstrated the importance of data management plans in ensuring that work is safeguarded and that data sharing is made possible. I introduced MANTRA briefly, and then Laine Ruus assigned different MANTRA units to participants, asking them to go through the units quickly, extract one or two key messages, and report back to the rest of the group. After the coffee break we had another presentation on ethics, informed consent and the barriers to sharing. We finished the morning session with a ‘Dos and Don’ts’ exercise, in which we asked participants to write on post-it notes the things they remembered and were taking away from the workshop: green for things they should DO, and pink for those they should NOT. Here are some of the points the learners posted:

DO
– consider your usernames & passwords
– read the Data Protection Act
– check funder/institution regulations/policies
– obtain informed consent
– design a clear consent form
– give participants info about the research
– inform participants of how we will manage data
– confidentiality
– label your data with enough info to retrieve it in future
– develop a data management plan
– follow the certain policies when you re-use dataset[s] created by others
– have a clear data storage plan
– think about how & how long you will store your data
– store data in at least 3 places, in at least 2 separate locations
– backup!
– consider how/where you back up your data
– delete or archive old versions
– data preservation
– keep your data safe and secure with the help of facilities of fund bodies or university
– think about sharing
– consider sharing at all stages. Think about who will use my data next
– share data (responsibly)

DON’T
– unclear informed consent
– a sense of forcing participants to be part of research
– do not store sensitive information unless necessary
– don’t staple consent forms to de-identified data records/store them together
– take information security for granted
– assume all software will be able to handle your data
– don’t assume you will remember stuff. Document your data
– assume people understand
– disclose participants’ identity
– leave computer on
– share confidential data
– leave your laptop on the bus!
– leave your laptop on the train!
– leave your files on a train!
– don’t forget it is not just my data, it is public data
– forget to future proof

Robin Rice presenting at FOSTERing Open Science workshop

Our message was that open science will thrive when researchers:

  • organise and version their data files effectively,
  • provide sufficient documentation for others to understand and replicate results, and thus cite the source properly,
  • know how to store and transport their data safely and securely (ensuring backup and encryption),
  • understand legal and ethical requirements for managing data about human subjects,
  • recognise the importance of good research data management practice in their own context.

The afternoon workshop on “Overcoming obstacles to sharing data about human subjects” built on one of the main themes introduced in the morning, with a large overlap of attendees. The ethical and regulatory issues in this area can appear daunting. However, data created from research with human subjects are valuable, and are therefore worth sharing for all the same reasons as other research data (impact, transparency, validation, etc.). So it was heartening to find ourselves working with a group of mostly new PhD students, keen to find ways to anonymise, aggregate, or otherwise transform their data appropriately to allow sharing.

Robin Rice introduced the Data Protection Act, as it relates to research with human subjects, and ethical considerations. Naturally, we directed our participants to MANTRA, which has detailed information on the ethical and practical issues, with specific modules on “Data protection, rights & access” and “Sharing, preservation & licensing”. Of course not all data are suitable for sharing, and there are risks to be considered.

In many cases, data can be anonymised effectively enough to allow them to be shared. Richard Welpton from the UK Data Archive shared practical information on anonymisation approaches and tools for ‘statistical disclosure control’, recommending sdcMicroGUI, a graphical interface for carrying out anonymisation techniques. Although it is an R package, it should require no knowledge of the R language.
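To make the idea of statistical disclosure control a little more concrete, here is a minimal Python sketch of one commonly used check, k-anonymity: counting how many records share the same combination of potentially identifying values, and seeing how recoding an exact age into a band changes that count. This is not sdcMicroGUI and not what was demonstrated at the workshop. The records, column names and helper functions are all invented for illustration.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of quasi-identifier values (the 'k' in k-anonymity)."""
    if not records:
        raise ValueError("no records to check")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalise_age(record, band=10):
    """Replace an exact age with a band such as '30-39' (a simple recoding)."""
    low = (record["age"] // band) * band
    return {**record, "age": f"{low}-{low + band - 1}"}


# Hypothetical survey records, invented for illustration only.
records = [
    {"age": 31, "postcode_area": "EH8", "response": "agree"},
    {"age": 34, "postcode_area": "EH8", "response": "disagree"},
    {"age": 38, "postcode_area": "EH8", "response": "agree"},
    {"age": 52, "postcode_area": "EH9", "response": "agree"},
    {"age": 57, "postcode_area": "EH9", "response": "neutral"},
]

quasi_identifiers = ["age", "postcode_area"]

# With exact ages, every record is unique on (age, postcode_area).
k_before = smallest_group_size(records, quasi_identifiers)

# After banding ages, records fall into larger, less identifying groups.
banded = [generalise_age(r) for r in records]
k_after = smallest_group_size(banded, quasi_identifiers)

print(f"k before recoding: {k_before}, k after recoding: {k_after}")
```

A larger k means each individual is hidden among more look-alike records, at the cost of some detail in the shared data. Which variables count as quasi-identifiers, and what value of k is acceptable, are judgement calls that depend on the study.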

Finally, Dr Niamh Moore from the University of Edinburgh shared her experiences of sharing qualitative data. She spoke about the need to respect the wishes of subjects, her research gathering oral history, and the enthusiasm of many of her human subjects to be named in her research outputs, in a sense to own their own story, their own words.

Rocio von Jungenfeld & Pauline Ward
EDINA and Data Library

Data Visualisation with D3 workshop

Last week I attended the 4th HSS Digital Day of Ideas 2015. Alongside networking and some interesting presentations on the use of digital technologies in humanities research (the two presentations I attended focused on analysis and visualisation of historical records), I attended the hands-on ‘Data Visualisation with D3’ workshop run by Uta Hinrichs, which I thoroughly enjoyed.

The workshop was a crash course in visualising data by combining the d3.js and leaflet.js libraries with HTML, SVG, and CSS. For this, we needed to have installed a text editor (e.g. Notepad++, TextWrangler) and a server environment for local development (e.g. WAMP, MAMP). With the software installed beforehand, I was ready to script as soon as I got there. We were advised to use Chrome (or Safari), as it seems to work best for JavaScript and its developer tools are pretty good.

First, we started with the basics of how the d3.js library and other JavaScript libraries, such as jQuery or Leaflet, are incorporated into basic HTML pages. D3 is an open source library developed by Mike Bostock. All the ‘visualisation magic’ happens in the browser, which takes the HTML file and processes the scripts, as can be seen in the console. The data used in the visualisation is pulled into the browser and can be inspected in the console, so you cannot hide the data.

For this visualisation (D3 Visual Elements), the browser uses the content of the HTML file to call the d3.js library and load the data, both of which can be seen in the console. In this example, the HTML contains a bit of CSS and an SVG (Scalable Vector Graphics) element, with a d3.js script that pulls data from a CSV file containing two details: author and number of books. The visualisation displays the authors’ names and bars representing the number of books each author has written. The bars change colour and display the number of books when you hover over them.

Visualising CSV data with D3 JavaScript library
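For readers who want to see the underlying idea without setting up a browser environment, the following is a rough Python sketch of the same mapping: read author and book-count rows from CSV, scale each count to a bar width, and emit an SVG string. It does not use d3.js and is not the workshop code. The CSV rows, function names and layout values are assumptions made up for the example.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical CSV content; the workshop file's actual rows are not reproduced here.
CSV_TEXT = """author,books
Author A,12
Author B,30
Author C,6
"""


def linear_scale(domain_max, range_max):
    """Map values in [0, domain_max] proportionally onto [0, range_max]."""
    return lambda value: value / domain_max * range_max


def bars_to_svg(rows, width=300, bar_height=20, gap=5, label_width=80):
    """Build a horizontal bar chart as an SVG string, one bar per row."""
    scale = linear_scale(max(r["books"] for r in rows), width - label_width)
    parts = []
    for i, row in enumerate(rows):
        y = i * (bar_height + gap)
        # Author name to the left of the bar.
        parts.append(f'<text x="0" y="{y + 15}">{escape(row["author"])}</text>')
        # The <title> child is shown by browsers as a tooltip on hover.
        parts.append(
            f'<rect x="{label_width}" y="{y}" height="{bar_height}" '
            f'width="{scale(row["books"]):.0f}">'
            f'<title>{row["books"]} books</title></rect>'
        )
    height = len(rows) * (bar_height + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(parts)
        + "</svg>"
    )


# Parse the CSV and convert the book counts from strings to integers.
rows = [
    {"author": r["author"], "books": int(r["books"])}
    for r in csv.DictReader(io.StringIO(CSV_TEXT))
]
svg = bars_to_svg(rows)
print(svg)
```

Saving the printed string to a file with an .svg extension and opening it in a browser should show three labelled bars. In D3 the equivalent steps (loading the CSV, defining a scale, binding rows to rectangles) are handled by the library inside the page itself.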

The second visualisation we worked on combined geo-referenced data with the leaflet.js library. Here, we used the d3.js and leaflet.js libraries together to display geographic data from a CSV file. First we ensured the OpenStreetMap base map loaded, then pulled the CSV data in, and finally customised the map using a different map tile. We also added data points to the map, along with pop-up tags.

Visualising CSV data using leaflet JavaScript library
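As a small illustration of the data preparation involved, the Python sketch below turns CSV rows with latitude and longitude columns into a GeoJSON FeatureCollection, a format that mapping libraries such as Leaflet can read. It is not the workshop code, which loaded the CSV directly in the browser. The column names, place names and coordinates are invented for the example.

```python
import csv
import io
import json

# Hypothetical geo-referenced rows, invented for illustration only.
CSV_TEXT = """name,lat,lon
Site A,55.95,-3.19
Site B,55.86,-4.25
"""


def csv_to_geojson(text):
    """Convert CSV text with name, lat and lon columns into a GeoJSON dict."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            # Text that a map could show in a pop-up for this point.
            "properties": {"popup": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_to_geojson(CSV_TEXT)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, whereas many CSV files (and Leaflet's own marker functions) put latitude first, which is an easy source of points appearing in the wrong place.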

In this 2-hour workshop, Uta Hinrichs managed to give a flavour of the possibilities that JavaScript libraries offer and how ‘relatively easy’ it is to visualise data online.

Rocio von Jungenfeld
EDINA and Data Library