Research Data Alliance – report from the 6th Plenary

The Research Data Alliance or RDA is growing about as fast as the data all around us. It got off the ground in 2012 with the support of major research funders in Europe, the US and Australia and has since grown to over 3,000 members. The latest plenary in Paris set a new registration record of ~700 ‘data folk’ including data scientists, data managers, librarians and policy-makers. The theme was Enterprise Engagement with a focus on Research Data for Climate Change.

Not an ordinary conference

What sets RDA apart from other data-related organisations is not just the size of its gatherings, but its emphasis on making change. Parallel sessions are not filled with individual presentations of research papers, but of collaborative activities that lead to outputs that can be used in the real world. Working groups are approved by governance structures that coalesce around actual problems that cannot be solved by individual organisations but require new top-level approaches. They are required to produce their deliverables and close shop after an 18 month period. Interest groups are allowed to exist longer, but are encouraged to spin off working groups to address changes as they are identified through group discussion.

Hard-working groups

Since 2012, these working groups have produced some impressive deliverables and pilots that if implemented across the Web and across organisations and countries could speed up research and improve reproducibility. They are governed by an elected group of experts, worldwide. Some current active projects are:

  • Data Foundation and Terminology WG: defining harmonised terminology for diverse communities used to their own data ‘language’
  • Data Type Registries WG: building software to implement a DTR that can automatically match up unknown dataset ‘types’ with relevant services or applications (such as a viewer)
  • PID Information Types WG: Creating a single common API for delivering checksums from multiple persistent identifier service providers (DataCite and others)
  • Practical policy WG: building on a previous WG that collected various machine-actionable policies practiced by different data centres and repositories, this group will register the policies to move repository managers to move towards a harmonised set.
  • Scalable Dynamic Data Citation WG: to solve the difficulty of properly citing dynamic data sources, the recommended solution allows users to re-execute a query with the original time stamp and retrieve the original data or to obtain the current version of the data.
  • Data Description Registry Interoperability WG: to solve the problem of scattered datasets across repositories and data registries, the group build Research Data Switchboard linking datasets across platforms.
  • Metadata Standards Directory WG: By guiding researchers towards the metadata standards and tools relevant to their discipline, the directory drives up adoption of those standards, improving the chances of future researchers finding and using the data.

Members of the RDM team have been involved in library and repository-related interest groups and Birds of a Feather groups, where surveys of current practice have circulated.

Not all men at RDA! Dame Wendy Hall from the Web Science Institute leads a Women's Networking Breakfast

Not all men at RDA! Dame Wendy Hall from the Web Science Institute leads a Women’s Networking Breakfast – photo courtesy of @RDA_Europe

RDA and climate change

Climate science was prominent in the 6th RDA plenary. This was not only due to the imminent Paris-based United Nations COP talks, but indeed due to issues of critical importance for the world today. For some years, driven by the climate model inter-comparison work underpinning Intergovernmental Panel on Climate Change (IPCC) reports and the massive datasets from Earth observation climate science has been located at an intersection of high performance computing, big data management, and services to support and stimulate research, commerce, and governmental initiatives.

Assessment of the risks posed by climate change, and strategies for adaptation and mitigation sharpens the need to solve not only the technical problems of bringing together diverse data (social, soil, climate, land-use, commercial,…) but also to address the policy challenges, given the diverse organisations needing to cooperate. This is a domain that builds on services to give access to data, for computation close to data enabled by e-infrastructure (such as EGI), and one that requires ever stronger approaches to brokering these resources and services, to permit their orchestration and integration.

Among initiatives presented in the climate-related sessions were:

  • GEOSS – The GEOSS Common Infrastructure allows the user of Earth observations to access, search and use the data, information, tools and services available through the Global Earth Observation System of Systems
  • Global Agricultural Monitoring (GEOGLAM) initiative in response to the growing calls for improved agricultural information.
  • An RDS group focused on wheat – the volatility in prices, in part driven by climate unpredictability, has become a major concern.
  • The IPSL Mesocentre
  • IS-ENES developing services for climate modelling especially
  • Copernicus, seeking to “support policymakers, business, and citizens with improved environmental information. Copernicus integrates satellite and in-situ data with modeling to provide user-focused information services”
  • CLIPC will provide access to climate datasets, and software and information to assess indicators for climate impact.

Dr. Mike Mineter, School of GeoSciences and Robin Rice, EDINA and Data Library

 

 

Data Visualisation with D3 workshop

Last week I attended the 4th HSS Digital Day of Ideas 2015. Amongst networking and some interesting presentations on the use of digital technologies in humanities research (the two presentations I attended focused on analysis and visualisation of historical records), I attended the hands-on `Data Visualisation with D3′ workshop run by Uta Hinrichs, which I thoroughly enjoyed.

The workshop was a crash course to start visualising data combining d3.js and leaflet.js libraries, with HTML, SVG, and CSS. For this, we needed to have installed a text editor (e.g. Notepad++, TextWrangler) and a server environment for local development (e.g. WAMP, MAMP). With the software installed beforehand, I was ready to script as soon as I got there. We were recommended to use Chrome (or Safari), for it seems to work best for JavaScript, and the developer tools it offers are pretty good.

First, we started with the basics of how the d3.js library and other JavaScript libraries, such as jquery or leaflet, are incorporated into basic HTML pages. D3 is an open source library developed by Mike Bostocks. All the ‘visualisation magic’ happens in the browser, which takes the HTML file and processes the scripts as displayed in the console. The data used in the visualisation is pulled into the console, thus you cannot hide the data.

For this visualisation (D3 Visual Elements), the browser uses the content of the HTML file to call the d3.js library and the data into the console. In this example, the HTML contains a bit of CSS and SVG (Scalable Vector Graphics) element with a d3.js script which pulls data from a CSV file containing the details: author and number of books. The visualisation displays the authors’ names and bars representing the number of books each author has written. The bars change colour and display the number of books when you hover over.

Visualising CSV data with D3 JavaScript library

The second visualisation we worked on was the combination of geo-referenced data and leaflet.js library. Here, we combine the d3.js and leaflet.js libraries to display geographic data from a CSV file. First we ensured the OpenStreetMap loaded, then pulled the CSV data in and last customised the map using a different map tile. We also added data points to the map and pop-up tags.

Visualising CSV data using leaflet JavaScript library

In this 2-hour workshop, Uta Hinrichs managed to give a flavour of the possibilities that JavaScript libraries offer and how ‘relatively easy’ it is to visualise data online.

Workshop links:

Other links:

Rocio von Jungenfeld
EDINA and Data Library

Research Data Spring – blooming great ideas !

The University of Edinburgh have been busy putting ideas together for Jisc’s Research Data Spring project, part of the research at risk co-design challenge area, which aims to find new technical tools, software and service solutions, which will improve researchers’ workflows and the use and management of their data (see: http://researchdata.jiscinvolve.org/wp/2014/11/24/research-data-spring-let-your-ideas-bloom/).

Library and University Collections in collaboration with colleagues from the University of Manchester have submitted an idea to prototype and then develop ann open source data archive application that is technology agnostic and can sit on top of various underlying storage or archive technologies – see: http://researchatrisk.ideascale.com/a/dtd/Develop-a-DataVault/102647-31525)

EDINA & Data Library have submitted two ideas, namely:

A ‘Cloud Work Bench’ to provide researchers in the geospatial domain (GI Scientists, Geomaticians, GIS experts) with the tools, storage and data persistence they require to conduct research without the need to manage the same in a local context that can be fraught with socio-technical barriers that impede the actual research (see: http://researchatrisk.ideascale.com/a/dtd/Cloud-Work-Bench/101899-31525)

An exploration of the use of Mozilla Open Badges as certification of completion of MANTRA (Research Data Management Training), a well-regarded open educational resource (see: http://researchatrisk.ideascale.com/a/dtd/Open-Badges-for-MANTRA-resource/102084-31525)

Please register with ideascale (http://researchatrisk.ideascale.com/) and VOTE for our blooming great ideas!!

Stuart Macdonald
RDM Service Coordinator

Data journals – an open data story

Here at the Data Library we have been thinking about how we can encourage our researchers who deposit their research data in DataShare to also submit these for peer review.

Why? We hope the impact of the research can be enhanced with the recognised added-value of peer review. Regardless whether there is a full-blown article to accompany the data.

We therefore decided recently to provide our depositors with a list of websites or organisations where they could do this.

I pulled a table together, from colleagues’ suggestions, from the PREPARDE project and the latest RDM textbook. And, very much in the Open Data spirit, I then threw the question open on Twitter:

“[..]does anyone have an up-to-date list of journals providin peer review of datasets (without articles), other than PREPARDE? #opendata

…and published the draft list for others to check or make comments on. This turned out to be a good move. The response from the Research Data Management community on Twitter was very heartening, and colleagues from across the globe provided some excellent enhancements for the list.

That process has given us confidence to remove the word ‘Draft’ from the title – the list, this crowd-sourced resource, will need to be updated from time-to-time, but we are confident that we’ve achieved reasonable coverage of the things we were looking for.

Another result of this search was the realisation that what we had gathered was in fact quite clearly a list of Data Journals. My colleague Robin Rice has now added a definition of that term to the list, and we will be providing all our depositors with a link to it:

https://www.wiki.ed.ac.uk/display/datashare/Sources+of+dataset+peer+review