RDM & Cornell University

I’ve been fortunate to have been given the opportunity to take up a secondment at the Cornell Institute for Social and Economic Research (CISER) as Data Services Librarian, the primary tasks of which are to:

  • Modernise the CISER data archive and, if possible, begin the implementation. Tasks include: introducing persistent identifiers (DOIs) to all archival datasets (via EZID); investigating metadata mapping of archival datasets (DDI, DC, MARCXML); streamlining data catalogue functionality (by introducing result sorting, relevance searches, subject classification); and assisting in scoping a data repository solution for social science data assets generated by Cornell researchers.
  • Actively participate in the Research Data Management Services Group at Cornell, assisting researchers with their RDM plans, contributing to the advancement of the work of the group
  • Actively consult with researchers about social science datasets and other data outreach activities.
  • Co-ordinate and collate assessment statements in order to gain Data Seal of Approval for CISER data archive.

Last Friday I gave my first presentation on the CISER data archive, alongside other CISER colleagues (who talked about datasets used under restricted access at the Cornell Restricted Access Data Centre, and about the CISER Statistical Consultancy Service & ICPSR), at a Policy Analysis and Management (PAM) workshop for graduate students. This was held at the Survey Research Institute (https://www.sri.cornell.edu/sri/), where much discussion centred around survey non-response and mechanisms to counter this increasingly common phenomenon.

On Tuesday of this week I presented on the University of Edinburgh RDM Roadmap at the monthly meeting of the Research Data Management Service Group (RDMSG – http://data.research.cornell.edu). This was followed by two presentations yesterday: one at a Demography Pro-seminar (for graduate students) on campus, and a later one at a Cornell University Library Data Discussion Group meeting in the Mann Library, set up to introduce the CISER Data Services Librarian to a range of subject librarians, principally in the social sciences. In each case the Edinburgh RDM Roadmap was received with great enthusiasm and engendered much discussion, in particular about the centralised and inclusive approach adopted by Edinburgh. Follow-up discussions and meetings are being planned, including on the potential use of MANTRA and the RDM Toolkit for Librarians as materials to raise the profile of RDM at Cornell.

As an aside, at a CISER team meeting the subject of password protection was raised (in some instances passwords to CISER resources are changed on a very regular basis for security purposes), along with issues surrounding the inappropriate recording of passwords. A site licence for a password protection software package was seen as a possible solution to both user disgruntlement and possible security breaches. As a thought, this might be worth considering as part of the Active Data Infrastructure tool suite.

Stuart Macdonald
Associate Data Librarian, UoE / Visiting CISER Data Services Librarian

RDM in the arts and humanities

The tenth Research Data Management Forum (RDMF), organised by the DCC, was held at St Anne’s College, University of Oxford, on 3 and 4 September. What follows is an account of proceedings, the goals of which were to examine aspects of arts and humanities research data that may require a different kind of handling to that given to other disciplines, and to discuss how needs for support, advocacy, training and infrastructure are being met.

Dave De Roure (Director of the Oxford e-Research Centre) started proceedings as keynote #1. He introduced the concept of the ‘fourth quadrant social machine’ (see Fig. 1), which extends the notion of ‘citizen as scientist’ by taking advantage of the emergence of new analytical capabilities, (big) data sources and social networking. He also talked about the ‘End of Theory’ – the idea that the data deluge is making the scientific method obsolete – and referenced Douglas Kell’s BioEssay ‘Here is the evidence, now what is the hypothesis’, which argues that “data- and technology-driven programmes are not alternatives to hypothesis-led studies in scientific knowledge discovery but are complementary and iterative partners with them.” This, he continued, could have a major influence on how research is conducted, not only in the hard sciences but also in the arts and humanities.

Fig. 1 – From ‘Social Machines’ by Dave De Roure on Slideshare (http://www.slideshare.net/davidderoure/social-machinesgss)

He contended that citizen science initiatives such as Zooniverse (real science online) extend the idea of ‘human as slave to the computer’, while cautioning that the ‘more we give the computer to do the more we have to keep an eye on what they do.’ De Roure argued that sharing methods is as important as sharing data, and talked about harnessing the web as a lens onto society (e.g. Twitterology), as infrastructure (e-science/e-research), and as the artifact of study. He went on to highlight innovative sharing initiatives in the arts and humanities that employ new analytical approaches, such as:

  • the HATHI Trust Research Center which ‘enables computational access for nonprofit and educational users to published works in the public domain now and, in the future’ i.e. take your code to the data and get back your results!
  • CLAROS which uses semantic web technologies and image recognition techniques to make geographically separate scholarly art datasets ‘interoperable’ and accessible to a broad range of users
  • SALAMI (Structural Analysis of Large Amounts of Music Information) which focuses on ‘developing and evaluating practices, frameworks, and tools for the design and construction of worldwide distributed digital music archives and libraries’

De Roure used the phrase ‘from signal to understanding’ to describe the workflow of the ‘social machine’ and went on to describe the commonalities that the arts and humanities community has with other disciplines, such as working with multiple data sources (often incomplete and born digital), sharing of new and innovative digital methods, the challenges of resource discovery and publication of new digital artifacts, the importance of provenance, and the risks of increasing automation. He also highlighted the ways in which digital resources in the arts and humanities differ from those in other disciplines in the age of the ‘social machine’, such as specific content types and their relationship to physical artifacts, curated collections and an ‘infinite archive’ of heterogeneous content, publication as both subject and record of research, and the emphasis on multiple interpretations and critical thinking.

Keynote #2 was delivered by Leigh Garrett (Director of the Virtual Arts Data Service), who opined that little is really known about the ‘state’ of research data in the visual arts. It is both tangible and intangible (what is data in the visual arts?), both physical and digital, heterogeneous and infinite, complex and complicated. He made mention of the Jisc-funded KAPTUR project, which aimed to create and pilot a sectoral model of best practice in the management of research data in the visual arts.

KAPTUR schematic drawing

As part of the exercise KAPTUR evaluated the 12 technical systems most suited to managing research data in the visual arts, including CKAN, Figshare, EPrints and DataFlow. Criteria such as user friendliness, visual engagement, flexibility, hosted solution, licensing, versioning and searching were considered. Figshare was seen as user friendly, visually engaging, intuitive and flexible; however, the use of CC Zero licences was seen as inappropriate for visual arts research data due to commercial usage clauses. Whilst CKAN appeared most suited, no single solution completely fulfilled all the requirements of researchers.

Fig 2. Lucie Rie, Sheet of sketches from Council of Industrial Design correspondence for Festival of Britain, 1951. © Mrs. Yvonne Mayer/Crafts Study Centre.
Available from VADS

Simon Willmoth then gave an institutional perspective from the University of the Arts London. Interestingly, in his experience abstract terms such as ‘research data’ do not engage researchers, and indeed the term is not in common usage among art and design researchers. His definition in the context of art and design was that research data can be ‘considered anything created, captured or collected as an output of funded research work in its original state’. He also observed that as soon as researchers understand RDM concepts they ask for all of their material (past and present) to be digitised! Echoing earlier presentations regarding the ‘heterogeneous and infinite’ nature of research data in the arts and humanities, Simon indicated that artists and designers normally have their own websites. Some of the content can be regarded as research data (e.g. drawings, storyboards, images), whilst some of it is a particular view dependent upon what it is used for at that instant. He then described the institutional challenges of resourcing (staff, storage, time), researcher engagement, curation (incl. legal, ethical and commercial constraints), infrastructure, and enhanced access. Simon finished with some very interesting quotes from researchers on how they perceive their work in relation to RDM, e.g.

The work that is made is evidence of the journey to the completed artwork …… it’s kind of a continuous archive of imagery

I try not to throw things out but I often cut things up to use as collage material …. so it’s continual construction and deconstruction. Actually ordering is part of my own creative process, so the whole idea of archiving I think is really interesting

I used to think of a website as something where you display things and now increasingly I see it as a way of it recording the process, so I am using more like a Blog structure. But I am happy to post notes, photographs, drawings, observations and let other people see the process

My sketch books tend to be a mish-mash between logging of rushes notes, detailing each shot, things that I read, things that I hear, books that I’m reading, so it will be a jumble of texts but they’ve all gone in to the making of a piece of work.

 

Image of Alex from A Clockwork Orange at Stanley Kubrick Archive (UAL)

Simon showcased the Stanley Kubrick Archive based at UAL, which contains draft and completed scripts, and research materials such as books, magazines and location photographs. It also holds set plans and production documents such as call sheets, shooting schedules, continuity reports and continuity Polaroids. Props, costumes, poster designs, sound tapes and records also feature, alongside publicity press cuttings. He argued that the approach of digital copy and physical artifact accompanied by a web catalogue may be the way forward for RDM in the field of art and design.

Julianne Nyhan (UCL) kicked off day two, providing a researcher’s view on arts and humanities data management/sharing with particular emphasis on infrastructure needs and wants. She observed that arts and humanities data are artifacts of human expression, interaction and imagination, which tend to be collected rather than generated, and rarely by those who create the object(s), which can be:

  • complex complete with variants, annotations, editorial comment
  • multi-lingual
  • long lasting
  • studied for many purposes

Julianne also reiterated the need to bridge the management and sharing of both physical and digital objects, as well as the need for more documentation of interpretative processes and interventions (for which she employs a range of note management tools). She went on to say that much of the work done in her own discipline goes beyond disciplinary/academic/institutional boundaries and that the need to retain and appreciate the bespoke should be balanced against the need for standardisation.
In terms of strategic developments Julianne saw much mileage in facilitating more research across linguistic borders (through legal instruments at a national/international level) with resultant access to large multilingual datasets from different cultures to inform comparative and transnational research.

Next up Paul Whitty and Felicity Ford (Oxford Brookes University) provided an overview of RDM practices at the Sonic Art Research Unit (SARU). The use of the internet to advertise work is common amongst SARU researchers, musicians and freelance artists. It was however recognised that SARU lacked any unified web presence such as Ubuweb.

Paul emphasised the need for the internet to be seen as a social space for SARU researchers. At the moment research is disseminated across multiple private websites without consistent links back to the university. Research objects and documentation are split over multiple platforms (e.g. Soundcloud, Vimeo, YouTube). This makes resource discovery difficult except through individual artists’ web spaces. As of March 2014 research disseminated across multiple private websites will be linked to/from the university, with data stored on RADAR, the multi-purpose online “resource bank” for Oxford Brookes, thus enabling traceability and impact measurement in terms of the use of digital research assets. As a concluding remark Paul questioned whether university IT departments were qualified to provide bespoke website design for arts practitioners and researchers.

James Wilson, Janet McKnight and Sally Rumsey then gave an overview of the University of Oxford approach to RDM in the humanities, which utilises different business models (extensible or reducible) for different components, e.g. DataFinder, Databank, DataFlow, DHARMa. Findings from earlier projects at Oxford indicated that humanities research data tends not to depreciate over time (unlike that in the harder sciences), is difficult to define, tends to be compiled from existing sources rather than created from scratch, and is often not in an optimal format for analysis. Other findings indicated that humanities researchers are least likely to conduct their research as part of a team, least likely to be externally funded, and least likely to have deposited data in a repository. The conclusions reached were that humanities researchers are amongst the hardest to reach and that training and support are required to encourage cultural change. Janet McKnight from the Digital Humanities Archives for Research Materials (DHARMa) Project then spoke about enabling digital humanities research through effective data preservation, warning that before you impose a workflow in terms of developing systems and processes it would be wise to ‘walk a mile in their [the researcher’s] shoes!’

This was a well-attended and enlightening event, ably organised and chaired by Martin Donnelly (DCC). It offered insight and a wide range of perspectives all of which enhance our understanding of service, of practice, of advocacy, of support in relation to research data management in the arts and humanities.

Slides from all presenters are available from the DCC website.

Stuart Macdonald
EDINA & Data Library

We have a new MANTRA!

Research Data MANTRA (http://datalib.edina.ac.uk/mantra/), the free online course hosted at Edinburgh University Data Library and designed for researchers or others planning to manage digital data as part of the research process, has been refreshed!

MANTRA Homepage

Shortlisted recently as one of 15 good practice examples designed to enhance information literacy skills, MANTRA has been upgraded to Version 2 of Xerte Online Toolkits, the e-learning development environment used to create the MANTRA learning materials. This allows delivery of the MANTRA units to a much wider range of devices using HTML5 rather than Flash.

The new MANTRA also highlights the utility of the learning materials for 4 discrete personas:

  • Research student
  • Career researcher
  • Senior academic
  • Information professional
3 things you might want to use MANTRA for:

We hope you enjoy the new MANTRA experience. Please get in contact with us at the Data Library with your comments or suggestions.

Stuart Macdonald
EDINA & Data Library

MANTRA shortlisted

Research Data MANTRA (http://datalib.edina.ac.uk/mantra/) is a free, non-credit course designed for postgraduate students and early career researchers which provides guidelines for good practice in research data management.

Image depicting a shortlist
(Flickr Image by Soilse – CC BY-NC-ND 2.0)

In recognition of the work done by Edinburgh University Data Library in developing this open educational resource, MANTRA has been evaluated and shortlisted in a report by the Research Information Literacy and Digital Scholarship (RILADS) project as one of 15 good practice examples designed to enhance the information literacy skills of postgraduate students and early career researchers in UK Higher Education.

RILADS aims to investigate and report on support available to students, staff and researchers to enhance digital literacy. There are two strands to the project. One is co-ordinated by Research Information Network (RIN) on behalf of Research Information and Digital Literacies Coalition (RIDLs), the other by SCONUL under the JISC Developing Digital Literacies (DDL) programme.

The RIN strand focuses on the identification and promotion of good practice in information handling and data management training and development across the HE and research sectors. The SCONUL strand aims to identify, harvest, and use materials to progress the development of digital professional expertise.

The full report and shortlist are available on the RILADS website: http://rilads.wordpress.com/2013/06/10/rilads-report/

MANTRA, the sole resource on the shortlist that is dedicated to research data management skills, has also been upgraded to Version 2 of Xerte Online Toolkits, the e-learning development environment used by MANTRA to create the learning materials. This version can deliver content using HTML5 rather than the Flash Player, so the materials can reach a much wider range of devices, including those that do not support Flash.

Watch out for further MANTRA enhancements!

Stuart Macdonald
EDINA & Data Library