US Government Data: Lost and Found

Image of rescue tube with floppy disk and words Data Rescue Project

Actions by the current US Trump administration (and others, including Trump’s first term) have spurred archivists, librarians and activists to archive, capture, collect, crawl, hoard, mirror, preserve, rescue, track and save datasets produced at taxpayer expense and until recently made available on government websites.

For example, just as US federal research into climate change, or even mentioning climate, has been paused and government agencies defunded, so the datasets produced from these activities have been removed from public reach or disappeared. The same is true for health data around vaccine research (National Institutes of Health, Centers for Disease Control and Prevention), human subject data deemed to be furthering EDI – equality, diversity, and inclusion – (USAID), and longitudinal educational data measuring attainment and social mobility (Department of Education). In some cases, as on this US government web page from the National Environmental Satellite, Data, and Information Service of the National Oceanic and Atmospheric Administration (NOAA), there are both items that are being decommissioned and archived, and others that are simply being decommissioned and deleted.

Particular challenges in archiving such data include capturing whole databases using scraping techniques, loss of metadata, loss of provenance tracking (the audit trail of changes), and the inability to add records or collect further data without massive government investment. Also, isolated efforts mean the data cannot easily be discovered.

Fortunately, the Data Rescue Project is coming to the rescue (along with other initiatives). It is a coordinated effort among data organisations and individuals, including librarians and data professionals. It serves as “a clearinghouse for data rescue-related efforts and data access points for public US governmental data that are currently at risk.” The web page provides a host of pointers to current efforts, resources, a tracker tool, and press coverage – including the New Yorker and Le Monde.

Researchers at the University of Edinburgh who find that data they require for their research is being removed from publicly available sites may contact the Research Data Support team to discuss potential actions to take.

Robin Rice
Data Librarian and Head of Research Data Support
Library and University Collections

DataShare spotlight: Debates on slavery and abolition held by student debating societies at the University of Edinburgh, 1765-1870

The best part of my job is looking through the new datasets submitted to DataShare, our open-access data repository. One of the first datasets that gripped me was Debates on slavery and abolition held by student debating societies at the University of Edinburgh, 1765-1870.

This dataset is really cool because as well as being a valuable resource for future research projects, it’s extremely interesting to read, even as someone who doesn’t know anything about historical research. This readability is what makes humanities dataset submissions so fun to process.

This dataset summarises debates on chattel slavery and abolition by two of the University’s debating societies during roughly the last hundred official years of the Transatlantic slave trade. It includes motions and outcomes of the debates, as well as information about the people participating and the positions they took.

It’s easy to tell ourselves that people in the past caused unimaginable harm because they didn’t know any better. Maybe this impulse is a form of self-preservation, a way to deny our ancestors’ agency to protect them – and ourselves – from blame. The dataset reminds us that even at the height of the slave trade there were many people publicly voicing their opposition. The data give us some insight into how these men understood their own complicity in slavery and their responsibility in upholding or abolishing it.

It’s interesting to see, for example, that some debate outcomes were pro-abolition, but against immediate abolition. Or how a debate on whether it would be sound policy to abolish the African slave trade had a unanimously pro-abolition outcome in 1792, yet full emancipation didn’t come for over forty years.

Sample from table of data

A preview of ‘University of Edinburgh Dialectic Society debates on slavery and abolition, 1792-1870’. From Buck, Simon; Frith, Nicola; Curry, Tommy. (2024). Debates on slavery and abolition held by student debating societies at the University of Edinburgh, 1765-1870 [text]. University of Edinburgh. Institute for Advanced Studies in the Humanities. https://doi.org/10.7488/ds/7841.

The dataset comes from The Decolonised Transformations Project, which aims to take a critical look at the University’s complicity and investment in slavery and colonialism, confront the legacies of these choices, and make concrete recommendations to address present-day structural racism. The project is a great example of how humanities research can be translated into a wide array of resources to maximise its utility and reach. As well as traditional outputs like publications and reports, the research team has published datasets, produced podcasts, and held workshops and talks for the wider community.

Decolonised Transformations – Confronting the University’s Legacies of Slavery and Colonialism

The datasets have been downloaded multiple times since they were shared in November, so I’m sure other people are finding this data as interesting as I am.

Evelyn Williams
Research Data Support Assistant

DataShare spotlight: Human MotionLess Dataset (HuMoLs) and the creative potential of research data

For the second instalment of the DataShare spotlight blog posts, I would like to showcase a fascinating item containing videos of people not doing anything!

The dataset in question is titled “Human MotionLess Dataset (HuMoLs)” and was created by researchers Longfei Chan, Muhammad Ahmed Raza and Robert Fisher, who are based in the School of Informatics’ Institute of Perception, Action and Behaviour. While on the face of it, people being still might not seem very dynamic, the research behind this dataset is trying to solve a difficult problem with a very useful outcome. Simply put, how do you tell whether someone is lying still because they are doing something like sleeping, or because they are unwell or have fallen? The videos in this dataset are intended to train healthcare monitoring systems to distinguish between these two possibilities, with the priority being to uncover any critical medical conditions or to analyse chronic conditions.

A selection of still images taken from the videos in the dataset.

What struck me while reviewing the videos for submission was that beyond the usefulness of these videos to the research project, there was the potential for them to be adapted creatively. The videos have a deliberately “uncanny valley” aspect, due to using AI to deepfake participants’ faces in order to preserve their anonymity. The amusingly odd character of the videos made me imagine them being used in an Adam Curtis documentary, or in an Aphex Twin music video. Possibly even in an Adam Curtis documentary with Aphex Twin music over the top of it.

This raises the fascinating idea that rich sources of research data stored in open access repositories could have a life beyond supporting reproducibility: they could also be reused, repurposed and remixed into creative new pieces, adding value both to the research itself and to the repositories where the affiliated data is stored. To demonstrate this possibility, I have edited some of the videos together and set them to music (see below). The piece of music, “Redolescence” by Other Lands, has a dreamlike quality to it which both complements and recontextualises the videos into something beyond their originally intended use.

What other audio and visual materials are contained in research data repositories, waiting to be repurposed in a creative manner? Time (and much more talented creators than me) will tell!

The full dataset can be found on DataShare: Human MotionLess Dataset (HuMoLs)

The paper which the dataset supports: OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare

Permission to use the music featured in this video was kindly granted by Gavin Sutherland, performing here under his artist name of Other Lands. The album which contains this track can be purchased on Bandcamp: Other Lands – Riddle of the Mode