The Library’s project to digitise its entire collection of PhD and doctoral level theses is now entering its final phase, with the team on track to have all 17,000 volumes scanned by May and online by the end of 2018.
To date, the team has digitised 10,385 individual theses out of an internal target of 12,500. In total, over 2.6 million pages have been scanned, making this the largest digitisation project the Library has ever undertaken. In addition to the in-house work, approximately 4,500 volumes were outsourced to Autodocs, our scanning partner, in 2017.
We have now almost completed the scanning of the 20th and 21st century collections, and the 19th century handwritten theses are due to be digitised by the end of January. The final three months of the project will then be dedicated to digitisation of the older printed Latin volumes – medical theses dating from the mid 18th to mid 19th centuries. A small collection of even earlier theses, dating from as far back as 1599, will be photographed by our colleagues in the Digital Imaging Unit.
After digitisation, the theses are uploaded to the Edinburgh Research Archive (ERA), where they are available to download for free. We have now uploaded over half of the collection to ERA, and by the end of January we will be ahead of our target to have all Edinburgh PhDs online by the end of 2018.
Theses digitised by this project are currently being downloaded over 3,000 times per month, with the most popular to date being The Social differentiation of English in Norwich by Peter Trudgill, which has been accessed almost 350 times since it was added to ERA last year. Other popular titles include Myo-Mint’s Study of the interpersonal dimension of narrative fiction with specific reference to power and control in Muriel Spark’s Memento mori and its implications for the teaching of English literature in a TEFL context (272 downloads) and Ji-Hwan Song’s Business ethics and the corporate manipulation of expressions (256 downloads).
We will be showcasing some of these works in an exhibition that will be running in the CRC at the end of the year. As well as telling the story of the Edinburgh PhD from the earliest 16th century disputations through to the modern, A4, typed and bound thesis, the exhibition will feature examples of interesting authors and unusual topics, and will highlight some of the more surprising things we have found within the humble PhD volumes.
Giulia’s recent blog post mentioned some of these – we’ve also come across items ranging from the grisly (laminated slices of human lung) to the darkly comedic (a thesis containing the bullet with which the author had accidentally shot himself). And that’s not even mentioning the test tubes, vials and envelopes of mysterious white powder that have been unearthed by the team over the last two years…
Now that the project is entering its final phase, we are beginning to discuss how the content might best be used once all digitised theses are online. There is already strong demand from researchers for digital theses, but we are keen to explore other ways that we can make use of, and open up, this large data set. In addition to the work we’ve already undertaken in uploading a thesis to Wikisource, our Digital Scholarship Developer Mike Bennett is exploring how we can match digitised theses to their author pages on Wikipedia using authority records, as well as working on a tool which enables the bulk generation of Wikidata records for theses.
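For readers curious what bulk generation of Wikidata records could look like in practice, here is a minimal Python sketch. It is not the tool Mike is building, just an illustration under stated assumptions: it turns a thesis title and author into tab-separated commands in the QuickStatements (version 1) batch format. The Thesis class and the to_quickstatements function are hypothetical names invented for this example, and the Wikidata identifiers (Q187685 for doctoral thesis, P31, P1476, P50 and P2093) are assumed and should be checked against Wikidata before any real upload.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed Wikidata identifiers; verify on wikidata.org before real use.
DOCTORAL_THESIS = "Q187685"     # item: doctoral thesis
INSTANCE_OF = "P31"             # property: instance of
TITLE = "P1476"                 # property: title (monolingual text)
AUTHOR = "P50"                  # property: author (links to an item)
AUTHOR_NAME_STRING = "P2093"    # property: author name string (plain text)


@dataclass
class Thesis:
    """Hypothetical minimal thesis record for this illustration."""
    title: str
    author_name: str
    author_qid: Optional[str] = None  # set once matched to a Wikidata item


def quote(text: str) -> str:
    """Wrap text in double quotes; inner double quotes become single
    quotes, which is a simplification made for this sketch."""
    return '"' + text.replace('"', "'") + '"'


def to_quickstatements(thesis: Thesis) -> List[str]:
    """Build the commands that would create one new item for a thesis."""
    rows = [
        "CREATE",
        "\t".join(["LAST", "Len", quote(thesis.title)]),       # English label
        "\t".join(["LAST", INSTANCE_OF, DOCTORAL_THESIS]),
        "\t".join(["LAST", TITLE, "en:" + quote(thesis.title)]),
    ]
    if thesis.author_qid:
        # Author already matched to an item, so link to it directly.
        rows.append("\t".join(["LAST", AUTHOR, thesis.author_qid]))
    else:
        # No match yet, so record the name as plain text instead.
        rows.append("\t".join(["LAST", AUTHOR_NAME_STRING,
                               quote(thesis.author_name)]))
    return rows


if __name__ == "__main__":
    example = Thesis(
        title="The Social differentiation of English in Norwich",
        author_name="Peter Trudgill",
    )
    print("\n".join(to_quickstatements(example)))
```

The split between the two author properties reflects the matching problem described above: until a thesis author has been tied to an existing record, only a name string can be stored, and the link can be added once a match is found.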
Keep an eye out for further updates as we enter the final stages of the project.
Gavin Willshaw (gavin.willshaw@ed.ac.uk)