University of Edinburgh Library Essentials
December 16, 2025
When I was studying for my Master's dissertation, I kept finding dried flowers between the pages of books I borrowed from the library. Now, we all know that drying flowers between books, especially library books, is a bad idea. If the flower is particularly big, the book will struggle to close properly, and as the petals release moisture their colours will transfer onto the page. Despite all this, I was happy to find daisies and freesias while revising, and I kept most of them. Now, I am still happy when I find objects in collection items, some of which have not been opened for almost 100 years. I have realised that all sorts of things find their way into theses. I have removed cigarette butts from a 1910s book on bronchitis and read letters, and I have seen photographs, cigar vouchers and 1970s train tickets that seem to have been used as bookmarks or placeholders and were never removed from the pages of theses. Sometimes the objects and documents found in these theses seem to be related to the creation of the volumes themselves: for example, we have found receipts and quotes from 20th-century Scottish bookbinders, library notes and interlibrary loan request slips. But sometimes what we find is more unusual and not necessarily related to the content or the creation of the physical item.
Here are three examples of objects that found their way between the pages of PhD theses.
The ‘Anti-German Union’ pamphlet
Found in a 1916 medical thesis titled ‘The treatment of tuberculosis’, this pamphlet is racist propaganda rather than medical knowledge. The leaflet promotes ethnocentric ideas of ‘Britishness’, presenting German workers and trade with Germany as a threat.
“The Anti-German Union has been formed to unite British-born men and women, without respect to party, class or creed, with the following aims and objects:
The pamphlet also includes a membership/registration form, which was left blank. It was produced at around the same time as the thesis (c. 1916). The AGU, later renamed the ‘British Empire Union’, was an organisation that stirred up anti-German sentiment and was part of a wider movement that grew with WW1 and Germany’s rise as an international power. The union promoted the expulsion of German immigrants and the obstruction of German trade.
It is not clear whether the examiner or the author accidentally left the pamphlet in the thesis, but at least we know that the registration form was left blank. I was particularly fascinated by this object because it has no relation to the content of the thesis, and there are few similar documents on the same subject in digital image repositories. It is also a testament to a very specific moment in history: the document could only have been written in 1915 or 1916, and it bears witness to the change in the organisation’s aims and perspective.
Five Embassy Cigarette Vouchers
Objects found in theses not only provide evidence of the political atmosphere that alumni were immersed in, but also show changes in consumer culture and advertising. One of my favourite discoveries is ‘Five Embassy Vouchers’.
Not so common these days, cigarette vouchers were given to smokers as a reward for their loyalty. It was a win-win situation: consumers could trade these vouchers in a store for their favourite cigarettes, while companies found a way of retaining their customers.
Embassy is a cigarette brand first sold in 1914 by Imperial Tobacco. Originally branded ‘Strand’, it gained popularity in the 1960s as a coupon brand.
Gill, 1966
After library slips, the most common objects found loose between the pages of theses are photographs. Some of these pictures are labelled, usually portraying graduating students; others remain a mystery.
One that we could find out a little more about is ‘Gill, 1966’. This picture fell out of a 1970s duplicate copy of a thesis. We are not sure which thesis it originally came from, as it fell out when we picked up a couple of theses from our delivery. The volume it came from, being a duplicate, has certainly been destroyed by now, but the picture remains a unique object: a portrait of Gill in 1966, a little girl we know nothing else about.
I’m pleased to let you know that the Library has purchased access to the House of Lords Parliamentary Papers (1800-1910) from ProQuest. This resource provides online access to previously unseen and valuable historical documents and is the very first digitised collection of 19th-century House of Lords Parliamentary Papers.
You can access the House of Lords Parliamentary Papers (1800-1910) via the Databases A-Z list.
The House of Lords Parliamentary Papers (1800-1910) is an essential research resource that, along with the House of Commons Parliamentary Papers database (which the Library already has access to), provides a complete picture of the workings and influence of the UK Parliament during the pivotal 19th century.
A further 253 e-books across most subject areas have been added to DiscoverEd. Cambridge Core hosts books from a variety of publishers, including Edinburgh University Press, Cambridge University Press and Boydell & Brewer. A list of the new titles and subject areas can be found in the spreadsheet here.

We have a new e-journal subscription – International Journal of Sport Management and Marketing
International Journal of Sport Management and Marketing (IJSMM) is a multidisciplinary journal which aims to provide a unique focus on a wide range of sport management and sport technology topics. It covers advances in theory, new concepts, methods and applications and case studies. Each issue disseminates quality sport-related research relevant to sport technology and sport management, examining both hard and soft perspectives in managing sporting organisations in the public and private sectors.
We have a collection of historic papers from the Scottish Court of Session. These are collected into cases and bound together in large volumes, with no catalogue or item data other than a shelfmark. If you wish to find a particular case within the collection, you are restricted to a manual, physical search of likely volumes (if you’re lucky you might get an index at the start!).

Volumes of Session Papers in the Signet Library, Edinburgh
I am hoping to use computer vision techniques, OCR, and intelligent text analysis to automatically extract and parse case-level data in order to create an indexed, searchable digital resource for these items. The Digital Imaging Unit have digitised a small selection of the papers, which we will use as a pilot to assess the viability of the above aim.
I am indebted to Dan Vanderkam’s work in this area, especially his blog post ‘Finding blocks of text in an image using Python, OpenCV and numpy’, upon which this work is largely based.
The items in the Scottish Session Papers collection differ from the images that Dan was processing: they are images of older works, printed by letterpress rather than typewritten.
The Session Papers images are lacking a delineating border, backing paper, and other features that were used to ease the image processing. In addition, the amount, density and layout of text items is incredibly varied across the corpus, further complicating the task.
The initial task is to find a crop of the image to pass to the OCR engine. We want to give it as much text as possible in as few pixels as possible!
Due to the nature of the images, there is often a small amount of text from the opposite page visible (John’s blog explains why) and so to save some hassle later, we’re going to start by cropping 50px from each horizontal side of the image, hopefully eliminating these bits of page overspill.

A cropped version of the page
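Trimming the overspill is just an array slice. A minimal sketch, assuming the scan is held as a NumPy array (as returned by OpenCV's `cv2.imread`); the function name `trim_sides` is hypothetical and not taken from the project's code:

```python
import numpy as np


def trim_sides(image: np.ndarray, margin: int = 50) -> np.ndarray:
    """Drop `margin` pixels from the left and right edges of a page image.

    Only axis 1 (the width) is sliced, so this works for both grayscale
    (H, W) and colour (H, W, C) arrays. The top and bottom are untouched.
    """
    if image.shape[1] <= 2 * margin:
        raise ValueError("image is too narrow to trim")
    return image[:, margin:-margin]


# Example: a blank 1000 x 800 colour page becomes 1000 x 700.
page = np.zeros((1000, 800, 3), dtype=np.uint8)
print(trim_sides(page).shape)  # (1000, 700, 3)
```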
Now that we have the base image to work on, I’ve started with the simple steps of converting it to grayscale, and then applying an inverted binary threshold, turning everything above ~75% gray to white, and everything else to black. The inversion is to ease visual understanding of the process.
The ideal outcome is that we eliminate smudges and speckles, leaving only the clear printed letters. This took some experimenting with the threshold level: as you can see in the image above, a lot of speckling remains. Dropping the threshold to only leave pixels above ~60% gray was a large improvement, and to ~45% even more so:
At a threshold of 45%, some of the letters are also beginning to fade, but this should not be an issue, as we have successfully eliminated almost all the noise, which was the aim here.
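The grayscale conversion, inverted threshold and threshold experiments can be sketched in plain NumPy. This assumes the ~75%/~60%/~45% figures are fractions of the 0-255 brightness scale, and uses the number of connected white blobs as a rough stand-in for "how much speckling is left"; both are illustrative assumptions, and the helper names are hypothetical. With OpenCV installed, `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`, `cv2.threshold(gray, t, 255, cv2.THRESH_BINARY_INV)` and `cv2.connectedComponents` (whose count includes the background label) do the same jobs much faster.

```python
from collections import deque

import numpy as np


def to_gray(bgr: np.ndarray) -> np.ndarray:
    """BGR -> grayscale using the same luma weights as cv2.COLOR_BGR2GRAY."""
    b, g, r = (bgr[..., i].astype(np.float64) for i in range(3))
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def inverted_threshold(gray: np.ndarray, fraction: float) -> np.ndarray:
    """Pixels at or below fraction * 255 (dark ink) become white (255), brighter ones black (0).

    Same rule as cv2.threshold(gray, fraction * 255, 255, cv2.THRESH_BINARY_INV).
    """
    return np.where(gray > fraction * 255, 0, 255).astype(np.uint8)


def count_components(binary: np.ndarray) -> int:
    """Count 4-connected groups of white pixels with a breadth-first search."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and not seen[y, x]:
                count += 1
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count


def sweep_thresholds(gray: np.ndarray, fractions=(0.75, 0.60, 0.45)) -> dict:
    """Map each threshold fraction to the number of white blobs it leaves."""
    return {f: count_components(inverted_threshold(gray, f)) for f in fractions}


# Toy page: one dark "letter" plus one mid-gray speck of noise.
gray = np.full((5, 5), 255, dtype=np.uint8)
gray[1:3, 1:3] = 20  # ink
gray[4, 4] = 150     # speck
print(sweep_thresholds(gray))  # {0.75: 2, 0.6: 2, 0.45: 1}
```

Lowering the threshold keeps only the darkest pixels, so the mid-gray speck disappears at 45% while the ink survives, which mirrors the trade-off described above.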
We’re still left with a large block at the top, which was the black backing behind the edge of the original image. To eliminate this, I experimented with several approaches:
I settled on the Canny/Rank filter approach to produce these results:
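The idea behind pairing an edge detector with a rank filter is that a solid block collapses to thin outlines, which the rank filter can then strip, while dense text keeps enough edge pixels in every direction to survive. Below is a hedged NumPy-only sketch of that idea, not the project's code: it swaps Canny for a crude "white pixel with a black neighbour" boundary, and implements the rank filter for 0/255 images as a windowed count. The window size of 20 and rank of 4 are illustrative assumptions; with SciPy available, `scipy.ndimage.rank_filter` is the ready-made equivalent.

```python
import numpy as np


def boundary(binary: np.ndarray) -> np.ndarray:
    """Crude stand-in for cv2.Canny on a 0/255 image.

    Keeps white pixels with at least one black 4-neighbour, so a solid block
    shrinks to a 1px outline while busy text stays dense.
    """
    white = binary > 0
    p = np.pad(white, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.where(white & ~interior, 255, 0).astype(np.uint8)


def window_count(mask: np.ndarray, size: int, axis: int) -> np.ndarray:
    """Count True pixels in a `size`-long window along `axis` (zero-padded edges)."""
    before = size // 2
    after = size - 1 - before
    pad = [(0, 0), (0, 0)]
    pad[axis] = (before + 1, after)
    csum = np.cumsum(np.pad(mask.astype(np.int32), pad), axis=axis)
    n = mask.shape[axis]
    if axis == 0:
        return csum[size:size + n, :] - csum[:n, :]
    return csum[:, size:size + n] - csum[:, :n]


def remove_thin_lines(edges: np.ndarray, size: int = 20, rank: int = 4) -> np.ndarray:
    """Keep an edge pixel only if its row window AND column window each hold >= `rank` edge pixels.

    For a 0/255 image this is intended to mirror
    min(edges, rank_filter(edges, -rank, size=(1, size)), rank_filter(edges, -rank, size=(size, 1))),
    apart from how the image borders are handled.
    """
    on = edges > 0
    keep = on & (window_count(on, size, axis=1) >= rank) & (window_count(on, size, axis=0) >= rank)
    return np.where(keep, 255, 0).astype(np.uint8)


# Toy page: a solid backing block along the top plus a dense patch of "text".
page = np.zeros((60, 60), dtype=np.uint8)
page[:20, :] = 255  # dark backing, white after the inverted threshold
yy, xx = np.mgrid[40:50, 20:40]
page[40:50, 20:40] = np.where((yy + xx) % 2 == 0, 255, 0)  # speckled text stand-in
cleaned = remove_thin_lines(boundary(page))
print(int((cleaned[:20] > 0).sum()), int((cleaned[40:50, 20:40] > 0).sum()))  # 4 100
```

Only the four corner pixels of the backing block survive, while every pixel of the text patch is kept.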
Next up, we want to find a set of masks that covers the remaining white pixels on the page. This is achieved by repeatedly dilating the image, until only a few connected components remain:
You can see here that the “faded” letters from the thresholding above have enough presence to be captured by the dilation process. These white blocks now give us a pretty good record of where the text is on the page, so we can now move on to cropping the image.
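The repeated dilation can be sketched as follows. Assumptions: a plus-shaped 3×3 structuring element, 4-connectivity, and a default stopping point of 16 components; all are illustrative choices, and the function names are hypothetical. For each remaining block the sketch also counts how many of the original (undilated) white pixels it covers, which is the quantity a crop-selection step needs.

```python
from collections import deque

import numpy as np


def dilate_cross(mask: np.ndarray) -> np.ndarray:
    """One dilation step with a 3x3 plus-shaped (4-neighbour) structuring element."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out


def components(dilated: np.ndarray, original: np.ndarray):
    """4-connected regions of `dilated` as ((x1, y1, x2, y2), ink) pairs.

    Boxes use exclusive x2/y2; `ink` counts the ORIGINAL white pixels inside the region.
    """
    h, w = dilated.shape
    seen = np.zeros((h, w), dtype=bool)
    found = []
    for y in range(h):
        for x in range(w):
            if not dilated[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue = deque([(y, x)])
            x1, y1, x2, y2, ink = x, y, x, y, 0
            while queue:
                cy, cx = queue.popleft()
                x1, y1, x2, y2 = min(x1, cx), min(y1, cy), max(x2, cx), max(y2, cy)
                ink += int(original[cy, cx])
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and dilated[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            found.append(((x1, y1, x2 + 1, y2 + 1), ink))
    return found


def dilate_until(binary: np.ndarray, max_components: int = 16, max_steps: int = 50):
    """Dilate one step at a time until at most `max_components` regions remain."""
    original = binary > 0
    mask = original.copy()
    steps = 0
    found = components(mask, original)
    while len(found) > max_components and steps < max_steps:
        mask = dilate_cross(mask)
        steps += 1
        found = components(mask, original)
    return steps, found


page = np.zeros((20, 20), dtype=np.uint8)
page[5, 5] = page[5, 9] = page[15, 15] = 255  # two nearby "letters", one far away
print(dilate_until(page, max_components=2))
# (2, [((3, 3, 12, 8), 2), ((13, 13, 18, 18), 1)])
```

The two nearby pixels merge after two steps, while the distant one stays a separate block.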
Dan’s blog has a good explanation of solving the Subset Sum problem for a dilated image, so I will apply his technique (start with the largest white block, and add further blocks if they increase the number of white pixels covered at a favourable cost in total area, with some tweaking to the exact ratio):
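The block-selection heuristic described above can be sketched as a greedy loop. This is a hedged illustration, not the project's code: it scores a crop with an F1-style balance between recall (share of white pixels inside the crop) and precision (share of the page cut away), and the `min_gain`/`max_growth` thresholds of 25% and 15% are assumptions standing in for the tweaked ratio, whose actual values are not given here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) with exclusive x2, y2


def area(box: Box) -> int:
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def f1(covered: int, total: int, crop: Box, image_area: int) -> float:
    """Harmonic mean of recall (share of ink inside the crop) and precision (share of page cut away)."""
    recall = covered / total
    precision = 1 - area(crop) / image_area
    return 0.0 if recall + precision == 0 else 2 * precision * recall / (precision + recall)


def choose_crop(blocks: List[Tuple[Box, int]], image_area: int,
                min_gain: float = 0.25, max_growth: float = 0.15) -> Box:
    """Greedy crop: start from the block with most ink, then add blocks that pay their way.

    A block is added if it raises the F1 score, or if it brings in more than
    `min_gain` of the still-uncovered ink for less than `max_growth` extra area.
    """
    remaining = sorted(blocks, key=lambda b: b[1], reverse=True)
    total = sum(ink for _, ink in remaining)
    if total == 0:
        raise ValueError("no ink to crop")
    crop, covered = remaining.pop(0)
    while covered < total and remaining:
        score = f1(covered, total, crop, image_area)
        for i, (box, ink) in enumerate(remaining):
            new_crop = union(crop, box)
            gain = ink / (total - covered)
            growth = area(new_crop) / max(1, area(crop)) - 1
            if f1(covered + ink, total, new_crop, image_area) > score or (
                    gain > min_gain and growth < max_growth):
                crop, covered = new_crop, covered + ink
                del remaining[i]
                break
        else:
            break  # no remaining block was worth adding
    return crop


blocks = [((10, 10, 90, 60), 3000),  # main text block
          ((10, 60, 90, 65), 800),   # a line just below it
          ((95, 95, 99, 99), 5)]     # a stray speck in the corner
print(choose_crop(blocks, image_area=100 * 100))  # (10, 10, 90, 65)
```

The adjacent line is absorbed because it raises the score, while the corner speck would almost double the crop for a handful of pixels and is left out.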
So finally, we apply this crop to the original image:

Final cropped version
As you can see, we’ve now managed to accurately crop out the text from the image, helping to significantly reduce the work of the OCR engine.
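One detail worth checking when applying the crop: if the box was found on the side-trimmed copy but is applied to the untrimmed scan, its x coordinates must be shifted back by the trimmed margin. A minimal sketch, assuming boxes are (x1, y1, x2, y2) with exclusive ends; `apply_crop` and `x_offset` are hypothetical names:

```python
import numpy as np


def apply_crop(image: np.ndarray, crop, x_offset: int = 0) -> np.ndarray:
    """Cut `crop` = (x1, y1, x2, y2) out of `image`.

    `x_offset` shifts the box horizontally, e.g. by the 50px trimmed from the
    left edge when the box was computed on the trimmed copy but is applied to
    the full scan. Coordinates are clamped to the image bounds.
    """
    x1, y1, x2, y2 = crop
    h, w = image.shape[:2]
    x1, x2 = max(0, x1 + x_offset), min(w, x2 + x_offset)
    y1, y2 = max(0, y1), min(h, y2)
    return image[y1:y2, x1:x2]


scan = np.arange(200 * 300).reshape(200, 300)
text = apply_crop(scan, (10, 20, 60, 80), x_offset=50)
print(text.shape, text[0, 0] == scan[20, 60])  # (60, 50) True
```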
My final modified version of Dan’s code can be found here: https://github.com/mbennett-uoe/sp-experiments/blob/master/sp_crop.py
In my next blog post, I’ll start to look at some OCR approaches and also go through some of the outliers and problem images and how I will look to tackle this.
Comments and questions are more than welcome 🙂
Mike Bennett – Digital Scholarship Developer

Web of Science is undergoing scheduled maintenance on Sunday June 25, from 13.00 (BST) to Monday June 26, at 01.00 (BST).
During this time, access to the service may be intermittent and the service should be considered at risk during the maintenance period.
Clarivate Analytics apologise for any inconvenience as a result.
During the maintenance period there will also be an upgrade of Web of Science to version 5.25.
For information on the new features of the WoS upgrade (including screenshots) please see the WoS Release notes at:
http://wok.mimas.ac.uk/support/documentation/WoS525-external-release-notes.pdf
Features include:
* Modernised discovery workflows.
* Enriched analytics workflows.
* Expanded and updated data.
* Improved product quality.
In this week’s blog, Claire, our first placement student of the summer describes her experience of working with us…
My name is Claire and I am a conservation student from Northumbria University, specialising in paper conservation. This is the first of several voluntary work placements I must carry out as part of my Master’s degree. Working in the conservation studio for the past two weeks has challenged me to work with new materials outside my specialism.

Claire working in the studio
I have been working on artworks that will be displayed as part of the University’s new exhibition, ‘Highlands to Hindustan’, which goes on display at the end of July. My role was to conserve the works where needed, and also to improve the storage and display of the objects. This meant a great deal of multi-tasking and time management.
There is a large area of grass growing at the side of the University Collections Facility (née Library Annexe) which is usually just grass and moss with the odd daisy daring to poke its head above the grassy parapets. However, a lunchtime stroll resulted in a delightful find of a Common Spotted Orchid (Dactylorhiza fuchsii) growing amongst the buttercups (Ranunculus acris) and grasses at South Gyle. 
Brown honey bees were conspicuous, busily collecting nectar from the white clover (Trifolium repens), which is in full flower. The tiny purple flowers of selfheal (Prunella vulgaris) looked especially lovely against the complementary shining yellow of the buttercups. A daisy (Bellis perennis) or two is also flowering. A few thistles (Cirsium vulgare) are nearly in flower and small sheep sorrel plants (Rumex acetosella) are appearing. This type of grassland (neutral to alkaline) is typical of ground that has not been treated with chemicals or artificial fertilisers, and of how grassland looked before ploughing and fertilising became common practice.
Sandi Phillips, Collections Management Assistant

We have purchased the 2017 Hart Collection from Bloomsbury. We have already added 61 titles to DiscoverEd, and the remaining titles will be added as they are published. A full list of the 125 titles is available here; the status column (column K) shows which titles have already been added to DiscoverEd.
The National Institute of Japanese Literature (Kokubunken) has made its new 新日本古典籍総合データベース/Database of Pre-Modern Japanese Works (tentative edition) freely available at: https://kotenseki.nijl.ac.jp/ (Japanese interface) and https://kotenseki.nijl.ac.jp/?ln=en (English interface).
The database, the largest of its kind, is built from the Kokusho Sōmokuroku, a Japanese reference book published by Iwanami Shoten. It contains about 300,000 of the 500,000 entries listed in the original book, along with digitised versions of the materials the entries refer to. It also allows a single search across the repositories of multiple institutions.
For more information, please see the “Project to Build an International Collaborative Research Network for Pre-modern Japanese Texts (NIJL-NW project)” websites at:
Hill and Adamson Collection: an insight into Edinburgh’s past
My name is Phoebe Kirkland, I am an MSc East Asian Studies student, and for...
Cataloguing the private papers of Archibald Hunter Campbell: A Journey Through Correspondence
My name is Pauline Vincent, I am a student in my last year of a...
Archival Provenance Research Project: Lishan’s Experience
Presentation My name is Lishan Zou, I am a fourth year History and Politics student....