Home University of Edinburgh Library Essentials
June 12, 2026
We have a collection of historic papers from the Scottish Court of Session. These are collected into cases and bound together in large volumes, with no catalogue or item data other than a shelfmark. If you wish to find a particular case within the collection, you are restricted to a manual, physical search of likely volumes (if you’re lucky you might get an index at the start!).

Volumes of Session Papers in the Signet Library, Edinburgh
I am hoping to use computer vision techniques, OCR, and intelligent text analysis to automatically extract and parse case-level data in order to create an indexed, searchable digital resource for these items. The Digital Imaging Unit have digitised a small selection of the papers, which we will use as a pilot to assess the viability of the above aim.
I am indebted to Dan Vanderkam’s work in this area, especially his blog post ‘Finding blocks of text in an image using Python, OpenCV and numpy’, upon which this work is largely based.
The items in the Scottish Session Papers collection differ from the images that Dan was processing, being images of older works, which were printed with a letterpress rather than being typewritten.
The Session Papers images lack a delineating border, backing paper, and the other features that were used to ease the image processing. In addition, the amount, density and layout of text items are incredibly varied across the corpus, further complicating the task.
The initial task is to find a crop of the image to pass to the OCR engine. We want to give it as much text as possible in as few pixels as possible!
Due to the nature of the images, there is often a small amount of text from the opposite page visible (John’s blog explains why) and so to save some hassle later, we’re going to start by cropping 50px from each horizontal side of the image, hopefully eliminating these bits of page overspill.

A cropped version of the page
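The margin crop described above can be sketched in a few lines. This is a minimal illustration using numpy array slicing, not the code from the project itself; the function name `crop_side_margins` and the blank stand-in page are assumptions made for the example.

```python
import numpy as np

def crop_side_margins(image: np.ndarray, margin: int = 50) -> np.ndarray:
    """Drop `margin` pixels from the left and right edges of an image array."""
    height, width = image.shape[:2]
    if width <= 2 * margin:
        raise ValueError("image is too narrow for the requested margin")
    # Rows are kept whole; only the columns are trimmed.
    return image[:, margin:width - margin]

# A blank stand-in for a scanned page: 300 rows, 400 columns, 3 colour channels.
page = np.zeros((300, 400, 3), dtype=np.uint8)
cropped = crop_side_margins(page)
print(cropped.shape)
```

An image loaded with OpenCV is already a numpy array, so the same slice should apply directly to a real scan.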
Now that we have the base image to work on, I’ve started with the simple steps of converting it to grayscale, and then applying an inverted binary threshold, turning everything above ~75% gray to white, and everything else to black. The inversion is to ease visual understanding of the process.
The ideal outcome is that we eliminate smudges and speckles, leaving only the clear printed letters. This entailed some experimenting with the threshold level: as you can see in the image above, a lot of speckling remains. Dropping the threshold to only leave pixels above ~60% gray was a large improvement, and to ~45% even more so:
At a threshold of 45%, some of the letters are also beginning to fade, but this should not be an issue, as we have successfully eliminated almost all the noise, which was the aim here.
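To make the effect of the threshold level concrete, here is a small sketch of an inverted binary threshold in numpy. It assumes the percentage is read as a fraction of full brightness, so that pixels darker than the cut-off become white and everything else black; that reading, the function name and the toy pixel values are my assumptions rather than details taken from the project code.

```python
import numpy as np

def inverted_binary_threshold(gray: np.ndarray, level: float) -> np.ndarray:
    """Pixels darker than `level` (a 0-1 fraction of full brightness) become
    white (255); everything else becomes black (0)."""
    cutoff = level * 255
    return np.where(gray < cutoff, 255, 0).astype(np.uint8)

# Toy grayscale strip: paper (230), two speckles (170, 130),
# a faded letter (100) and solid ink (20).
strip = np.array([[230, 170, 130, 100, 20]], dtype=np.uint8)

results = {level: inverted_binary_threshold(strip, level).tolist()
           for level in (0.75, 0.60, 0.45)}
for level, row in results.items():
    print(level, row)
```

In this toy strip, each lower level drops one more speckle while the two ink pixels stay white, which mirrors the trade-off described above.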
We’re still left with a large block at the top, which was the black backing behind the edge of the original image. To eliminate this, I experimented with several approaches:
I settled on the Canny/Rank filter approach to produce these results:
Next up, we want to find a set of masks that covers the remaining white pixels on the page. This is achieved by repeatedly dilating the image, until only a few connected components remain:
You can see here that the “faded” letters from the thresholding above have enough presence to be captured by the dilation process. These white blocks now give us a pretty good record of where the text is on the page, so we now move onto cropping the image.
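The repeated dilation can be illustrated without any imaging library. The sketch below is a simplified stand-in, assuming a boolean mask, one-pixel growth in the four axis directions, and a plain flood fill to count connected components; the helper names and the stopping rule are hypothetical, and the real code may well use OpenCV's own dilation and contour functions instead.

```python
from collections import deque
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Grow every white region by one pixel in the four axis directions."""
    grown = mask.copy()
    grown[1:, :] |= mask[:-1, :]   # spread downwards
    grown[:-1, :] |= mask[1:, :]   # spread upwards
    grown[:, 1:] |= mask[:, :-1]   # spread right
    grown[:, :-1] |= mask[:, 1:]   # spread left
    return grown

def count_components(mask: np.ndarray) -> int:
    """Count 4-connected white regions with a breadth-first flood fill."""
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            count += 1
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return count

def dilate_until(mask: np.ndarray, max_components: int, max_rounds: int = 50):
    """Dilate until at most `max_components` regions remain (or we give up)."""
    rounds = 0
    while count_components(mask) > max_components and rounds < max_rounds:
        mask = dilate(mask)
        rounds += 1
    return mask, rounds

# Two "lines of text", each made of isolated letter pixels.
start = np.zeros((7, 12), dtype=bool)
start[1, [1, 3, 5]] = True
start[5, [7, 9]] = True

merged, rounds = dilate_until(start, max_components=2)
print(count_components(start), rounds, count_components(merged))
```

Here five separate "letters" merge into two blocks after a single round, one per line of text.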
Dan’s blog has a good explanation of solving the Subset Sum problem for a dilated image, so I will apply his technique (start with the largest white block, and add more if they improve the amount of white pixels at a favourable increase in total area size, with some tweaking to the exact ratio):
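As a rough illustration of that greedy selection, the sketch below starts from the block with the most white pixels and only adds another block when the relative gain in white pixels is large enough compared with the relative growth in crop area. It is a simplified stand-in, not Dan's scoring or the final project code: the `Block` type, the `min_gain_ratio` parameter and the example coordinates are all hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Block(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int
    white: int  # number of white pixels inside the block

Box = Tuple[int, int, int, int]

def union(a: Box, b: Box) -> Box:
    """Smallest rectangle containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])

def choose_crop(blocks: List[Block], min_gain_ratio: float = 0.5) -> Box:
    """Greedily grow a crop from the block with the most white pixels."""
    remaining = sorted(blocks, key=lambda b: b.white, reverse=True)
    first = remaining.pop(0)
    crop, covered = first[:4], first.white
    changed = True
    while changed:
        changed = False
        for block in list(remaining):  # iterate over a copy while removing
            grown = union(crop, block[:4])
            white_gain = block.white / covered
            area_growth = (area(grown) - area(crop)) / area(crop)
            # Accept blocks that cost no extra area, or whose gain in white
            # pixels is favourable relative to the extra area they need.
            if area_growth == 0 or white_gain / area_growth >= min_gain_ratio:
                crop, covered = grown, covered + block.white
                remaining.remove(block)
                changed = True
    return crop

blocks = [
    Block(100, 200, 900, 1200, white=400_000),   # main body of text
    Block(300, 100, 700, 180, white=40_000),     # heading just above it
    Block(1500, 1300, 1520, 1320, white=300),    # stray speck far away
]
crop = choose_crop(blocks)
print(crop)
```

With these made-up blocks the heading is absorbed into the crop, while the distant speck is rejected because including it would roughly double the area for almost no extra text.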
So finally, we apply this crop to the original image:

Final cropped version
As you can see, we’ve now managed to accurately crop out the text from the image, helping to significantly reduce the work of the OCR engine.
My final modified version of Dan’s code can be found here: https://github.com/mbennett-uoe/sp-experiments/blob/master/sp_crop.py
In my next blog post, I’ll start to look at some OCR approaches and also go through some of the outliers and problem images and how I will look to tackle this.
Comments and questions are more than welcome.
Mike Bennett – Digital Scholarship Developer

Web of Science is undergoing scheduled maintenance on Sunday June 25, from 13.00 (BST) to Monday June 26, at 01.00 (BST).
During this time, access to the service may be intermittent and the service should be considered at risk during the maintenance period.
Clarivate Analytics apologise for any inconvenience as a result.
During the maintenance period there will also be an upgrade of Web of Science to version 5.25.
For information on the new features of the WoS upgrade (including screenshots) please see the WoS Release notes at:
http://wok.mimas.ac.uk/support/documentation/WoS525-external-release-notes.pdf
Features include:
* Modernised discovery workflows.
* Enriched analytics workflows.
* Expanded and updated data.
* Improved Product Quality.
In this week’s blog, Claire, our first placement student of the summer, describes her experience of working with us…
My name is Claire and I am a conservation student from Northumbria University, specialising in paper conservation. This is the first of several voluntary work placements I must carry out as part of my Master’s degree. Working at the conservation studio the past two weeks has challenged me to work with new materials outside of my specialism.

Claire working in the studio
I have been working on artworks that will be displayed as part of the University’s new exhibition, ‘Highlands to Hindustan’, which goes on display at the end of July. My role was to conserve the works where needed, but also to improve the storage and display of the objects. This meant a great deal of multi-tasking and time management.
There is a large area of grass growing at the side of the University Collections Facility (née Library Annexe) which is usually just grass and moss with the odd daisy daring to poke its head above the grassy parapets. However, a lunchtime stroll resulted in a delightful find of a Common Spotted Orchid (Dactylorhiza fuchsii) growing amongst the buttercups (Ranunculus acris) and grasses at South Gyle. 
Brown honey bees looked conspicuous busily collecting nectar from the white clover (Trifolium repens) which is in full flower. The tiny purple flowers of selfheal (Prunella vulgaris) looked especially lovely as a complementary colour to the shining yellow buttercups. A daisy (Bellis perennis) or two is also flowering. A few thistles (Cirsium vulgare) are nearly flowering and small sheep sorrel plants (Rumex acetosella) are appearing. The type of grassland here (neutral to alkaline) is typical of one which has not had chemicals or artificial fertilisers put on it and is typical of how grassland looked before ploughing and fertilising became common practices.
Sandi Phillips, Collections Management Assistant

We have purchased the 2017 Hart Collection from Bloomsbury. We have already added 61 titles to DiscoverEd and the remaining titles will be added as they are published. A full list of the 125 titles is available here, and the status column K shows the titles already added to DiscoverEd.
The National Institute of Japanese Literature (Kokubunken) has made their new 新日本古典籍総合データベース/Database of Pre-Modern Japanese Works (tentative edition) freely available at: https://kotenseki.nijl.ac.jp/ (Japanese interface) and https://kotenseki.nijl.ac.jp/?ln=en (English interface).
The database, built out of Kokusho Sōmokuroku, a Japanese reference book published by Iwanami Shoten and the largest of its kind, contains about 300,000 of the 500,000 entries listed in the original book, along with digitized versions of materials referred to by the entries. It also allows a one-action search across the repositories of multiple institutions.
For more information, please see the ‘Project to Build an International Collaborative Research Network for Pre-modern Japanese Texts (NIJL-NW project)’ websites at:
Did you know we had International Inventions Exhibitions in Edinburgh in 1886 and 1890? The most famous international exhibition was held in London in 1851 at the Crystal Palace, but there were many others in the major European capitals. The first one in Edinburgh was held on the Meadows, about where Jawbone Walk is now. The second was out west at Shandon, between the colonies and the canal. Music was important in both, with brass bands, organ recitals and piping contests. In 1890, you could stand in a phone booth in Edinburgh and listen to orchestras playing in Galashiels. Phones were exciting new technologies then, and their modern equivalents still are!

The First World War was the first in which air warfare played a significant part. While aircraft were ultimately to change the face of warfare, the demands of the war provided a rapid boost to this very new technology.
At the outbreak of the war effective powered flight was a technology not much more than ten years old, but each of the participating countries already had an armed air service of some sort. The allies had 208 aeroplanes between them, and Germany 180. There were also airships, which initially seemed better-tried and more practical, although their importance diminished as the war progressed. The British aircraft were split between the Royal Flying Corps (RFC), under the command of the army, and the Royal Naval Air Service (RNAS), under the command of the Navy. These were merged to form the Royal Air Force (RAF) in 1918.
The types of aircraft in use were very varied, because each country had several of their own manufacturers and models of aircraft, and almost any machine which was available might be pressed into service. Aircraft models evolved rapidly, as technology was improved, and the needs of the war changed.

Initially, aeroplanes were regarded as primarily useful for reconnaissance, and stable two-seaters, such as the British B.E.2, were preferred. These were not very manoeuvrable and were poor at defending themselves or evading enemy anti-aircraft guns, which led to the development of fast, single-seater fighters, such as the French S.P.A.D.

The Germans made parallel developments; their early reconnaissance aircraft included monoplanes with distinctive swept-back, birdlike wings, such as the Rumpler Taube. By the end of the war bigger, heavier aircraft designed for bombing had been developed.
From the beginning of the war both allied and German air forces had to establish, for the first time, how to make their aircraft recognisable, both to other airmen and to those on the ground. National markings were rapidly adopted, and by the beginning of 1915 both French and British authorities had produced posters showing silhouettes of enemy aircraft, entitled ‘Fire on these’.
In late 1914 or early 1915 the French produced the very first book of aircraft silhouettes for recognition purposes, Silhouettes D’Avions, Diagrams of Aeroplanes. These were produced with text in French and English and distributed to both troops and airmen. An alternative version was produced on cards, for better durability. Updated editions and supplements were issued to reflect new developments. We hold two versions of this in our collections – the very first edition, showing French, British and German aircraft of late 1914, and a supplement of French aircraft from September 1915.
Examination of the two shows just why these publications were of limited success in preventing both ground troops and airmen from attacking the wrong aircraft. The pictures are not very high quality, and do not show up the main features of the individual aircraft particularly clearly, especially to an untrained eye. Take the picture of the B.E.2 – it is quite difficult to make out that it is trying to indicate that the upper wings are longer than the lower ones.
The illustrations for the two editions are printed from different artwork, which shows up ambiguities in the lines. The Parasol Morane is included in both. In one illustration there are lines which might be either substantial structural struts, or nearly invisible cable, but all of which are omitted from the other picture entirely.

The S.P.A.D. was a very common aeroplane type, but it takes some effort to work out from its picture that the propeller was located in the middle of its fuselage.
There is even one illustration we have not been able to identify, the Avion de Chasse Morane, which does not seem to entirely correspond to any aircraft made by the Morane company and which was used during the war, that we can find a modern record of!

These manuals were all produced in a hurry; they illustrate only a selection of models, and do not show variants or all the latest developments of equipment. Even more confusing, not to say downright unhelpful, is the earlier edition, which summarises all the other British models of aircraft as being similar to the ones illustrated.
Towards the end of the First World War the problems of distinguishing aircraft, while not solved, were somewhat reduced. In 1917 the RFC ordered 1000 Bristol Fighter aircraft, so that although there were still many different models of aeroplane in the skies, there began to be some standardisation. More successfully, in early 1918 the French set up a ‘Flying circus’ which toured examples of the different models of their aircraft around the British airfields. This had the happy result of not only familiarising the RFC with the appearance of the different models, but also giving them the opportunity to compare performance and develop some camaraderie with the French aviators.
Despite the shortcomings of these identification manuals, this approach continues to be used today, with better pictures and other methods of teaching improving its effectiveness. Today, though, it is more likely to be used by enthusiasts for civilian aircraft, and combined with a mobile phone app for detecting and tracking aircraft.
Whilst systematically scanning early 1900s theses, mostly on specific medical matters such as Insanity and Beri Beri, I came across a fascinating account of rural life in what was then known as British Guiana, written in 1922 by a Mr. John F.C. Haslam (link to follow). By his own admission, Mr Haslam, after being appointed to the Government Public Health Department of British Guiana, found himself just 3 months later as the Head of Department for the entire country.
Mr Haslam’s detailed account lays out this country’s unique geography and attempts to examine the population that inhabit this place and the various issues of housing, employment, sanitation and health that preoccupy him as Head of Public Health in 1922. Written very much from the British colonial viewpoint with a specific agenda of public health, and accompanied by intriguing photographs, this thesis provides an invaluable narrative and illuminates a particular time period in Guiana’s colonial past.
Mr Haslam notes from the beginning that it soon became apparent to him ‘that both the physical conditions of the country, the political constitution and social organisation of the people were peculiar if not unique’.
His first chapter, entitled ‘The Country’, sets out specific facts regarding the landscape and makeup of the land. John Haslam is aware of the British population’s general ignorance of Guiana, a far-flung country lying on the top right corner of the continent of South America, with a tropical climate, within 10 degrees of the equator, and often confused with Demerara, which forms a third of the actual size of Guiana. The 2 rivers Demerara and Essequibo, which allowed for trading posts and fertile land, helped in the creation of the sugar plantations and provide testament to Guiana’s history of colonisation, initially by the Dutch and later by the British, who gained control in 1831 and held it right up until 1966. Guyana is still unique as it is the only country in South America where English is the official language, though the most commonly spoken language is Guyanese Creole.
Haslam details the low-lying nature of Guiana’s coastal areas and describes a very complex system of canals and trenches, with some scattered attempts at control through damming and sluices such as the Koker, left over from the Dutch colonial times. Haslam’s vibrant description of the rainy season gives an impression of the difficulties faced:
‘the coast lands form a vast swamp in which cows may be seen up to their necks in water, grazing on water lilies, where lambs and pigs swim almost from birth and there are many homes accessible only by boat. At such times domestic animals, alligators and a boa constrictor have been seen together on the public high road – the only dry place.’

Most of the townships and villages lie along the cultivated belt running just inland along the Atlantic coast, and so these populations are faced with the challenges of the rainy season and constant flooding. In his descriptions of the inland jungle, which to Haslam appears impenetrable, you get a sense of the fear of the unknown, akin to Day of the Triffids…
‘In truth there is something sinister about the rank and fleshy vegetation which in a few months will cover a neglected house or obliterate a clearing’
Written from his perspective as public health governor of Guiana under direct British colonial rule, the chapter titled simply ‘The People’ provides a gripping account of the different ethnic groups as itemised by the Census. Included is a table of population figures starting with ‘The Europeans, other than Portuguese’, then ‘Portuguese, East Indian, Chinese, Blacks, Mixed, Aboriginal Indians’ and lastly ‘Not Stated’.

Haslam gives an intriguing commentary on these various ethnic groups making up this diverse population, starting with ‘the true natives’ of the country, the aboriginal Indians, whom he possibly views most favourably: they are ‘on the whole shy and retiring’ and ‘they would be pleasant to work among and teachable’. He states ‘The Blacks of the Colony are of course quite as much foreigners as the whites’ while recognising the legacy of slavery and his interpretation of its effects.
‘After the abolition of slavery the negroes’ dislike of steady employment was very apparent. It is a racial characteristic that a man prefers working for himself for a pittance to earning good wages from an employer’
Diamond mining proved an attraction for many of these men in that it had the allure of possible wealth and independence from a traditional employer. Haslam finds ‘there is a happy-go-lucky carelessness and a laughing indifference about the black people which make work among them pleasant if sometimes tantalising’. However, his frustration from a public health viewpoint leaks through his discourse: ‘the most discouraging factor to a sanitarian working among the negroes is that while they readily assume a veneer of civilisation – smart clothes, church going and politics – they have little instinct of tidiness or cleanliness, and a filthy mass of garbage under the kitchen window gives no qualms whatever to the housewife’. This chapter is filled with such observations, sometimes prejudiced and often voiced in a tone that sounds decidedly out of place today, such as ‘love of children and family life and respect for age and education are factors which will maintain the East Indian people as a most important section of this colony’; however, ‘the Portuguese are of more doubtful value to the country’.
Haslam obtains his figures from the last available Census in 1921 yet wrestles with the indeterminacy of the Census figures with its many interesting anomalies, for instance with the number of husbands and wives in 1911, ‘the former outnumbering the latter by 2,847’. The recording of age alone was a perpetual problem for the Census Commissioner, with many relying on collective memories of definitive events such as ‘the cholera year’ or ‘the fire in Charlestown’ to provide a rough gauge of time.
He alludes to the missing figures of the ghostly aboriginal population who for colonial administrative purposes seem to constantly elude proper documentation. His lack of encounter with many aboriginal people is interesting as he states ‘only a few have been drawn into the modern life of the colony’, most ‘clinging to their tribal customs and primitive mode of life and withdrawing into their unexplored forests before the advance of civilisation’.
You get a real sense from reading this of the ‘huge sparsely occupied hinterland’ that forms most of Guiana, occupied by these unknown tribes, and how it informs a large part of Haslam’s collective unconscious in his quest of ‘pioneer sanitary work’. Another detail which grabbed my attention was his discussion of the ‘supernatural beliefs’ that still existed, with the attentions of the ‘obeah man’, a popular derivative of voodoo, that used a white “fowl cock” and even on occasion child sacrifice to fight off evil spirits and sickness.
Haslam goes on to study the occupations, housing, sanitation, and health of various population types, providing compelling photographs to illustrate his points.


Although his observations belong to a very different era with a different world view, they merit attention through his detailed recording of sanitation, housing, education, health and general living conditions amongst all the different groups that found themselves living in this unique part of the world under colonisation and still living the legacy of slavery. It is Haslam’s rich commentary, often falling into casual asides and sarcasm while still maintaining a profound engagement, that makes it so inviting to partake of this thesis and become immersed for a short time in this little known country from one man’s unique position.
In this week’s blog, Special Collections Conservator Emily describes the highly successful crowdsourcing conservation event held in February at the CRC…
In February, we held our first ever conservation crowdsourcing event here at the CRC. Over a two-day period, with the help of 24 participants, we aimed to rehouse section II of the Laing manuscripts in acid-free folders and boxes. Laing’s collection of charters and other papers is the University’s most important manuscript collection. Highlights of the collection include letters by Kings and Queens of Scotland and England, poems in the hand of Robert Burns and early manuscripts in Gaelic and Middle Scots. You can find out more about the collection here. The collection was in poor condition due to its housing in unsuitable upright boxes and folders. It was difficult to access and there was a risk of further damage every time it was handled.

Laing II boxes on the shelf, before treatment