Bibliographic Data Survey

Introduction

The second phase of the content work package set out to determine the potential scale of a shared LMS in terms of bibliographic records and patron/user records and, as an aside, what other, currently uncatalogued content might form part of the shared set-up. We also sought to establish which model of shared LMS was preferred: a single bibliographic database with multiple holdings, or a shared infrastructure with completely separate datasets.

We must also take account of the potential overlap between collections. Without very detailed collection review it is impossible to say what level of overlap exists between collections, but we do have some evidence for the overlap between the major collections of the ancient universities.

Figures quoted, almost inevitably, carry a margin of error and are intended to be indicative rather than absolute.

Methodology

To keep the data-gathering exercise straightforward, providing an overview and encouraging a good response rate, we asked each of the HEIs in the Scottish sector a limited number of basic questions.

  • How many bib records do you have (separated by serials, monographs and ebooks)?
  • How many item records do you have?
  • What machine-readable records exist outside of the traditional LMS?
  • How many patron records do you hold?
  • A guess at the number of uncatalogued items held.

Some peripheral issues around the format and the LMS used were also clarified, but the primary aim was to determine scale, both in terms of information about collections and in terms of the number of users. The survey was issued by email to all HEIs within the SCURL group, and the TBOS project is grateful for the time spent on all the questions by colleagues across the sector.

A total of 11 submissions were received, from which we are able to determine the approximate scale of the LMS that we are considering. It is important to note that the submissions included the NLS.

Bibliographic control of the Scottish collection

The heart of the LMS is the catalogue, the record of the holdings of the library or libraries using the system and the point at which the users establish what content is available, in what format, and how it can be located. This remains the major component of the LMS.

The key questions show that, in raw terms, we are talking about a cumulated database of almost 11 million bibliographic records and well over 11.7 million volumes, though almost 5 million of those records are held by the NLS.

In terms of spread across formats, the records are overwhelmingly biased towards print monographs, as illustrated by the following chart showing the proportion of journals, monographs and ebooks.

It is interesting to note that the number of ebooks offered is now in excess of 1 million. Even allowing for overlap, this seems an impressive figure.

Overlap

Obviously, what this total does not reflect is the scale of the overlap between the collections. It would be reasonable to assume a significant overlap between collections, with the exception of the holdings of specialist institutions such as the Royal Conservatoire and Glasgow School of Art, where we might assume a greater proportion of unique holdings.

Determining, with any degree of accuracy, the unique titles within the shared collection will require much deeper analysis of the collections than this project allows, but we do have some guidance through OCLC’s analysis of the Scottish print collection across a limited number of Scottish universities, published in 2011. The report analysed the collections of the ‘ancient’ Scottish universities (Aberdeen, Glasgow, Edinburgh and St. Andrews) alongside the NLS. The general overlap pattern of those collections suggested that 82% of the combined holdings of those institutions consisted of works held by only one of them.

We might expect the overlap to be greater across the smaller, and younger, institutions, with the other HEIs offering a diminishing number of unique holdings to the entire database. Certainly a 20% overlap does not seem an unreasonable assumption, reducing our estimate of roughly 10 million records to a more realistic 8 million as a liberal estimate of the size of the Scottish HE collection.
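The arithmetic behind that estimate can be sketched as follows. This is an illustration only: the 10 million raw figure and the 20% overlap are the rounded assumptions used in the text, and the function name is hypothetical.

```python
def estimate_unique_records(raw_records: int, overlap_rate: float) -> int:
    """Estimate unique records after removing an assumed share of duplicates."""
    if not 0.0 <= overlap_rate <= 1.0:
        raise ValueError("overlap_rate must be between 0 and 1")
    return round(raw_records * (1.0 - overlap_rate))


# Rounded figures from the text: about 10 million records, 20% assumed overlap.
RAW_RECORDS = 10_000_000
ASSUMED_OVERLAP = 0.20

unique_estimate = estimate_unique_records(RAW_RECORDS, ASSUMED_OVERLAP)
print(f"{unique_estimate:,}")  # 8,000,000
```

Changing the assumed overlap rate shows how sensitive the final figure is to that single assumption, which is why the estimate should be read as indicative only.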

Interestingly, the OCLC report also analysed the collection against the digitised collection of the Hathi Trust, a collection of some 5.5 million titles. At that time, 978,183 titles from the ‘Scottish collection’ were digitised in that collection. Given the growth of the Hathi Trust’s digitised collection, and the possibility of comparing other Scottish HE collections with it, that figure is likely to be higher today. It represents a significant body of material that could be made available digitally to the community.

The overlap issue will also be particularly pertinent to our burgeoning ebook collections more generally, where significant collections of titles have been licensed through SHEDL and are therefore common to all libraries. The Springer collection, for example, is of the order of 30,000 titles and CUP a further 5,000. The collection of unique titles is therefore much smaller.

Findabook

One of the potential benefits of a shared LMS would be the ease with which users would be able to locate material across the sector with a single search.

This has been tackled before. The CAIRNS project, an output of the Centre for Digital Library Research, used Z39.50 to develop a single search across all participating institutions. Issues relating to matching searches with the relevant fields across different institutions provided entertainment through many meetings of cataloguers across the sector. The inclusion of public library collections further enhanced this service. The service worked but, in terms of usability, has failed to make an impact on the services of most university libraries: limitations of the technology and differences in the data make for slow delivery of results that may be unreliable. For a generation of users accustomed to Google and social media, this was unlikely to succeed.

As part of this project, we visited SLIC and discussed the plans for a new service that builds on the CAIRNS concept while improving on it and looking at lower cost options. The result, Findabook, is set to launch early in 2013, offering a much quicker search function. The service has been delivered in partnership with PTFS Europe, using the RebusConnect product. This uses screen-scraping technology and, while it still has some limitations, offers an improvement on previous attempts. It is also set to cover a range of public library services, building on the Irish Borrow Books initiative, and SLIC is looking for ways to engage with the HE community in the development of this service. There is clear potential in opening up the collections more widely.

Of course, there is a major political question posed by such a service: assuming we are able to offer a facility whereby it is easy and quick to search across all the university collections, how is the user to gain access to the book? Reciprocal access schemes address this issue up to a point, but with an easier search facility in place the expectation comes into sharper focus that either a) material will be delivered electronically or b) a user may simply request access to a work in any university library, regardless of which one they “belong” to.

Patron data

The data gathered on patrons can only provide a snapshot of the number of patron records currently being managed. The issue of overlap is more limited in this area, as the number of patrons enrolled with more than one HEI will be relatively small, though a significant number of the NLS patrons might well be students and staff of Scottish HEIs.

What we can reasonably say, therefore, is that patron records for over 350,000 people, or approximately 7% of the Scottish population, are managed within the LMSs of Scottish HE.

Retrospective conversion

It may be a term considered old-fashioned in the context of electronic collections and the digital age, but it was felt important to establish the scale of cataloguing effort still required to provide complete bibliographic control of the collections in the sector.

As expected, the NLS and the major universities have the biggest collections of uncatalogued material and it is still the case that there are almost 1.7 million titles lurking in our libraries that do not have any bibliographic control. Whether this issue is likely to be tackled in the future is a matter for debate, but it is perhaps worth noting the amount of material that is not currently accessible to users.

Structure of a shared system

The key question at the end of the survey related to the fundamental make-up of the system in terms of the bibliographic data. We asked whether the library favoured holding distinct databases, albeit with a shared infrastructure, or whether the notion of a single bibliographic record with multiple holdings would be the preferred option.

Given that at this stage the notion of a shared LMS is theoretical, it allows us to consider the relative merits of alternative approaches. The survey proffered two options and invited respondents to state a preference and, briefly, their rationale for doing so. The two options offered were a shared infrastructure with distinct and separate databases, or a complete integration of databases with a single bibliographic record shared amongst all members. The former model may be represented by the type of shared environment used by Heriot-Watt and Abertay, and the shared bib model is exemplified by the large consortia in the US, such as OhioLINK, where all the holdings are shown under a single bibliographic record.

That said, there are examples of both approaches in smaller scale shared environments in Scotland already. Most recently, the Rowan partnership announced that the catalogues of UHI, UWS and the SAC would be merged as part of the shared system development. That “merger” of catalogues is expected in January 2013. A useful discussion with Anna Enos in November 2012 revealed the strategy adopted there and some of the issues that have arisen as a result of the single-bib approach.

  • Cataloguing policy required detailed discussion. Although the partnership has agreed on a single bib, and the preference for UWS records in cases of duplication, the OPAC view is “scoped” to allow users at each institution to limit the view of the catalogue to just their home site.
  • The single bib also applies to ebooks, with the result that ebook records may have two URLs to accommodate separate licensing and authentication at each of the partner libraries, with the possibility of confusion for users as a result.
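The single-bib model, the “scoped” OPAC view and the per-library ebook URLs can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a description of any vendor's system: the class names, field names, library names and URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Holding:
    """One library's copy of, or licence for, a title (hypothetical fields)."""
    institution: str
    location: str = ""
    url: str = ""          # institution-specific ebook link, if any
    local_notes: str = ""  # e.g. provenance or binding notes


@dataclass
class BibRecord:
    """A single shared bibliographic record with holdings from many libraries."""
    title: str
    holdings: list[Holding] = field(default_factory=list)

    def scoped(self, institution: str) -> list[Holding]:
        """Return only the holdings visible in one institution's scoped view."""
        return [h for h in self.holdings if h.institution == institution]


# One shared ebook record carrying a separately authenticated URL per library.
record = BibRecord(
    title="An Example Ebook",
    holdings=[
        Holding("Library A", url="https://library-a.example/ebook/1"),
        Holding("Library B", url="https://library-b.example/ebook/1"),
    ],
)

# A user scoped to Library A sees only that library's link.
for holding in record.scoped("Library A"):
    print(holding.institution, holding.url)
```

Keeping local notes and URLs at the holdings level, rather than on the shared record, is one way to limit the confusion described above, though it depends on the OPAC applying the scope consistently.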

The question of assimilating library cataloguing policy was raised by a number of respondents and is especially relevant to special collections where locally relevant notes on provenance or binding may apply at the bibliographic level. There is also the question of differing cataloguing standards and the need for the NLS, for example, to have records that comply with the expected standards at other bibliographic services and utilities in which the records are used.

The range of comments received on this is as follows:

A single database with a single bibliographic record for all Scottish HE titles, with separate holdings records for each library.

“In order to benefit from increased resource sharing and reduced duplication of data maintenance, we would at this theoretical stage see greater benefit from a single database with a single bibliographic record for all Scottish HE titles, with separate holdings records for each library.  This is assuming that any local requirements can also be satisfied by this approach.  It may also depend on how e-resources are linked to from the single bibliographic record, for example.”

“This would be the preferred model, but only if it did not prevent local cataloguing practices continuing (local fields to be added etc).  We would also need a method for handling special collections/rare books where detail about local copies is important to record. A single record showing multiple libraries provenance/annotations/missing pages may be problematic.”

“From a user point-of-view this would be the most straightforward way to answer the main reason for using a LMS; Do you have the title I want and where can I get it from?”

“National Library of Scotland understands the benefits of sharing single bibliographic records with other libraries, however, as the Library contributes to the UK legal deposit Shared Cataloguing Program (SCP) it would have concerns regarding changes made to SCP records by others.  SCP requires a certain level and quality of cataloguing.  SCP records contribute to BNB.  To operate in a single bibliographic record environment there would need to be appropriate controls in place so that SCP records could not be altered by other libraries.”

“A single database with a single bibliographic record would be a benefit in reducing the time spent individually creating, or modifying and uploading records. It would also act as a union catalogue for our users, which would be a benefit in situations where reciprocal borrowing arrangements are in place. The individual libraries could still retain the ability to add custom fields/notes as appropriate, and at the holdings and item level circulation would be administered for each institution.”

Separate databases on a single shared infrastructure, but all editing and quality control responsibilities would be retained by individual libraries.

“We input quite a lot of specific ‘QMU’ relevant information in to many of our records.”

“This would be our preferred option. It was felt that a single record with multiple holdings could be cluttered/confusing on the screen for users. There is also a concern option 1 wouldn’t offer the flexibility of adding in notes relative to a specific institution.”

“This would be our preference, particularly for Special Collections material where we would have unique information relating to provenance held in the bibliographic record.  Management of e-resources might also be an issue if the database was shared, particularly where a customer-specific URL is used.”

“Given the above, National Library of Scotland may prefer this approach if the above does not offer an appropriate level of control for SCP records.  However, NLS may be willing (if record licensing permits) to contribute its catalogue records to a single bibliographic record environment.”

“This would be a similar arrangement to the consortia we currently are part of, which has distinct advantages, but would not allow for the full integration as described above.”

Thus we are left to weigh the deduplication, simplicity, resource sharing and potentially improved bibliographic control that might accrue from an LMS based on a single bibliographic record against the issues around lack of local control and local information, as well as the considerable issue of separately authenticated access to e-resources.

Repositories and archives

Library services are responsible for increasing numbers of records held in systems other than the traditional LMS, and the possibility of a shared system raises the question of the limits of that system in terms of the resources that it covers.

Our survey also sought to get a general overview of the scale of systems used to manage archives and research repositories. In the case of the latter, records may be held in research management systems or in separate repositories such as DSpace or eprints. These have been conflated to give an overall picture of the number of records being managed.

Of course, in the case of archival material there is no duplication: by definition the material is unique. In terms of research outputs there may be some duplication if works are produced by joint authors at more than one institution and held in the repository/RMS of both sites.

The following table summarises the scale of the datasets in these areas. Ten libraries provided data on these records, though in some cases the data supplied is an estimate of the total.

Lib      Repository    Archives
1            12,819      20,000
2             6,000     160,000
3            14,280
4             2,000
5               635       5,922
6            59,997     261,670
7            13,700      10,000
8            27,000      80,023
9             7,200       2,400
10            3,685
Total       147,316     540,015
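The totals can be checked with a short calculation. This sketch assumes that the single figures reported by libraries 3, 4 and 10 are repository counts, which is the reading consistent with the totals given; the variable names are hypothetical.

```python
# (repository records, archive records) per responding library, as reported.
# None marks a cell left blank in the table; assigning the single figures for
# libraries 3, 4 and 10 to the repository column is an assumption.
REPORTED = {
    1: (12819, 20000),
    2: (6000, 160000),
    3: (14280, None),
    4: (2000, None),
    5: (635, 5922),
    6: (59997, 261670),
    7: (13700, 10000),
    8: (27000, 80023),
    9: (7200, 2400),
    10: (3685, None),
}

repository_total = sum(repo or 0 for repo, _ in REPORTED.values())
archives_total = sum(arch or 0 for _, arch in REPORTED.values())

print(f"Repository: {repository_total:,}")  # Repository: 147,316
print(f"Archives:   {archives_total:,}")    # Archives:   540,015
print(f"Combined:   {repository_total + archives_total:,}")  # Combined:   687,331
```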

Conclusion

The potential scale of a shared LMS in Scotland is considerable, but there are much larger databases, and LMS vendors are able to scale systems to cope with record numbers at this level and beyond.

The survey and the discussions with interested parties and other relevant projects suggest that there are a relatively small number of key questions to be addressed in considering the content of a shared LMS.

  1. What service do we want to provide our users? Do we want a system that offers a single point of access to all material held in Scottish HE collections, as if they were a single collection?
  2. Depending on the prevailing response to that question, the supporting access and reciprocal borrowing arrangements may need to be reviewed.
  3. If a shared system is felt to offer benefits in terms of shared resources, shared expertise and improved bibliographic control, then a project taking this forward must consider the cataloguing policies across member libraries and take account of any significant obligations that libraries may have, addressing practice in relation to, among other things, special collections and electronic books.

These questions may alter in importance as the shared electronic collection grows and the material to which all Scottish HEIs have access increases in scope and depth. We do, however, have significant and important print collections to manage and exploit for the foreseeable future.
