Monthly Archives: July 2018

Marking up collections sites with Schema.org – Blog 2

This blog was written by Nandini Tyagi, who was working with Holly Coulson on this project.

This post follows the one written by Holly Coulson, my fellow intern on the Library Metadata internship with the Digital Development team at Argyle House. For the benefit of those coming straight to this second post, I’ll quickly recap what the project was about. The Library Digital Development team have built a number of data-driven websites over the past four years to surface the collections content; the sites cover a range of disciplines, from Art to Collections Level Descriptions, Musical Instruments to Online Exhibitions. While they work with metadata standards (Spectrum and Dublin Core) and accepted retrieval frameworks (SOLR indices), they are not particularly rich in a semantic sense, and they do not benefit from advances in browser ‘awareness’; specifically, they do not use linked data. The goal of the project was to embed schema.org metadata within the online records of items in the Library and University Collections, bringing an aspect of linked data into the sites’ functionality and, as a result, increasing the sites’ discoverability through generic search engines (Google etc.) and, more locally, the University’s own website search (Funnelback).

In this post, I’ll share my experience from the implementation perspective, i.e. how the documented mappings were translated into action and what the key findings from the project are. Before I dive into the implementation, it is important to know what Schema is and how it benefits us. Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. The Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. It covers entities, relationships between entities, and actions, and can easily be extended through a well-documented extension model.

The need for Schema is best explained through an example. Most webmasters are familiar with HTML tags on their pages. Usually, HTML tags tell the browser how to display the information included in the tag. For example, <h1>Titanic</h1> tells the browser to display the text string “Titanic” in a heading 1 format. However, the HTML tag doesn’t give any information about what that text string means: “Titanic” could refer to the hugely successful movie, or it could refer to the name of a ship, and this can make it more difficult for search engines to intelligently display relevant content to a user. In our collections, for example, a search engine crawling the pages cannot tell whether a given title refers to a painting, a sculpture or an instrument. Using Schema, we can help the search engine make sense of the information on the page, which can in turn help our pages rank better for relevant queries.
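To make the “Titanic” example concrete, here is a minimal Python sketch of how the same name could be described as two different Schema.org types using JSON-LD. The helper name `to_json_ld_script` is hypothetical and is not taken from the collections sites’ code; `Movie` and `Vehicle` are existing Schema.org types chosen purely for illustration.

```python
import json


def to_json_ld_script(data: dict) -> str:
    """Wrap a Schema.org description in the script tag used for JSON-LD."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so the JSON cannot close the surrounding script tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Same name, two different meanings: the "@type" tells a crawler which one.
titanic_film = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Titanic",
}
titanic_ship = {
    "@context": "https://schema.org",
    "@type": "Vehicle",
    "name": "Titanic",
}

if __name__ == "__main__":
    print(to_json_ld_script(titanic_film))
    print(to_json_ld_script(titanic_ship))
```

The heading text on the page stays the same in both cases; only the accompanying structured data tells a crawler which “Titanic” is meant.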

Before: No sense of linked data. Search engines would read the title and never understand that these pages refer to items that are a part of creative work collections.

Anatomical Figure of a Horse (ecorche), part of Torrie Collection (before)

Keeping this in mind, we started mapping record fields to the Schema.org specification in the record files of the websites. Certain fields, such as ‘Inscriptions’ and ‘Provenance’, could not be mapped because Schema.org does not yet offer a suitable property for them. We are documenting all such findings and plan to make suggestions to Schema.org about them.
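The mapping step can be sketched roughly as follows. This is an illustration only, not the actual configuration used on the sites: apart from ‘Inscriptions’ and ‘Provenance’, the source field names are hypothetical, while the targets (`name`, `creator`, `description`, `dateCreated`, `material`) are existing Schema.org properties.

```python
# Hypothetical source field names mapped to real Schema.org property names.
FIELD_TO_SCHEMA = {
    "Title": "name",
    "Maker": "creator",
    "Description": "description",
    "Date": "dateCreated",
    "Material": "material",
}


def map_record(record: dict) -> tuple:
    """Split a record into Schema.org properties and fields with no mapping."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        prop = FIELD_TO_SCHEMA.get(field)
        if prop is None:
            unmapped.append(field)  # note it so it can be reported later
        else:
            mapped[prop] = value
    return mapped, unmapped


# Placeholder values only; the real record content is not reproduced here.
record = {
    "Title": "Anatomical Figure of a Horse (ecorche)",
    "Inscriptions": "(example value)",
    "Provenance": "(example value)",
}
mapped, unmapped = map_record(record)
```

Collecting the unmapped field names separately, rather than silently dropping them, gives a ready-made list of gaps to raise with Schema.org.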

Implementation of mapping in the configuration file of Art collections

This was followed by implementing changes directly in the record files of each collection. There were a lot of challenges, such as marking up images, video and audio. Images were especially tricky: some websites served them through IIIF and others as bitstreams, so we had to decide carefully how to mark up each site. With a lot of help and guidance from Scott we were able to resolve these issues, and the end result of our efforts was absolutely rewarding.
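One possible way to treat both image sources uniformly is sketched below. This shows the general idea rather than the decision we actually made: it assumes IIIF Image API 2.x URL conventions, and the base URLs, identifier and bitstream path are all hypothetical, with example.org standing in for the real image servers.

```python
def iiif_image_url(base_url: str, identifier: str) -> str:
    """Build a full-size JPEG URL using IIIF Image API 2.x path segments.

    The segment order is region/size/rotation/quality.format. Image API 3.0
    uses "max" instead of "full" for the size segment.
    """
    return f"{base_url.rstrip('/')}/{identifier}/full/full/0/default.jpg"


def image_object(content_url: str) -> dict:
    """Describe an image as a Schema.org ImageObject, whatever its source."""
    return {"@type": "ImageObject", "contentUrl": content_url}


# Hypothetical URLs for the two kinds of site.
iiif_image = image_object(iiif_image_url("https://example.org/iiif/", "abc123"))
bitstream_image = image_object("https://example.org/bitstream/123/456/horse.jpg")
```

Either dictionary could then be attached to an item through the Schema.org `image` property, so the rest of the markup need not care where the image is hosted.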

After: The websites have been marked up using Schema’s CreativeWork type. Now search engines can see that these items are creative works such as art, sculpture or instruments. The markup is also rich in other details, such as the name of the creator, the name of the collection and the description. These changes should increase the discoverability of the collections websites.
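As a rough illustration of what such a marked-up record might contain, the sketch below builds a `CreativeWork` description linked to its collection through `isPartOf`. The function name and its parameters are hypothetical, the use of the `Collection` and `Person` types is an assumption on my part, and the output is not the exact markup produced by the sites.

```python
import json
from typing import Optional


def creative_work(name: str, collection: str,
                  creator: Optional[str] = None,
                  description: Optional[str] = None) -> dict:
    """Describe an item as a CreativeWork that is part of a named collection."""
    item = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "isPartOf": {"@type": "Collection", "name": collection},
    }
    # Only include optional properties when the record actually has them.
    if creator:
        item["creator"] = {"@type": "Person", "name": creator}
    if description:
        item["description"] = description
    return item


horse = creative_work("Anatomical Figure of a Horse (ecorche)", "Torrie Collection")
print(json.dumps(horse, indent=2))
```

Leaving out empty properties keeps the markup honest: a crawler sees only what the record really states.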

Anatomical Figure of a Horse (ecorche), part of Torrie Collection (after)

I was very fortunate that Google Analytics and Search Engine Optimisation (SEO) training sessions were held at Argyle House during my internship, and I was able to gain insights from them. They illuminated a whole new direction and lent a different viewpoint. The SEO workshops in particular gave me ideas about optimising the overall content of the websites and making St Cecilia’s more discoverable. I realised that the benefits of SEO are seen in the form of increased traffic, Return on Investment (ROI), cost effectiveness, increased site usability and brand awareness. These tools, combined with Schema, can add significant value to library services everywhere. It is a source of immense pride that our university is among the very few that have employed Schema for their collections. We are confident that Schema will make the collections more accessible, not only to students but also to people worldwide who want to discover the jewels held in these collections. The 13 collections websites that we worked on during our internship are nearing completion from the implementation perspective and will soon be live with Schema markup.

To conclude, my internship has been an amazing experience, both in terms of meaningful strides towards marking up the collections websites and the fun, supportive work culture. Having never worked on the web development side before, I got the opportunity to understand first-hand the intricacies involved in anticipating the needs of users and delivering an information-rich website experience to them. As the project concludes in July, I am happy to have learned and contributed much more than I imagined I would in the course of my internship.

Nandini Tyagi (Library Digital Development)

Thanks very much to both Nandini and Holly for their sterling work on this project. We’ve implemented Schema in 3 sites already (look at https://collections.ed.ac.uk/art for an example), and we have another 6 in GitHub ready to be released. The interns covered a wealth of data, and we think we’re in a position to advise 1) LTW that this material can now be better used in the University website’s search and 2) prospective developers on how to apply this concept to their sites.

Scott Renton (Library Digital Development)