SKOS Use Cases – Why Focus on the Application?

by Alistair Miles

In recent discussions of how to go about gathering requirements for SKOS, and how to structure SKOS use cases, I have placed a lot of emphasis on the *application* of controlled vocabularies. In other words, what are the vocabularies being used for? What is their primary function?

Use cases for SKOS will naturally be concerned with one or more thesauri/classification schemes/taxonomies/[other], and so the question has been raised, why be concerned about the application of the vocabularies? Why not just consider the vocabularies themselves?

As I see it, the central goal of the requirements gathering process is to define clear, unambiguous and testable *criteria* that establish the *sufficiency* of SKOS. That is, we need to be able to know when SKOS is “good enough” – when it fulfils its purpose. In project management speak, these are usually called “quality criteria”.

The question of the primary application or function of a vocabulary raises a series of further questions. Assuming a distributed software architecture, which software components are typically involved in implementing these applications? What data do these components require in order to fulfil their function (i.e. work properly)? How do these components interact? Which components act as the producers of data, which act as the consumers and which act potentially as both?

My suggestion is that, if we identify one or more *applications* which SKOS should enable, and we know (1) which generic software components are required to implement those applications and (2) what data the consuming components require to implement their functionality, we can arrive at some clear, unambiguous and testable quality criteria. That is, when SKOS is capable of representing the data required, in a way that satisfies the computational demands of the application, it is good enough. Or, to look at it from the other direction, when it is *possible* and *practical* to implement the desired functionality using SKOS for the communication of data, then SKOS is good enough.
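
To make the idea of a testable criterion a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the query-expansion component, the function names `unmet_requirements` and `expand_query`, the `ex:` concepts and the tiny data set are all hypothetical. Only the property names `skos:prefLabel`, `skos:altLabel` and `skos:broader` come from SKOS itself. The sketch shows a consuming component declaring the data it needs, a check that the data supplied can meet that need, and the component then doing its job using nothing but that data.

```python
# Illustrative sketch only: the component, data and function names are hypothetical.

# Data that a (hypothetical) query-expansion component needs from a vocabulary.
REQUIRED_PROPERTIES = {"skos:prefLabel", "skos:altLabel", "skos:broader"}

# A tiny made-up vocabulary, written as (subject, property, value) triples.
TRIPLES = [
    ("ex:cats", "skos:prefLabel", "cats"),
    ("ex:cats", "skos:altLabel", "felines"),
    ("ex:cats", "skos:broader", "ex:mammals"),
    ("ex:mammals", "skos:prefLabel", "mammals"),
]


def unmet_requirements(required, triples):
    """Return the required properties that never occur in the data."""
    used = {prop for _, prop, _ in triples}
    return required - used


def expand_query(term, triples):
    """Expand a search term using labels and broader concepts.

    Returns a sorted list containing every label of each concept that has
    `term` as a label, plus the preferred labels of those concepts'
    broader concepts.
    """
    label_props = {"skos:prefLabel", "skos:altLabel"}

    # Concepts that carry the search term as a preferred or alternative label.
    matched = {s for s, p, v in triples if p in label_props and v == term}

    # Concepts directly broader than the matched ones.
    broader = {v for s, p, v in triples if s in matched and p == "skos:broader"}

    expanded = set()
    for s, p, v in triples:
        if s in matched and p in label_props:
            expanded.add(v)
        elif s in broader and p == "skos:prefLabel":
            expanded.add(v)
    return sorted(expanded)


if __name__ == "__main__":
    print(unmet_requirements(REQUIRED_PROPERTIES, TRIPLES))
    print(expand_query("felines", TRIPLES))
```

In this toy setting the quality criterion is simply that `unmet_requirements` returns an empty set and that `expand_query` can be written against the data at all. A real criterion would of course have to be agreed for each application the community considers important.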

My concern is that, without an awareness of the applications SKOS is intended to enable, we will not be able to establish testable quality criteria. And without that, we have no way of objectively choosing between a number of design alternatives. Design then becomes very much a question of personal taste, in which case it can be very hard (sometimes impossible) to arrive at consensus. SKOS also then becomes vulnerable to “feature creep”, where features are continuously added in order to represent all aspects of all vocabularies.

I am also very keen to ensure that SKOS lives up to its name, in particular the “Simple” part. My experience of talking to many people over the last couple of years has been that the relative simplicity and approachability of SKOS have been perhaps its main selling points. I would like SKOS to continue to be as simple as possible, and I believe that the way to achieve this is to know which *applications* are considered most important by the community. SKOS may then be designed to support those applications, in the simplest possible way.

Another key issue is “interoperability” between different types of vocabulary, especially between thesauri, classification schemes, taxonomies and subject heading systems. By “interoperability” I mean the current trend towards software components that can handle more than a single, highly specific, vocabulary type. Software components might want to handle more than one vocabulary type because, fundamentally, these vocabulary types are intended to serve the same basic purpose, which is generally something to do with retrieval and the organisation and management of information. By focusing on the application of the vocabulary, we may be able to establish what these different vocabulary types have in common, and how the essential features may be represented within the same framework.

A final reason for focusing on the application is money. The economics of developing and applying controlled vocabularies is, ultimately, what is driving current trends. We are at a point in time where organisations are beginning to invest seriously in “Semantic Technologies”. As they do, many are discovering that significant initial and ongoing costs are involved. Also, the learning curve is steep. It is my impression that an awareness of the sizable cost and intellectual challenge is becoming much more widespread (although many in isolated communities have been well aware of it for a long time). Potential benefit is demonstrable, but costs must be minimised before solutions based on semantic technologies become genuinely viable.

It is this drive to enable functionality with demonstrable benefit, at the lowest possible cost, that must underpin the design of SKOS. By being aware of the application context, we may understand how this can be done, and where trends are leading. I anticipate that this will become particularly relevant when it comes to a discussion of issues such as “mappings” between vocabularies, and of managing change within vocabularies (and of managing dependencies between metadata and changing vocabularies). We must work towards a clear understanding of what is needed to ensure that solutions based on SKOS can be economically viable, and practically feasible, propositions – especially for public organisations and for SMEs where money is tight.

[This is also an email on the SKOS public discussion list.]