Alistair Miles

Month: November, 2006

Enterprise Search: Google vs. Exalead vs. Oracle

Some interesting discussion in the session “Latest Developments in Enterprise Search” at Online Information yesterday (day 2, track 2, session 1)…

Francois Bourdoncle presented the Exalead approach; Roger Ford presented Oracle Secure Enterprise Search (SES); Roberto Solimene presented Google OneBox.

All three speakers emphasised being able to search different types of both structured and unstructured information. Surprisingly, none of the speakers talked about how their products achieve high precision (relevancy). The discussion after the talks was perhaps most interesting, highlighting two major issues in enterprise search: relevancy and privacy … here are my raw notes taken at the time…

The Architecture of the Web, Web Services and the Semantic Web

Looking back on the experience of drafting the Best Practice Recipes for Publishing RDF Vocabularies (we affectionately referred to it as the Cookbook), I remember feeling that we were trying to thread the eye of an extremely small needle. Coming up with a workable solution that was consistent with the current W3C/IETF specifications and TAG directives was a bit like trying to find a way through a large and rather complicated maze.

Looking at the Cookbook now, I can’t help feeling that the solution we have is far from optimal. My main cause for concern is, quite simply, the robustness of the solution. Although there is a remarkable and inspiring number of people out there who really do want to do the Right Thing, it’s just so easy to get it “Wrong”. If the integrity of the Semantic Web depends on absolutely everybody getting it “Right”, it will be a fragile system indeed.

A Thesaurus Data Model for British Standard 8723 (Part 2)

Continuing on from my initial exploration of using UML to capture the monolingual thesaurus data model described in BS 8723 part 2 (written up here), below is an alternative UML model attempting to represent the underlying conceptual structure of a monolingual thesaurus. This model is more complicated, so I’ve broken it into separate class diagrams for easier viewing …

A Thesaurus Data Model for British Standard 8723

The working group producing the new BS 8723 standard for thesauri (structured vocabularies) is currently focusing on the issue of standard formats for interchange of thesaurus data. At a recent meeting it was concluded that a (semi-)formal data model for thesaurus data, expressed in some sort of established modelling language, would be a good starting point.

Here is my first attempt to use UML to capture the data model expressed informally as prose in BS 8723 part 2 (monolingual thesauri). The UML was generated using StarUML, which is free, and I read this tutorial on UML. I’ve tried to be as faithful to BS 8723 part 2 as possible, capturing no more than what is expressed therein and adding no interpretation …

Free UML Tools

I’m looking for a tool for building UML models visually. I want something that will export some sort of image format for documentation, some sort of machine readable format, and preferably something that will generate some Java classes from the model. I’d also like something that integrates with the Eclipse framework. Support for the latest version of UML would be good – I guess that’s UML 2.?. Oh, and of course, I want it all for free …

SKOS Use Cases – Why Focus on the Application?

In recent discussions of how to go about gathering requirements for SKOS, and how to structure SKOS use cases, I have placed a lot of emphasis on the *application* of controlled vocabularies. In other words, what are the vocabularies being used for? What is their primary function?

Use cases for SKOS will naturally be concerned with one or more thesauri/classification schemes/taxonomies/[other], and so the question has been raised, why be concerned about the application of the vocabularies? Why not just consider the vocabularies themselves?

As I see it, the central goal of the requirements gathering process is to define clear, unambiguous and testable *criteria* that establish the *sufficiency* of SKOS. That is, we need to be able to know when SKOS is “good enough” – when it fulfils its purpose. In project management speak, these are usually called “quality criteria”.

Ontogenesis Network Meeting

The first meeting of the Ontogenesis Network was held on 30/31 October in Manchester. The theme of the meeting was “the Informal Meets Formal”: an exploration of issues connecting less formal types of controlled vocabularies and knowledge organisation/elicitation tools with more formal ontologies grounded in e.g. description logics.

I presented “Gardens of Meaning” – a metaphor for the creation and evolution of controlled vocabularies. I’m very concerned with designing workflow models that minimise the overall costs of vocabulary development and maintenance, especially the costs associated with maintaining dependencies between controlled vocabularies and metadata. This metaphor is, I hope, a first tentative step in that direction.

The meeting was excellent, with many interesting presentations. Unfortunately, the web content is poor if you want more information. The Ontogenesis Network home page is a bit out of date, but has some basic information about the context of the network itself. The Ontogenesis Network wiki has more up-to-date info, including a page for the recent network meeting (with a programme of speakers). There is also the Ontogenesis Network blog, although this doesn’t have much content at the moment.

Sir Isaac Newton said …

This quote is attached to the outside of my building at the Rutherford Appleton Laboratory …

A quote from Isaac Newton