Son of Dublin Core (SoDC) — Encoding, Validating and Harvesting Graph-based (Meta)data — Version 0.2 (alpha)

by Alistair Miles

Being at the Dublin Core conference last week gave me some ideas, especially about how to bring the DCMI architecture, RDF, OAI-PMH and OAI-ORE closer together. I gave those ideas a name — “Son of Dublin Core (SoDC)” — and I hacked up a proof of concept.

I’ve just made a few fixes and extensions, and released SoDC version 0.2 (alpha). The latest release will always be available from the SoDC main page. SoDC is just an idea at this stage, so let me know what you think…

There are two main parts to SoDC…

First, SoDC-XML is a concrete XML syntax for graph-based (meta)data. It is designed specifically to allow graph-based (meta)data to be embedded in harvesting protocols like OAI-PMH. It is also designed to enable application-specific syntax validation using commodity XML validation tools, enforcing the constraints normally expressed in an “application profile”.

The main motivation for SoDC-XML is quality control. Metadata providers and consumers are moving towards a graph-based approach to their metadata, because it allows metadata from one source to be cleanly integrated with metadata from other sources. But providers and consumers still need to set expectations, and to know when expectations are not being met. Constraining and validating graphs at the syntax level is the simplest way of achieving that, yet there is no standard way to do so using RDF toolkits. SoDC-XML fills that gap by providing a concrete XML syntax for graph-based metadata, constrained by an XML schema for generic syntax validation, on top of which application-specific syntax constraints can easily be implemented.
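To make the idea of syntax-level validation a bit more concrete, here is a minimal Python sketch. The element and attribute names (descriptionSet, description, statement, about, property) are invented for illustration and are not the actual SoDC-XML vocabulary. The rule being checked (exactly one title per description) is a hypothetical application-profile constraint.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a small metadata graph (NOT real SoDC-XML).
DOC = """
<descriptionSet>
  <description about="http://example.org/doc/1">
    <statement property="http://purl.org/dc/terms/title">A first document</statement>
    <statement property="http://purl.org/dc/terms/creator">A. Author</statement>
  </description>
  <description about="http://example.org/doc/2">
    <statement property="http://purl.org/dc/terms/creator">B. Author</statement>
  </description>
</descriptionSet>
"""

DC_TITLE = "http://purl.org/dc/terms/title"


def find_violations(xml_text, required_property=DC_TITLE):
    """Return the 'about' URIs of descriptions that do not carry exactly one
    statement with the required property."""
    root = ET.fromstring(xml_text)
    violations = []
    for description in root.iter("description"):
        count = sum(
            1
            for statement in description.findall("statement")
            if statement.get("property") == required_property
        )
        if count != 1:
            violations.append(description.get("about"))
    return violations


print(find_violations(DOC))
```

The point of the sketch is that the check needs nothing more than an XML parser: because the graph has a predictable XML shape, a provider or consumer can detect a broken expectation without loading the data into an RDF toolkit.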

The second part is SoDC-CL, a language for expressing application-specific constraints over metadata graphs. It allows basic syntactic constraints to be expressed, of the kind that typically form the basis of a metadata “application profile”.

There are also two supporting utilities…

The SchemaGen utility transforms an SoDC-CL document into a Schematron schema, which can then be used to validate SoDC-XML documents — providing a pipeline for automating the implementation and enforcement of syntax constraints over metadata graphs.
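As a rough illustration of that kind of pipeline, the Python sketch below turns a list of constraints into a Schematron schema. It is not how SchemaGen itself works: the constraint format (a list of dicts) is a stand-in for an SoDC-CL document, and the element names used in the generated XPath tests (description, statement, property) are hypothetical rather than real SoDC-XML names. Only the Schematron namespace and the schema/pattern/rule/assert structure come from the Schematron standard.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace

# Hypothetical constraints: each property must occur at least `min` times
# in every description. This stands in for an SoDC-CL document.
CONSTRAINTS = [
    {"property": "http://purl.org/dc/terms/title", "min": 1},
    {"property": "http://purl.org/dc/terms/creator", "min": 1},
]


def build_schematron(constraints, context="description"):
    """Build a Schematron schema (as a string) with one assert per constraint."""
    ET.register_namespace("sch", SCH_NS)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    pattern = ET.SubElement(schema, f"{{{SCH_NS}}}pattern")
    rule = ET.SubElement(pattern, f"{{{SCH_NS}}}rule", {"context": context})
    for c in constraints:
        # XPath test evaluated against each element matched by the rule context.
        test = f"count(statement[@property='{c['property']}']) >= {c['min']}"
        assertion = ET.SubElement(rule, f"{{{SCH_NS}}}assert", {"test": test})
        assertion.text = (
            f"Expected at least {c['min']} statement(s) "
            f"with property {c['property']}."
        )
    return ET.tostring(schema, encoding="unicode")


print(build_schematron(CONSTRAINTS))
```

The generated schema could then be handed to any Schematron processor, which is what makes the approach attractive: the constraints are written once, and enforcement is left to off-the-shelf XML tooling.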

The TurtleGen utility transforms an SoDC-XML document into a Turtle document — SoDC-XML is a concrete RDF syntax, so the transformation is straightforward.
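To show why such a transformation can be straightforward, here is a small Python sketch that walks an XML document and emits one Turtle triple per statement. The input format is again hypothetical (the element names description and statement, and the attributes about and property, are assumptions, not real SoDC-XML), every value is treated as a plain literal, and this is not the TurtleGen implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a metadata graph (NOT real SoDC-XML).
DOC = """
<descriptionSet>
  <description about="http://example.org/doc/1">
    <statement property="http://purl.org/dc/terms/title">A "first" document</statement>
    <statement property="http://purl.org/dc/terms/creator">A. Author</statement>
  </description>
</descriptionSet>
"""


def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for a Turtle string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_turtle(xml_text):
    """Emit one '<subject> <predicate> "literal" .' line per statement."""
    root = ET.fromstring(xml_text)
    lines = []
    for description in root.iter("description"):
        subject = description.get("about")
        for statement in description.findall("statement"):
            predicate = statement.get("property")
            value = escape_literal(statement.text or "")
            lines.append(f'<{subject}> <{predicate}> "{value}" .')
    return "\n".join(lines)


print(to_turtle(DOC))
```

Each XML statement maps to exactly one triple, so the conversion is a single pass over the document with no graph reasoning involved.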

There’s still lots to do, and a few bugs to fix, but hopefully it proves the basic idea. See the SoDC main page for more info.