Alistair Miles

Category: digital preservation

The OAIS Information Model Revisited – Part 3: Towards Models for Interpretation/Virtualisation Recipes

In this note, I begin to explore the use of the Eriksson-Penker UML extensions for business process modeling, as a tool for modeling the processes or work flows required to successfully interpret or virtualise a digital object.

Previously, in part 1 of this series, I explored the abstract notions of data, information, representation information and interpretation, as defined by the OAIS Information Model. In part 2, I tried to apply these notions to a simple example of a Web page. I found that we need to go beyond the OAIS Information Model if we want to capture and represent the “recipes” that take you from a sequence of bits to something more useful, in the general case where there may be multiple steps or stages required to process, virtualise or render a digital object.

Recipes and Dependencies

Take again the example from part 2 of a simple Web page, encoded as an XHTML 1.0 Transitional document using the UTF-8 character set, and stored as a single sequence of bits.

I’m interested in modeling the “recipe” that tells me how to turn the encoded sequence of bits back into a Web page, because this recipe will define the “dependencies” for the preserved object. By “dependency” I mean those items of information and/or software that are required to execute the recipe — the ingredients and utensils, to use the cooking analogy. Note that by “execution” I do not necessarily mean execution by a computer — steps in a recipe might well be entirely manual.

If I knew what these dependencies were, I could then compare them with the knowledge and software currently held by the designated community (DC), and decide which of the dependencies also need to be preserved.

I could also design a system which computes any “gaps” that arise between the knowledge and software held by the designated community and those required for execution of the recipe. This is one of the goals of the CASPAR project.
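As a rough illustration of the gap computation described above, here is a minimal Python sketch. It treats a recipe as a list of steps, each with the dependencies it requires, and computes the gap as a set difference against what the designated community already holds. The `Step` class, the function names and the dependency labels are all hypothetical, chosen for the Web page example; this is not the CASPAR design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of a recipe and the dependencies it needs (hypothetical model)."""
    name: str
    requires: frozenset


def recipe_dependencies(recipe):
    """Return the union of everything required by any step of the recipe."""
    deps = set()
    for step in recipe:
        deps |= step.requires
    return deps


def gaps(recipe, community_holdings):
    """Return the dependencies the designated community does not already hold."""
    return recipe_dependencies(recipe) - set(community_holdings)


# Illustrative recipe for the Web page example; the labels are assumptions.
web_page_recipe = [
    Step("decode bits to characters", frozenset({"UTF-8 specification"})),
    Step("parse characters to a document tree",
         frozenset({"XML 1.0 specification", "XHTML 1.0 Transitional DTD"})),
    Step("render document tree", frozenset({"Web browser"})),
]

# Assumed holdings of the designated community.
dc_holdings = {"UTF-8 specification", "Web browser"}

missing = gaps(web_page_recipe, dc_holdings)
print(sorted(missing))
```

Under these assumptions, whatever ends up in `missing` is the set of items that would need to be preserved alongside the object itself.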

The OAIS Information Model Revisited — Part 2

Previously, in The OAIS Information Model Revisited – Part 1, I explored the abstract notions of data, information, representation information and interpretation.

I found that the OAIS notion of interpretation makes most sense when viewed as an act or operation, taking data and representation information as input, yielding new information as output.
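To make the "interpretation as an operation" view concrete, here is a minimal Python sketch. It assumes the simplest possible case: the data is a sequence of bytes, and the representation information is just the name of a character encoding. The `interpret` function is hypothetical and is not part of the OAIS model itself.

```python
def interpret(data: bytes, representation_info: str) -> str:
    """Interpret bytes as text; representation_info names the character encoding."""
    return data.decode(representation_info)


# The same data, interpreted with different representation information,
# yields different information.
bits = "Caf\u00e9".encode("utf-8")
text = interpret(bits, "utf-8")
misread = interpret(bits, "latin-1")

print(text)
print(misread)
```

The point of the sketch is that the bytes alone do not determine the result: both inputs are needed, and changing the representation information changes the output.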

In this note, I’d like to explore these ideas further, and see how they relate to some real-world examples of digital preservation.

“Recipes” for Interpreting Archived Data

In particular, I’m interested in the “recipes” that tell you how to convert a sequence of bits into something more useful.

This is a fundamental requirement for any preservation archive – when retrieving an archived information item, you need the bits that encode that information, but you also need to know how to turn those bits into something else, something you can use.

The OAIS Information Model acknowledges this, by highlighting the need for representation information, but does it go far enough? Does the model really help us to understand the problems of reconstructing a useful artefact from an archived sequence of bits?

The OAIS Information Model Revisited — Part 1

Introduction & Motivation

The Reference Model for an Open Archival Information System (OAIS) is an influential standard in the digital preservation domain. It contains an information model, which lays out some basic ideas about digital information, how it is encoded, interpreted and packaged. It also contains a functional model, which lays out the main functional components that should be present in a digital preservation system.

The CASPAR Project is currently designing and implementing software components for a distributed infrastructure to support digital preservation. The starting point for the design of these components is the OAIS reference model, and in particular, the OAIS information model.

This note captures some initial thoughts on the OAIS information model, working towards answers to the following questions:

  1. Does the OAIS Information Model make sense?
  2. Can it be used as the basis for designing software components, within a UML model-driven software engineering process?

Zoological Case Studies in Digital Curation – DCC SCARP / ImageStore

I’ve started work on a project investigating current practices in digital curation across a variety of scientific disciplines. The project is called SCARP, and falls within the remit of the UK’s Digital Curation Centre (DCC).

I’m working on a sub-project within SCARP called ImageStore, which has identified four case studies involving the curation of images, video and associated (meta)data used as primary objects within the scholarly work flow. It is entirely by coincidence that three of these case studies involve research groups at the Zoology Department at Oxford University, where I am now spending two days a week embedded in the Image Bioinformatics Research Group, and that my bachelor’s degree was in Zoology (at Cambridge) – it must be destiny 🙂

The ImageStore case studies are interesting – one involves videos of badgers and other species used as part of wildlife conservation studies, another involves video of tool-making behaviour in crows (you wouldn’t believe what these crows can do), another involves images of in-situ gene expression in Drosophila (fruit fly) testes, and another involves electron micrographs and tomographs (3D pictures) of Trypanosomes (they cause sleeping sickness).
