Alistair Miles

Category: hacking

The OAIS Information Model Revisited – Part 3: Towards Models for Interpretation/Virtualisation Recipes

In this note, I begin to explore the use of the Eriksson-Penker UML extensions for business process modeling, as a tool for modeling the processes or work flows required to successfully interpret or virtualise a digital object.

Previously, in part 1 of this series, I explored the abstract notions of data, information, representation information and interpretation, as defined by the OAIS Information Model. In part 2, I tried to apply these notions to a simple example of a Web page. I found that we need to go beyond the OAIS Information Model if we want to capture and represent the “recipes” that take you from a sequence of bits to something more useful, in the general case where there may be multiple steps or stages required to process, virtualise or render a digital object.

Recipes and Dependencies

Take again the example from part 2 of a simple Web page, encoded as an XHTML 1.0 Transitional document using the UTF-8 character set, and stored as a single sequence of bits.

I’m interested in modeling the “recipe” that tells me how to turn the encoded sequence of bits back into a Web page, because this recipe will define the “dependencies” for the preserved object. By “dependency” I mean those items of information and/or software that are required to execute the recipe — the ingredients and utensils, to use the cooking analogy. Note that by “execution” I do not necessarily mean execution by a computer — steps in a recipe might well be entirely manual.

If I knew what these dependencies were, I could then compare them with the knowledge and software currently held by the designated community (DC), and decide which of the dependencies also need to be preserved.

I could also design a system which computes any “gaps” that arise between the knowledge and software held by the designated community and those required for execution of the recipe. This is one of the goals of the CASPAR project.
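The “gap” computation described above can be sketched as a plain set difference. This is only an illustration under stated assumptions: the class name, the method name and the dependency labels are all hypothetical, and nothing here is taken from the CASPAR design.

```java
import java.util.Set;
import java.util.TreeSet;

public class DependencyGaps {

    // Returns the dependencies that the recipe requires but the designated
    // community does not already hold (a plain set difference).
    static Set<String> gaps(Set<String> required, Set<String> heldByCommunity) {
        Set<String> missing = new TreeSet<>(required); // sorted for stable output
        missing.removeAll(heldByCommunity);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical dependency labels for the Web page example.
        Set<String> required = Set.of(
                "UTF-8 specification",
                "XHTML 1.0 Transitional DTD",
                "Web browser");
        Set<String> held = Set.of("UTF-8 specification", "Web browser");

        // Prints [XHTML 1.0 Transitional DTD]
        System.out.println(gaps(required, held));
    }
}
```

Anything left in the result is a candidate for preservation alongside the object itself.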

The OAIS Information Model Revisited — Part 2.

Previously, in The OAIS Information Model Revisited – Part 1, I explored the abstract notions of data, information, representation information and interpretation.

I found that the OAIS notion of interpretation makes most sense when viewed as an act or operation, taking data and representation information as input, yielding new information as output.
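To make the “interpretation as an operation” view concrete, here is a minimal sketch in Java. It assumes the representation information can be reduced to a character set, which is a deliberate simplification of my own; the class and method names are hypothetical and do not come from the OAIS model.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InterpretationSketch {

    // Interpretation as an operation: data plus representation information
    // goes in, new information (here, a character string) comes out.
    static String interpret(byte[] data, Charset representationInfo) {
        return new String(data, representationInfo);
    }

    public static void main(String[] args) {
        // The same five bytes: "café" when read as UTF-8.
        byte[] data = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};

        String asUtf8 = interpret(data, StandardCharsets.UTF_8);
        String asLatin1 = interpret(data, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // prints 4
        System.out.println(asLatin1.length()); // prints 5
    }
}
```

The same data yields different information depending on the representation information supplied, which is why the two have to be archived together.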

In this note, I’d like to explore these ideas further, and see how they relate to some real-world examples of digital preservation.

“Recipes” for Interpreting Archived Data

In particular, I’m interested in the “recipes” that tell you how to convert a sequence of bits into something more useful.

This is a fundamental requirement for any preservation archive – when retrieving an archived information item, you need the bits that encode that information, but you also need to know how to turn those bits into something else, something you can use.

The OAIS Information Model acknowledges this, by highlighting the need for representation information, but does it go far enough? Does the model really help us to understand the problems of reconstructing a useful artefact from an archived sequence of bits?

Using UML 2 & Model Driven Architecture (MDA) Transforms to Generate a Persistence Layer for Java Web Applications and Web Services

Agile Development and Rapid Prototyping

I’m doing some rapid prototyping of Web applications and Web services in Java, and because I need to redesign often, I want the design and implementation processes to be as agile and lightweight as possible. I.e. each change I make to the design should take little or no effort to implement & deploy.

Persistent Data

The biggest problem I’ve encountered so far is persistence. I need to implement applications with persistent data. However, I’m coding (at least on the server side) in Java. Implementing persistence for Java applications typically requires an object-relational mapping, which, if coded by hand, costs effort. There are alternatives for handling persistence more or less automatically, like the persistence annotations in Java EE 5 (the successor to J2EE). However, I’m also very much in favour of coding with POJOs, injecting dependencies wherever possible, and decoupling application code from any implementation-specific considerations to do with persistence.

Data Access Objects (DAOs)

A popular design pattern for persistence is the use of Data Access Objects (DAOs) which abstract the data access and manipulation operations from the implementation-specific details, allowing the persistence platform to be swapped out without affecting application logic.
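As a rough illustration of the pattern, here is a minimal sketch. The Person class, the PersonDao interface and the in-memory implementation are all hypothetical names chosen for the example; a real application would more likely back the interface with JDBC or an ORM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DaoSketch {

    // A plain Java object (POJO) with no persistence-specific code.
    static class Person {
        final long id;
        final String name;

        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // The DAO interface: application logic depends only on this.
    interface PersonDao {
        void save(Person person);
        Optional<Person> findById(long id);
        void delete(long id);
    }

    // One possible implementation. A database-backed class could replace it
    // without changing any code written against PersonDao.
    static class InMemoryPersonDao implements PersonDao {
        private final Map<Long, Person> store = new HashMap<>();

        @Override
        public void save(Person person) {
            store.put(person.id, person);
        }

        @Override
        public Optional<Person> findById(long id) {
            return Optional.ofNullable(store.get(id));
        }

        @Override
        public void delete(long id) {
            store.remove(id);
        }
    }

    public static void main(String[] args) {
        PersonDao dao = new InMemoryPersonDao(); // would be injected in a real application
        dao.save(new Person(1L, "Ada"));
        System.out.println(dao.findById(1L).map(p -> p.name).orElse("not found")); // prints Ada
        dao.delete(1L);
        System.out.println(dao.findById(1L).isPresent()); // prints false
    }
}
```

Because the application only ever sees PersonDao, the choice of persistence platform stays a deployment detail.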

UML 2 & MDA Transforms

I’ve also been working recently with UML 2 and Model Driven Architecture (MDA) transforms. The tool I’ve been using is Enterprise Architect, which encourages the development of a Platform Independent Model (PIM), which can then be transformed into class models tailored to various programming languages (Java, C# etc.). A PIM can also be transformed into database models tailored to various database platforms (Oracle, Postgres, MySQL etc.). From these platform-specific models, you can then generate actual code or DDL scripts.

The Dream

So the dream is this: I design the data model for my application in a totally platform-independent way, using UML; I then use MDA transforms to generate a database model, a Java class model, and at least the interfaces for a set of Java DAOs (ideally the implementations as well); I then use code generation to automatically generate the DDL scripts and Java class/interface definitions.

I.e. I want to go from a PIM to a fully-implemented persistence layer at the press of one or two buttons.

Oh, and I don’t want to pay any money for any of this 🙂

This is where I’ve got to so far…

The OAIS Information Model Revisited — Part 1

Introduction & Motivation

The Reference Model for an Open Archival Information System (OAIS) is an influential standard in the digital preservation domain. It contains an information model, which lays out some basic ideas about digital information, how it is encoded, interpreted and packaged. It also contains a functional model, which lays out the main functional components that should be present in a digital preservation system.

The CASPAR Project is currently designing and implementing software components for a distributed infrastructure to support digital preservation. The starting point for the design of these components is the OAIS reference model, and in particular, the OAIS information model.

This note captures some initial thoughts on the OAIS information model, working towards answers to the following questions:

  1. Does the OAIS Information Model make sense?
  2. Can it be used as the basis for designing software components, within a UML model-driven software engineering process?

RDFOO – Convert RDF Graphs into JSON Objects

I’ve written a small Java utility for converting any node in an RDF graph into a JSON object. You can download RDFOO 0.1 alpha or alternatively go to the RDFOO web page which has links to more documentation.

RDFOO is (more or less) an implementation of JDIL, using Jena and the Java classes for JSON from json.org. Given a resource in an RDF graph, RDFOO by default performs a shallow mapping to a JSON object, capturing only literal value properties. RDFOO can also be told to follow specific properties to a given (or unlimited) depth, to capture nested objects, and handles circular references.
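The idea of a shallow mapping with optional, depth-limited following of properties can be sketched without any RDF library. To be clear, this is not RDFOO’s code or API, and it uses neither Jena nor the json.org classes: the Statement class, the toObject method and the nested-map output are assumptions made purely for illustration. It also keeps only one value per property and handles only bounded depth; unlimited depth would need extra bookkeeping of visited nodes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShallowMappingSketch {

    // A toy statement: the object is either a literal value or the id of another node.
    static final class Statement {
        final String subject;
        final String property;
        final String object;
        final boolean literal;

        Statement(String subject, String property, String object, boolean literal) {
            this.subject = subject;
            this.property = property;
            this.object = object;
            this.literal = literal;
        }
    }

    // Maps a node to a nested map (a stand-in for a JSON object).
    // Literal-valued properties are always copied. A node-valued property is
    // followed only if it is listed in `follow` and some depth remains, so a
    // circular reference cannot recurse forever.
    static Map<String, Object> toObject(List<Statement> graph, String node,
                                        Set<String> follow, int depth) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Statement s : graph) {
            if (!s.subject.equals(node)) {
                continue;
            }
            if (s.literal) {
                result.put(s.property, s.object);
            } else if (depth > 0 && follow.contains(s.property)) {
                result.put(s.property, toObject(graph, s.object, follow, depth - 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Statement> graph = List.of(
                new Statement("alice", "name", "Alice", true),
                new Statement("alice", "knows", "bob", false),
                new Statement("bob", "name", "Bob", true),
                new Statement("bob", "knows", "alice", false)); // circular reference

        // Shallow mapping: prints {name=Alice}
        System.out.println(toObject(graph, "alice", Set.of(), 0));
        // Follow "knows" one level deep: prints {name=Alice, knows={name=Bob}}
        System.out.println(toObject(graph, "alice", Set.of("knows"), 1));
    }
}
```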

If you find any bugs or have any comments I’d love to hear from you. I think the implementation is sensible, and it passes a reasonably complete test case, but then I’m no Java guru – you have been warned 🙂

Trac Moin Python Postgresql Apache Red Hat Enterprise Linux Installation and Dependency Hell

I’ve recently been helping out on a software development project in my department. I decided to try out Trac – a software project management tool – to see if it could help us manage the development process. So I downloaded the latest stable release (0.10.4), and set about installing it on a server. This is what happened next…

Outlook 2003 Email Plain Text Line Wrap 72 Characters Problem

I want to write email in plain text. Sometimes I want to include URLs and I don’t want those URLs to get broken by hard line wraps. I’m using Microsoft Outlook 2003 as my email client and my company uses Microsoft Exchange Server (2003 I think). In Outlook, I set Tools > Options > Mail Format > Internet Format > Automatically wrap text at 132 characters – this is the maximum allowed and should be fine for most URLs. When I sent emails to myself with long URLs it worked fine. However, when I sent emails to other people outside my company, all plain text emails were wrapped at 72 characters – so any URL over 72 characters was broken.

Free UML Tools

I’m looking for a tool for building UML models visually. I want something that will export some sort of image format for documentation, some sort of machine readable format, and preferably something that will generate some Java classes from the model. I’d also like something that integrates with the Eclipse framework. Support for the latest version of UML would be good – I guess that’s UML 2.?. Oh, and of course, I want it all for free …
