Zoological Case Studies in Digital Curation – DCC SCARP / ImageStore

by Alistair Miles

I’ve started work on a project investigating current practices in digital curation across a variety of scientific disciplines. The project is called SCARP, and falls within the remit of the UK’s Digital Curation Centre (DCC).

I’m working on a sub-project within SCARP called ImageStore, which has identified four case studies involving the curation of images, video and associated (meta)data used as primary objects within the scholarly workflow. It is entirely by coincidence that three of these case studies involve research groups at the Zoology Department at Oxford University, where I am now spending two days a week embedded in the Image Bioinformatics Research Group, and that my bachelor’s degree was in Zoology (at Cambridge) – it must be destiny 🙂

The ImageStore case studies are interesting – one involves videos of badgers and other species used as part of wildlife conservation studies, another involves video of tool-making behaviour in crows (you wouldn’t believe what these crows can do), a third involves images of in situ gene expression in Drosophila (fruit fly) testes, and the fourth involves electron micrographs and tomographs (3D pictures) of trypanosomes (the parasites that cause sleeping sickness).

We’ve started to figure out how we want to approach each case study, which is partly written up in the ImageStore Project Plan. I’ve also given a couple of presentations on the project so far, such as this presentation to the Defining Image Access project final meeting.

The big question for me remains: do we want to focus on methods for post hoc preservation of the images, video and associated data we’ve found in each case study, or do we want to focus on strategies for quietly integrating curation support into the scientific workflow so that future curation activities are much easier and cheaper?

If we focus on post hoc preservation, we will provide a plan and feasibility assessment for a project to preserve specific collections of digital artefacts, which could be used to seek funding for specific preservation activities, and as an example for others seeking to do the same. This is of course a valuable exercise, because there are vast amounts of scientific data which are of potential value and worthy of preservation, and which are currently under very dubious (often non-existent) curation regimes (and therefore virtually impossible to re-use and at high risk of deterioration or loss).

However, post hoc preservation is likely to be expensive and time-consuming, and we can only realistically hope to preserve a fraction of the current body of scientific assets in this way. Moreover, there is only limited precedent for seeking funding for these types of activity.

If we focus instead on defining strategies for quietly integrating curation support into the scientific workflow, we may hope to make a far greater proportion of the digital assets being created today available in the future. We’ve tentatively dubbed this approach “sheer curation” – “sheer” as in lightweight and virtually transparent. The key hypothesis is that good data and digital asset management at local levels is also good practice in preparing for publication and/or preservation of data and other digital assets. If this is true, it may be possible to provide tools and recommendations of good practice that add immediate value to scientists in their everyday scholarly work, and which provide at least a significant step along a sequence of curation activities. The additional effort required to preserve these assets for the long term is then significantly reduced, and can be undertaken at departmental and institutional levels, without interfering with the scientists’ schedule.

The basic constraint is that most scientists only have time for their immediate, short-term research programme, and are not able (despite the best of philanthropic intentions) to spend any effort on preserving their assets for others.