The North East Regional e-Science Centre (NEReSC)/Digital Curation Centre (DCC) Collaborative Workshop was held today, the 5th of February, at Newcastle University. The DCC's main role is to "support and promote continuing improvement in the quality of data curation and of associated digital preservation". The aim of the NEReSC is to identify, fund and support high-quality projects with leading industrial and academic partners. The NEReSC was established in July 2001, funded by the DTI through the UK Core e-Science programme, to provide expertise in e-Science and to instigate and run a set of industrially focused projects.
The first two speakers, Paul Watson and Liz Lyon, gave short introductions about their respective organizations. Paul Watson is the head of NEReSC, and Liz Lyon is the Associate Director for Community Development at the DCC.
Liz spoke of how the DCC are interested in seeing what work is being done at Newcastle University in the context of digital curation and preservation, and in perhaps developing partnerships with like-minded projects at the University. The DCC has already held two conferences on the subject of digital curation, the most recent in November 2006 in Glasgow. At that conference they also launched an electronic journal, the "International Journal of Digital Curation". It is a good move, as curation and data preservation can be difficult to publish on in the more standard biology journals.
Paul Watson outlined the pressing need of the scientific community to have reliable archives of published data. He mentioned his so-called "Bowker's Standard" Scientific Data Life-Cycle, which is less of a life-cycle and really more of a gradual tailing-off. Step one is to collect the data, step two is to publish the data, and step three is to gradually lose the original data as machines get turned off and students leave for greener pastures. It is humorous, but it does show a real problem in the life sciences. Data for published articles should be preserved: otherwise, published papers draw conclusions from unpublished data, other groups are unable to reproduce an experiment, and the data cannot be re-used.
After these introductory speeches, there were three talks from Newcastle researchers on projects that involve archiving and curation. First, I spoke on the CISBAN data management strategy, which included an introduction to the CISBAN Data Portal and Integrator (slides for the DPI are available through that link). Then, Paul Watson spoke again, this time on CARMEN. There is a multitude of neuroscience data (molecular, anatomical, neurophysiological, and behavioural, to name just a few categories) in many different locations, with a variety of restrictions on their publishing and availability. There are a few efforts underway to try to unify data formats and archiving, but it is difficult to overcome the cultural barriers (multiple communities acting independently; concerns from researchers about the consequences of sharing data) as well as the technical ones (multiple proprietary data formats; the great volume of data; the need for standardized, detailed metadata). Hopefully CARMEN and sister efforts such as BIRN and Neuro Commons (via Scientific Commons I believe, but don't quote me on it!) will be able to make real strides in this area in the coming years. Finally, Patrick Olivier spoke on his work at the Culture Lab, part of the Institute of Ageing and Health at Newcastle University. They research ways of having the humanities, social sciences and the arts inform and aid computing, and vice versa.
The afternoon was scheduled for presentations from the DCC and general discussion. Unfortunately, I had a prior engagement with another meeting, and had to bow out. However, there was lots of energy in the morning, with many people from both groups asking questions and getting involved. Digital curation, archiving and preservation are areas in which every research group should be interested. It is very easy to forget that, unless you have some sort of data policy in your group, chances are that the data sitting on your computer is JUST on your computer, and is therefore precariously stored indeed.