Today was the first day of the workshop – back at the good old EBI, though it isn't as recognizable as it used to be. Sure, there is the new EBI extension, but I am used to that now. However, they're renovating the inside of the old EBI building as well, reducing many of my friends to portakabin living over the winter months: better them than me!
Today definitely had an emphasis on the "work" part of "workshop". While a large part of the work on the XSLT for converting between FuGE and ISA-TAB is complete, some of the slightly stickier areas of the conversion are still being worked on. We spent today trying to iron out some of the difficulties that arise when converting the sort of rich tree structure that you get from the XML implementation of FuGE (FuGE-ML) into the flatter tabular format of ISA-TAB. Below are some of the more general ideas that we were throwing around as a result. (Some are more directly related to the conversion process than others – but all raise points that I find interesting.)
- One of the columns in the ISA-TAB Assay file is currently named "Raw Data File" in the 1.0 Specification. This prompted a lot of discussion about what "raw" means, since many people have a different idea of what a raw data file is. It was originally named this way to act as a foil to another (optional) column, "Derived Data File". However, derived data files have a more precise definition in ISA-TAB – such a column can only be used to name files resulting from data transformations or processing. In the end, we are considering a name change, from "Raw Data File" to "Data File".
- In the end, there will be a few simple ways of formatting your FuGE-ML files that make the conversion into ISA-TAB easier. It would be useful to eventually produce a set of guidelines to aid interoperability.
- Some of the developers already using FuGE (myself included) are using the <Description> element within a FuGE-ML file as a way to let our biologists give a free-text description of both materials and data files. There is no specific element in these objects for such information, and therefore the generic Description element is the best location. This isn't exactly in line with FuGE best practices, where the default Description elements are really only meant for private comments within a local FuGE implementation, and can normally be ignored by external bioinformaticians making use of your FuGE-ML. Such material and data descriptions can be copied into the ISA-TAB file as free text within the Comment[] columns, where what sits within the square brackets is the material or data identifier. We'll have to see if this idea turns out to be useful.
- The main challenge in collapsing FuGE-ML into ISA-TAB is ensuring that the multi-level protocol application structures (for more information, see the GenericProtocolApplication and GenericProtocol objects within the FuGE Object Model) are correctly converted. We spent the majority of today trying to figure out an elegant way of doing this. We'll work on it again tomorrow, and will hopefully have a new version of the XSLT with a first-bash solution tomorrow evening!
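To make the flattening problem in the last point a bit more concrete, here is a minimal Python sketch. It is not the XSLT itself, and it does not use real FuGE-ML: it assumes a made-up, un-namespaced XML snippet in which hypothetical <Step> elements nest inside one another, and it writes one table row per root-to-leaf chain, padding shorter chains with empty cells. The "Protocol REF" header is borrowed from ISA-TAB; the element names, the sample data and the functions flatten, to_table and to_tsv are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XML. Real FuGE-ML is namespaced and far
# richer; the <Step> element and its "name" attribute are invented here.
SAMPLE = """
<Investigation>
  <Step name="growth">
    <Step name="extraction">
      <Step name="labelling"/>
      <Step name="sequencing"/>
    </Step>
  </Step>
  <Step name="storage"/>
</Investigation>
"""


def flatten(step, path=()):
    """Yield one tuple of step names per root-to-leaf chain of nested steps."""
    path = path + (step.get("name"),)
    children = step.findall("Step")
    if not children:
        yield path
    for child in children:
        yield from flatten(child, path)


def to_table(xml_text):
    """Turn the nested steps into a header row plus one flat row per chain."""
    root = ET.fromstring(xml_text)
    rows = [path for top in root.findall("Step") for path in flatten(top)]
    width = max((len(path) for path in rows), default=0)
    # One "Protocol REF" column per nesting level (an assumption of this
    # sketch; a real ISA-TAB file interleaves other columns between them).
    table = [["Protocol REF"] * width]
    for path in rows:
        # Pad shallow chains so that every row has the same number of cells.
        table.append(list(path) + [""] * (width - len(path)))
    return table


def to_tsv(table):
    """Serialise the table as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(table)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(to_table(SAMPLE)), end="")
```

Writing one row per chain is only one possible choice. It repeats the shared upstream steps ("growth" and "extraction" here) on every row that passes through them, which is roughly the price of squeezing a tree into a flat table.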