The more things change, the more they stay the same

…also known as Day 1 of the BBSRC Synthetic Biology Standards Workshop at Newcastle University, and musings arising from the day’s experiences.

In my relatively short career (approximately 12 years – wait, how long?) in bioinformatics, I have been involved to a greater or lesser degree in a number of standards efforts. It started in 1999 at the EBI, where I worked on the production of the protein sequence database UniProt. Now, I’m working with systems biology data and beginning to look into synthetic biology. I’ve been involved in the development (or maintenance) of a standard syntax for protein sequence data; standardized biological investigation semantics and syntax; standardized content for genomics and metagenomics information; and standardized systems biology modelling and simulation semantics.

(Bear with me – the reason for this wander through memory lane becomes apparent soon.)

How many standards have you worked on? How can there be multiple standards, and why do we insist on creating new ones? Doesn’t the definition of a standard mean that we would only need one? Not exactly. Take the field of systems biology as an example. Some people are interested in describing a mathematical model, but have no need for storing either the details of how to simulate that model or the results of multiple simulation runs. These are logically separate activities, yet they fall within a single community (systems biology) and are broadly connected. A model is used in a simulation, which then produces results. So, when building a standard, you end up with the same separation: have one standard for the modelling, another for describing a simulation, and a third for structuring the results of a simulation. All that information does not need to be stored in a single location all the time. The separation becomes even more clear when you move across fields.
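As a rough illustration of that separation, here is a minimal sketch in Python. The class and field names are invented for this post and do not correspond to the schema of any actual standard; the point is only that the model, the simulation description and the results can live in three independent records that refer to one another by identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """What the system is: the kind of thing a modelling standard describes."""
    model_id: str
    equations: tuple


@dataclass(frozen=True)
class SimulationSetup:
    """How to run a model; it refers to the model by identifier only."""
    setup_id: str
    model_id: str
    end_time: float
    steps: int


@dataclass
class SimulationResult:
    """What came out of one run; it refers to the setup by identifier only."""
    setup_id: str
    time_points: list = field(default_factory=list)
    values: list = field(default_factory=list)


# Hypothetical example records: each could be stored in a different place.
model = Model("decay", ("dX/dt = -k * X",))
setup = SimulationSetup("decay-run-1", model.model_id, end_time=10.0, steps=100)
result = SimulationResult(setup.setup_id)
```

Someone who only cares about the model never has to load a setup or a result, which is the practical reason for keeping the three apart.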

But this isn’t completely clear cut. Some types of information overlap within standards of a single domain and even among domains, and this is where it gets interesting. Not only do you need a single community talking to each other about standard ways of doing things, but you also need cross-community participation. Such efforts result in even more high-level standards which many different communities can utilize. This is where work such as OBI and FuGE sits: with such standards, you can describe virtually any experiment. The interconnectedness of standards is a whole job (or jobs) in itself – just look at the BioSharing and MIBBI projects. And sometimes standards that seem (at least mostly) orthogonal do share a common ground. Just today, Oliver Ruebenacker posted some thoughts on the biopax-discuss mailing list where he suggests that at least some of BioPAX and SBML share a common ground and might be usefully “COMBINE”d more formally (yes, I’d like to go to COMBINE; no, I don’t think I’ll be able to this year!). (Scroll down that thread for a response by Nicolas Le Novère as to why that isn’t necessarily correct.) So, orthogonality, or the extent to which two or more standards overlap, is sometimes a hard thing to determine.

So, what have I learnt? As always, we must be practical. We should try to develop an elegant solution, but it really, really should be one which is easy to use and intuitive to understand. It’s hard to get to that point, especially as I think that point is (and should be) a moving target. From my perspective, group standards begin with islands of initial research in a field, which then gradually develop into a nascent community. As a field evolves, ‘just-enough’ strategies for storing and structuring data become ‘nowhere-near-enough’. Communication with your peers becomes more and more important, and it becomes imperative that standards are developed.

This may sound obvious, but the practicalities of creating a community standard mean that such work requires a large amount of effort and continued goodwill. Even with the best of intentions, with every participant working towards the same goal, it can take months – or years – of meetings, document revisions and conference calls to hash out a working standard. This isn’t necessarily a bad thing, though. All voices do need to be heard, and you cannot have a viable standard without input from the community you are creating that standard for. You can have the best structure or semantics in the world, but if it’s been developed without the input of others, you’ll find people strangely reluctant to use it.

Every time I take part in a new standard, I see others like me who have themselves been involved in the creation of standards. It’s refreshing and encouraging. Hopefully the time it takes to create standards will drop as the science community as a whole gets more used to the idea. When I started, the only real standards in biological data (at least that I had heard of) were the structures defined by SWISS-PROT and EMBL/GenBank/DDBJ. By the time I left the EBI in 2006, I could have given you a list a foot long (GO, PSI, and many others), and that list continues to grow. Community engagement and cross-community discussions continue to be popular.

In this context, I can now add synthetic biology standards to my list of standards I’ve been involved in. And, as much as I’ve seen new communities and new standards, I’ve also seen a large overlap in the standardization efforts and an even greater willingness for lots of different researchers to work together, even taking into account the sometimes violent disagreements I’ve witnessed! The more things change, the more they stay the same…

At this stage, it is just a limited involvement, but the BBSRC Synthetic Biology Standards Workshop I’m involved in today and tomorrow is a good place to start with synthetic biology. I describe most of today’s talks in this post, and will continue with another blog post tomorrow. Enjoy!

For those with less time, here is a single sentence for each talk that most resounded with me:

  1. Mike Cooling: Emphasise the ‘re’ in reusable, and make it easier to build and understand large models from reusable components.
  2. Neil Wipat: For a standard to be useful, it must be computationally amenable as well as useful for humans.
  3. Herbert Sauro: Currently there is no formal ontology for synthetic biology, but one will need to be developed.

This meeting is organized by Jen Hallinan and Neil Wipat of Newcastle University. Its purpose is to set up key relationships in the synthetic biology community to aid the development of a standard for that community. Today, I listened to talks by Mike Cooling, Neil Wipat, and Herbert Sauro. I was – unfortunately – unable to be present for the last couple of talks, but will be around again for the second – and final – day of the workshop tomorrow.

Mike Cooling – Bioengineering Institute Auckland, New Zealand

Mike uses CellML (it’s made where he works, but that’s not the only reason…) in his work with systems and synthetic biology models. Among other things, it wraps MathML and partitions the maths, variables and units into reusable pieces. Although many of the parts seem domain specific, CellML itself is actually not domain specific. Further, unlike other modelling languages such as SBML, components in CellML are reusable and can be imported into other models. (Yes, a new package called comp in SBML Level 3 is being created to allow the importing of models into other models, but it isn’t mature – yet.)

How are models stored? There is the CellML repository, but what is out there for synthetic biology? The Registry of Standard Biological Parts was available, but only described physical parts. Therefore they created a Registry of Standard Virtual Parts (SVPs) to complement the original registry. This was developed as a group effort with a number of people including Neil Wipat and Goksel Misirli at Newcastle University.

They start with template mathematical structures (which are little parts of CellML), and then use the import functionality available as part of CellML to combine the templates into larger physical things/processes (‘SVPs’) and ultimately to combine things into system models.

They extended the CellML repository to hold the resulting larger multi-file models, which included adding a method of distributed version control and allowing the sharing of models between projects through embedded workspaces.

What can these pieces be used for? Some of this work included the creation of a CellML model of the biology represented in Levskaya et al. 2005 and the deposition of all of the pieces of the model in the CellML repository. Another example is a model he’s working on about shear stress and multi-scale modelling for aneurysms.

Modules are being used and are growing in number, which is great, but he wants to concentrate more at the moment on the ‘re’ of the reusable goal, and make it easier to build and understand large models from reusable components. Some of the integrated services he’d like to have: search and retrieval, (semi-automated) visualization, semantically-meaningful metadata and annotations, and semi-automated composition.

All of the work above converges on the importance of metadata. Not many people used the CellML Metadata Framework 1.0. With version 2.0 they have developed a core specification which is very simple, and then provide many additional satellite specifications. For example, there is a biological information satellite, where you use the biomodels qualifiers as relationships between your data and MIRIAM URNs. The main challenge is to find a database that is at the right level of abstraction (e.g. canonical forms of your concept of interest).
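To make the qualifier idea concrete, here is a minimal sketch of one such annotation written as a subject-qualifier-resource triple in Python. This is not the CellML metadata serialization itself: the model element name and the choice of UniProt accession are illustrative assumptions, and the helper function is hypothetical. Only the qualifier namespace and the urn:miriam:collection:identifier pattern are taken from the biomodels.net and MIRIAM conventions.

```python
from typing import NamedTuple

# Namespace of the biomodels.net biology qualifiers (bqbiol).
BQBIOL = "http://biomodels.net/biology-qualifiers/"


class Annotation(NamedTuple):
    subject: str    # the element in your model being annotated
    qualifier: str  # the relationship, e.g. bqbiol:is
    resource: str   # the external database entry, as a MIRIAM URN


def miriam_urn(collection: str, identifier: str) -> str:
    """Hypothetical helper: build a MIRIAM-style URN for a database entry."""
    return f"urn:miriam:{collection}:{identifier}"


# Illustrative only: "#haemoglobin_alpha" is an invented model element name.
annotation = Annotation(
    subject="#haemoglobin_alpha",
    qualifier=BQBIOL + "is",
    resource=miriam_urn("uniprot", "P69905"),
)
```

The hard part described in the talk is not building the triple but choosing the right `resource`: the database entry has to sit at the same level of abstraction as the thing in the model.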

Neil Wipat – Newcastle University

Please note Neil Wipat is my PhD supervisor.

Neil spoke about data standards, tool interoperability, data integration and synthetic biology, a.k.a. “Why we need standards”. They would like to promote interoperability and data exchange between their own tools (important!) as well as other tools. They’d also like to facilitate data integration to inform the design of biological systems both from a manual designer’s perspective and from the POV of what is necessary for computational tool use. They’d also like to enable the iterative exchange of data and experimental protocols in the synthetic biology life cycle.

A description of some of the tools developed in Neil’s group (and elsewhere) exemplifies the differences in data structures present within synthetic biology. BacilloBricks was created to help get, filter and understand the information from the MIT registry of standard parts. They also created the Repository of Standard Virtual Biological Parts. This SVP repository was then extended with parts from Bacillus and to make use of SBML as well as CellML. This project is called BacilloBricks Virtual. All of these tools use different formats.

It’s great having a database of SVPs, but you need a way of accessing and utilizing the database. Hallinan and Wipat have started a collaboration with the people at Microsoft Research who created a programming language for genetic engineering of living cells, called the genetic engineering of cells (GEC) simulator. A summer student created a GEC compiler for SVPs from BacilloBricks Virtual. Goksel has also created the MoSeC system, where you can automatically go from a model to a graph to an EMBL file.

They also have BacillusRegNet, which is an information repository about transcription factors for Bacillus spp. It is also a source of orthogonal transcription factors for use in B. subtilis and Geobacillus. Again, it is very important to allow these tools to communicate efficiently.

The data warehouse they’re using is ONDEX. They feed information from the ONDEX data store to the biological parts database. ONDEX was created for systems biology to combine large experimental datasets. ONDEX views everything as a network, and is therefore a graph-based data warehouse. ONDEX has a “mini-ontology” to describe the nodes and edges within it, which makes querying the data (and understanding how the data is structured) much easier. However, it doesn’t include any information about the synthetic biology side of things. Ultimately, they’d like an integrated knowledgebase using ONDEX to provide information about biological virtual parts. Therefore they need a rich data model for synthetic biology data integration (perhaps including an RDF triplestore).
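To show why a “mini-ontology” over nodes and edges makes a graph store easier to query and understand, here is a small Python sketch. It is not the ONDEX API or data model: the class, the concept classes and the relation types are all hypothetical, chosen only to illustrate typed nodes and edges that are checked against a declared vocabulary.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ontology: allowed node types, and for each relation type
# the (source type, target type) pair it is allowed to connect.
CONCEPT_CLASSES = {"Gene", "Protein"}
RELATION_TYPES = {
    "encodes": ("Gene", "Protein"),
    "regulates": ("Protein", "Gene"),
}


@dataclass
class TypedGraph:
    nodes: dict = field(default_factory=dict)  # node id -> concept class
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node_id: str, concept_class: str) -> None:
        if concept_class not in CONCEPT_CLASSES:
            raise ValueError(f"unknown concept class: {concept_class}")
        self.nodes[node_id] = concept_class

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        actual = (self.nodes[source], self.nodes[target])
        if actual != RELATION_TYPES[relation]:
            raise ValueError(f"{relation} cannot link {actual[0]} to {actual[1]}")
        self.edges.append((source, relation, target))

    def targets(self, source: str, relation: str) -> list:
        """Return every node reached from `source` by `relation`."""
        return [t for s, r, t in self.edges if s == source and r == relation]


# Invented example data.
graph = TypedGraph()
graph.add_node("geneA", "Gene")
graph.add_node("proteinA", "Protein")
graph.add_node("geneB", "Gene")
graph.add_edge("geneA", "encodes", "proteinA")
graph.add_edge("proteinA", "regulates", "geneB")
```

Because every edge is typed, a query such as `graph.targets("proteinA", "regulates")` has an unambiguous meaning. Extending the vocabulary with synthetic biology concepts is the part this sketch leaves out, which mirrors the gap described above.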

Interoperability, Design and Automation: why we need standards.

Requirement 1. There needs to be interoperability and data exchange among these tools as well as among these tools and other external tools. Requirement 2. Standards for data integration aid the design of synthetic systems. The format must be both computationally amenable and useful for humans. Requirement 3. Automation of the design and characterization of synthetic systems also requires standards.

The requirements of synthetic biology research labs such as Neil Wipat’s make it clear that standards are needed.

KEYNOTE: Herbert Sauro – University of Washington, US

Herbert Sauro described the developing community within synthetic biology, the work on standards that has already begun, and the Synthetic Biology Open Language (SBOL).

He asks us to remember that Synthetic Biology is not biology – it’s engineering! Beware of sending synthetic biology grant proposals to a biology panel! It is a workflow of design-build-test. He’s mainly interested in the bit between building and testing, where verification and debugging happens.

What’s so important about standards? They’re critical in engineering, where they increase productivity and lower costs. In order to identify the requirement you must describe a need. There is one immediate need: store everything you need to reconstruct an experiment within a paper (for more on this see the Nature Biotech paper by Peccoud et al. 2011: Essential information for synthetic DNA sequences). Currently, it’s almost impossible to reconstruct a synthetic biology experiment from a paper.

There are many areas requiring standards to support the synthetic biology workflow: assembly, design, distributed repositories, laboratory parts management, and simulation/analysis. From a practical POV, the standards effort needs to allow researchers to electronically exchange designs with round tripping, and much more.

The standardization effort for synthetic biology began with a grant from Microsoft in 2008 and the first meeting was in Seattle. The first draft proposal was called PoBoL but was renamed to SBOL. It is a largely unfunded project. In this way, it is very similar to other standardization projects such as OBI.

DARPA mandated 2 weeks ago that all projects funded from Living Foundries must use SBOL.

SBOL is involved in the specification, design and build part of the synthetic biology life cycle (but not in the analysis stage). There are a lot of tools and information resources in the community where communication is desperately needed.

There are three parts to SBOL: SBOL Semantic, SBOL Visual, and SBOL Script. SBOL Semantic is the one that’s going to be doing all of the exchange between people and tools. SBOL Visual is a controlled vocabulary and symbols for sequence features.

Have you been able to learn anything from SBML/SBGN, as you have a foot in both worlds? SBGN doesn’t address any of the genetic side, and is pretty complicated. You ideally want a very minimalistic design. SBOL Semantic is written in UML and is relatively small, though it has taken three years to get to this point. But you need host context above and beyond what’s modelled in SBOL Semantic. Without it, you cannot recreate the experiment.

Feature types such as operator sites, promoter sites, terminators, restriction sites etc can go into the sequence ontology (SO). The SO people are quite happy to add these things into their ontology.

SBOLr is a web front end for a knowledgebase of standard biological parts that they used for testing (not publicly accessible yet). TinkerCell is a drag and drop CAD tool for design and simulation. There is a lot of semantic information underneath to determine what is/isn’t possible, though there is no formal ontology. However, you can semantically annotate all parts within TinkerCell, allowing the plugins to interpret a given design. A TinkerCell model can be composed of sub-models. This makes it easy to swap in new bits of models to see what happens.

WikiDust is a TinkerCell plugin written in Python which searches SBPkb for design components, and ultimately uploads them to a wiki. LibSBOLj is a library for developers to help them connect software to SBOL.

The physical and host context must be modelled to make all of this useful. By using semantic web standards, SBOL becomes extensible.

Currently there is no formal ontology for synthetic biology but one will need to be developed.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


Modelling biomedical experimental processes with OBI (ISMB Bio-Ont SIG 2009)

Larisa Soldatova et al.

OBI was created to meet the need for a standardised vocabulary for experiments that can be shared across many experiment types. OBI is community driven, with over 19 communities participating. It is a candidate OBO Foundry ontology, is complementary to existing bio-ontologies, and reuses existing ontologies where possible. It uses various upper-level ontologies (ULOs) for interoperability: BFO, RO, and IAO. The material_entity class was introduced into BFO at the request of the OBI developers, for instance.

OBI uses relations from BFO, RO, and IAO as well as creating relations specific to OBI. OBI relations could be merged with other relations ontologies in future. They try to have as few relations as possible. Two use cases were outlined in this paper. The first was an analyte measuring assay, where you draw blood from a mouse and determine the concentration of glucose in it. The second was a vaccine protection study, where you measure how efficiently a vaccine induces protection against virulent pathogen infection in vivo.

Allyson’s thoughts: Disclosure: I am involved in the development of OBI.

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

“Blogging is Hard” Day: Repost of 2006 FuGO Workshop Day 1

According to the rules set down by Greg Laden over at Science Blogs, I have had a trawl through the blasts from the past that are my blog posts of 18 months or older, to find one that is “exactly in lie [sic] with the writing or research in which they are currently engaged”. I thought about my Visiting With Enigma post, which has a special place in my heart, but didn’t choose it in the end as it didn’t have anything to do with my current research. Instead, I ended up choosing my very first post on WordPress: FuGO Workshop Day 1. It may not sound like much, but there are a number of things recommending this particular post.

  1. FuGO was the original name for the OBI project, of which I’m still a part and therefore it fits with the requirement that I still am involved.
  2. This was my first introduction to ontologies, and happened just as I was leaving one job (at the EBI) and starting a new one (at CISBAN). Such an important change deserves another mention.
  3. I notice an earlier incarnation of my “be sensible” statement in this post, where I say that I learned from Richard Scheuerman that it is always a good idea to use “only those fields which would be of most use to the biologist, rather than those that would make us bioinformaticians most happy”.
  4. FuGO wasn’t the only thing that has since undergone a name change. This post also contained information about the “new” MIcheck registry of minimal checklists: this has continued to gain in popularity, and is now MIBBI.
  5. As happened just last week at the CBO workshop, and again in a short discussion on FriendFeed that led to longer real-life conversations (Phillip Lord’s paper that deals with this topic), there was a long discussion at the FuGO workshop about Multiple versus Single inheritance in ontologies. This was also my first introduction to Robert Stevens and Barry Smith, who both took center stage in the MI/SI discussion. Listening to Barry and Robert speak was really informative and interesting and fun!

What a fantastic day that was: a crash course in ontology development and best-practices, as well as introductions to some of the most well-known people in the biological / biomedical ontology world. In many ways, those first few days of my current job / last few days of my old job shaped where I am now.

Read that entire post, and Happy Blogging is Hard Day! Thanks to Greg Laden for the great idea.

One way for RDF to help a bioinformatician build a database: S3DB

This post is part of the PLoS ONE synchroblogging day, as part of the PLoS ONE @ Two birthday celebrations. Happy Synchroblogging! Here’s a link to the paper on the PLoS ONE website.

Biological data: vitally important, determinedly unruly. This challenge facing the life-science community has been present for decades, as witnessed by the often exponential growth of biological databases (see the classic curve in the current graphs of UniProt [1] and EMBL if you don’t believe me). It’s important to me, as a bioinformatics researcher whose main focus is semantic data integration, but it should be important to everyone. Without manageable data that can be easily integrated, all of our work suffers. Nature thinks it’s important: it recently devoted an entire issue to Big Data. Similarly, the Journal of Biomedical Informatics just had a Semantic Mashup special issue. Deus et al. (the paper I’m blogging about, published in PLoS ONE this summer) agree, beginning with “Data, data everywhere”, nicely encapsulating both the joy and the challenge in one sentence.

This paper describes work on a distributed management system that can link disparate data sources using methodologies commonly associated with the semantic web (or is that Web 3.0?). I’m a little concerned (not at the paper, just in general) at the fact that we seem to already have a 3.0 version of the web, especially as I have yet to figure out a useful definition for semantic web vs Web 2.0 vs Web 3.0. Definitions of Web 3.0 seem to vary wildly: is it the semantic web? Is it the -rwx- to Web 1.0’s -r-- and Web 2.0’s -rw-- (as described here, and cited below)? Are these two definitions one and the same? Perhaps these are discussions for another day… Ultimately, however, I have to agree with the authors that “Web 3.0” is an unimaginative designation [2].

So, how can the semantic web help manage our data? That would be a post in itself, and is the focus of many PhD projects (including mine). Perhaps a better question is how does the management model proposed by Deus et al. use the semantic web, and is it a useful example of integrative bioinformatics?

Their introduction focuses on two types of integration: data integration as an aid to holistic approaches such as mathematical modelling, and software integration which could provide tighter interoperability between data and services. They espouse (and I agree) the semantic web as a technology which will allow the semantically-meaningful specification of desired properties of data in a search, rather than retrieving data in a fixed way from fixed locations. They want to extend semantic data integration from the world of bioinformatics into clinical applications. Indeed, they want to move past “clandestine and inefficient flurry of datasets exchanged as spreadsheets through email”, a laudable goal.

Their focus is on a common data management and analysis infrastructure that does not place any restrictions on the data stored. This also means multiple instances of light-weight applications are part of the model, rather than a single central application. The storage format is of a more general, flexible nature. Their way of getting the data into a common format, they say, is to break down the “interoperable elements” of the data structures into RDF triples (subject-predicate-object statements). At its most basic, their data structure has two types of triples: Rules and Statements. Rules are phrases like “sky has_color”, while statements add a value to the phrase, e.g. “today’s_sky has_color blue”.
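The Rule and Statement split can be sketched in a few lines of Python. This is my own reading of the description above, not the actual S3DB schema: the class names, fields and the `as_triple` helper are assumptions, and only the sky example comes from the text.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A phrase with no value yet, such as 'sky has_color'."""
    subject: str
    predicate: str


class Statement(NamedTuple):
    """A Rule applied to a particular item, with the value filled in."""
    item: str
    rule: Rule
    value: str

    def as_triple(self) -> tuple:
        # Flatten into a subject-predicate-object triple.
        return (self.item, self.rule.predicate, self.value)


sky_colour = Rule("sky", "has_color")
todays_sky = Statement("today's_sky", sky_colour, "blue")
```

In this reading a Rule says which kinds of thing may be said, and a Statement says one of them about a specific item.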

They make the interesting point that the reclassification of data from flat files to XML to RDF to Description Logics starts to dilute “the distinction between data management and data analysis”. While it is true that if you are able to store your data in formats such as OWL-DL [3], the format is much more amenable to direct computational reasoning and inference, perhaps a more precise statement would be that the distinction between performance of data management tasks and data analysis tasks will blur with richer semantic descriptions of both the data and their applications. As they say later in the paper, once the data and the applications are described in a way that is meaningful for computation, new data being deposited online could automatically trigger a series of appropriate analysis steps without any human input.

A large focus of the paper was on identity, both of the people using it (and therefore addressing the user requirement of a strong permissions system) and of the entities in the model and database (each identified with some type of URI). This theme is core to ensuring that only those with the correct permissions may access possibly-sensitive data, and that each item of information can be unambiguously defined. I like that the sharing of “permissions between data elements in distinct S3DB deployments happens through the sharing the membership in external Collections and Rules…not through extending the permission inheritance beyond the local deployment”. It seems a useful and straightforward method of passing permissions.

I enjoyed the introduction, background, and conclusions. Their description of the Semantic Web and how it could be employed in the life sciences is well-written and useful for newcomers to this area of research. Their description of the management model as composed of subject-predicate-object RDF triples plus membership and access layers was interesting. Their website was clear and clean, and they had a demo that worked even when I was on the train [4]. It’s also rather charming that “S3DB” stands for Simple Sloppy Semantic Database – they have to get points for that one [5]! However, the description of their S3DB prototype was not extensive, and as a result I have some basic questions, which can be summarized as follows:

  • How do they determine what the interoperable elements of different data structures are? Manually? Computationally? Is this methodology generic, or does it have to be done with each new data type?
  • The determination of the maturity of a data format is not described, other than that it should be a “stable representation which remains useful to specialized tools”. For instance, the mzXML format is considered mature enough to use as the object of an RDF triple. What quality control is there in such cases: in theory, someone could make a bad mzXML file. Or is it not the format which is considered mature, but instead specific data sets that are known to be high quality?
  • I would have liked to see more detail in their practical example. Their user testing was performed together with the Lung Cancer SPORE user community. How long did the trial last? Was there some qualitative measurement of how happy they were with it (e.g. a questionnaire)? The only requirement gathered seems to have been that of high-quality access control.
  • Putting information into RDF statements and rules in an unregulated way will not guarantee a data set that can be integrated with other S3DB implementations, even if they are of the same experiment type. This problem is exemplified by a quote from the paper (p. 8): “The distinct domains are therefore integrated in an interoperable framework in spite of the fact that they are maintained, and regularly edited, by different communities of researchers.” The framework might be identical, but that doesn’t ensure that people will use the same terms and share the same rules and statements. Different communities could build different statements and rules, and use different terms to describe the same concept. Distributed implementations of S3DB databases, where each group can build their own data descriptions, do not lend themselves well to later integration unless they start by sharing the same ontology/terms and core rules. And, as the authors encourage the “incubation of experimental ontologies” within the S3DB framework, chances are that there will be multiple terms describing the same concept, or even one word that has multiple definitions in different implementations. While they state that data elements can be shared across implementations, it isn’t a requirement and could lead to the problems mentioned. I have the feeling I may have gotten the wrong end of the stick here, and it would be great to hear if I’ve gotten something wrong.
  • Their use of the rdfs:subClassOf relation is not ideal. A subclass relation is a bit like saying “is a”, (defined here as a transitive property where “all the instances of one class are instances of another”) therefore what their core model is saying with the statement “User rdfs:subClassOf Group” is “User is a Group”. The same thing happens with the other uses of this relation, e.g. Item is a Collection.  A user is not a group, in the same way that a single item is not a collection. There are relations between these classes of object, but rdfs:subClassOf is simply not semantically correct. A SKOS relation such as skos:narrower (defined here as “used to assert a direct hierarchical link between two SKOS concepts”) would be more suitable, if they wished to use a “standard” relationship. I particularly feel that I probably misinterpreted this section of their paper, but couldn’t immediately find any extra information on their website. I would really like to hear if I’ve gotten something wrong here, too.
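To illustrate the rdfs:subClassOf point in the last question above, here is a small hand-rolled Python sketch of the two standard RDFS entailment rules that involve subclassing (instances of a subclass are instances of the superclass, and subclassing is transitive). It is a toy, not a real reasoner and not anything from S3DB; the individual "alice" is invented for the example.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Repeatedly apply the two subclass rules until nothing new is inferred."""
    closure = set(triples)
    while True:
        new = set()
        for sub, pred, sup in closure:
            if pred != SUBCLASS:
                continue
            for s, p, o in closure:
                if o != sub:
                    continue
                if p == TYPE:
                    # An instance of the subclass is an instance of the superclass.
                    new.add((s, TYPE, sup))
                elif p == SUBCLASS:
                    # Subclassing is transitive.
                    new.add((s, SUBCLASS, sup))
        if new <= closure:
            return closure
        closure |= new


# The core-model statement under discussion, plus one invented individual.
asserted = {
    ("User", SUBCLASS, "Group"),
    ("alice", TYPE, "User"),
}
inferred = rdfs_closure(asserted) - asserted
```

Under these rules `inferred` contains `("alice", "rdf:type", "Group")`: a single user is classified as a group, which is the consequence I am objecting to.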

Also, although this is not something that should have been included in the paper, I would be curious to discover what use they think they could make of OBI, which would seem to suit them very well [6]. An ontology for biological and biomedical investigations would seem a boon to them. Further, such a connection could be two-way: the S3DB people probably have a large number of terms, gathered from the various users who created terms to use within the system. It would be great to work with the S3DB people to add these to the OBI ontology. Let’s talk! 🙂

Thanks for an interesting read!

1. Yes, I’ve mentioned to the UniProt gang that they need to re-jig their axes in the first graph in this link. They’re aware of it! 🙂
2. Although I shouldn’t talk: I am horrible at naming things, as the title of this blog shows.
3. A format for ontologies using Description Logics that may be saved as RDF. See the official OWL docs.
4. Which is a really flaky connection, believe me!
5. Note that this expanded acronym is *not* present in this PLoS ONE paper, but is on their website.
6. Note on personal bias: I am one of the core developers of OBI 🙂

Helena F. Deus, Romesh Stanislaus, Diogo F. Veiga, Carmen Behrens, Ignacio I. Wistuba, John D. Minna, Harold R. Garner, Stephen G. Swisher, Jack A. Roth, Arlene M. Correa, Bradley Broom, Kevin Coombes, Allen Chang, Lynn H. Vogel, Jonas S. Almeida (2008). A Semantic Web Management Model for Integrative Biomedical Informatics. PLoS ONE, 3 (8). DOI: 10.1371/journal.pone.0002946
Z. Zhang, K.-H. Cheung, J. P. Townsend (2008). Bringing Web 2.0 to bioinformatics. Briefings in Bioinformatics. DOI: 10.1093/bib/bbn041

1st RSBI Workshop, 6-8 December 2007

Last week I attended the first RSBI (Reporting Structure for Biological Investigations) Workshop, carrying with me a multitude of hats. RSBI is a working group committed to the progression of standardization in multi-omics investigations. The purpose of the workshop was to examine and offer suggestions on the initial draft of ISA-TAB (more on that in a moment).

My first hat was a FuGE-user's hat, as the triumvirate of standards upon which RSBI is built is the Functional Genomics Experiment Model (FuGE), the Minimum Information for Biological and Biomedical Investigations (MIBBI) Project, and the Ontology for Biomedical Investigations (OBI). I was asked to give a current status update on FuGE itself, and on any communities that have already built extensions to FuGE. Andy Jones from Liverpool provided me with all of the hot-off-the-press information (my FuGE slides) – thanks Andy!

My second hat was a SyMBA-developer's hat. SyMBA uses FuGE to build a database and web front-end for storing data and experimental metadata. We use it in-house to store all of our large, high-throughput 'omics data. The use of FuGE in the system made it relevant for the workshop (my SyMBA slides, more SyMBA slides).

My final hat was a CISBAN-employee's hat. I work in the Wipat group there, and CISBAN is one of the "leading groups" involved in RSBI. As such, I was CISBAN's representative to the workshop.

The reason for the workshop, as stated earlier, was the evaluation of ISA-TAB, a proposed tabular format whose purpose is to provide a standard format for data and metadata submission into the formative BioMAP database at the EBI. ISA-TAB would have two uses:

  1. Humans: As a tabular format, it is quite easy for people to view and manipulate such templates within spreadsheet software such as Excel.
  2. Computers: As an interim solution only, ISA-TAB would be used as a computational exchange format until such time as each of the FuGE-based community extensions are complete for Metabolomics, Proteomics, and Transcriptomics. At this time, ISA-TAB would remain available for human use, but there would be a conversion step into "FuGE-ML".

The scope for ISA-TAB is large, and this was reflected in the attendees of the meeting. Representatives from ArrayExpress, Pride, and BioMAP were of course present, but also attending were people from the Metabolomics community, the MIACA project, toxico- and environmental genomics, and the FDA's NCTR.

A full write-up of the results of the workshop will soon be available online at the project's RSBI Google Group, so I'll leave it there. It was an exciting meeting, with fantastic food and even better discussions on getting public databases organized quickly for simple, straightforward multi-omics investigation data and metadata submission.

You can contact the RSBI via



Summer 2007 OBI Ontology Workshop, Day 4

Once again, the best way to view these highly discussion-centric notes is via the combined notes of me and Helen, which you can find on the OBI Wiki. Enjoy! We got through lots of agenda items, and also made a dozen or so milestones, which are now up on the OBI Google Calendar, as well as in the meeting notes. These milestones will eventually be put up on the official Milestones page.
