HL53: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project (ISMB 2009)

Chris Taylor

Standards are hugely dependent on their respective communities for requirements gathering, development, testing, and uptake by stakeholders. In modeling the biosciences there are a few generic features, such as the description of source material and experimental design components. Then there are biologically-delineated and technologically-delineated views of the world. These views are still common across many different areas of the life sciences. Much of it can fall under an ISA (Investigation-Study-Assay) structure.

There are then three types of standards: syntax (e.g. FuGE, ISA-TAB), semantics, and scope. MIBBI is all about scope. How well are things working? Well, there is still separation, but things are getting better. There aren’t many carrots, though there are some sticks for using these standards. Why do we care about standards? Data exchange, comprehensibility, and scope for reuse. Many funders (especially public funders) now require data sharing, or at least the ability to store and exchange data.

“Metaprojects”: FuGE, OBI, ISA-TAB draw together many different domains and present them in a structure and semantics useful across all of them. Many of the MI (minimum information) guidelines are developed independently, and some are defunct. It’s also hard to track what’s going on in these projects: they can be redundant, and it is difficult to obtain an overview of the full range of checklists. When the MI projects overlap, arbitrary decisions on wording and substructuring make integration difficult. This makes it hard to take parts of different guidelines: they are not very modular. Enter MIBBI, with two distinct goals: a portal (a registry of guidelines) and a foundry (integration and modularization).

There’s lots of enthusiasm for the project (drafters, users, funders, journals). MIBBI raises awareness of various checklists and promotes gradual integration of checklists. See Nature Biotechnology 26, 889–896 (2008) doi:10.1038/nbt0808-889 for the paper. He’s performed clustering and analysis of the different guidelines, displaying the MIs in Cytoscape and in a fake phylogenetic tree. By the end of the year they’ll have a shopping-basket-based tool, MICheckout, which gathers all the concepts you select and outputs your own specialized checklist. You can make use of ISAcreator and its configuration to set mandatory parameters etc.
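The shopping-basket idea can be illustrated with a small sketch. This is not MICheckout’s code: it assumes each checklist module is just a list of named items, that items shared between modules can be matched by name (a big simplification of the integration work the MIBBI foundry actually does), and that a submission is a plain dictionary. The module and item names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One reporting requirement (hypothetical example items below)."""
    name: str
    mandatory: bool = True


# Hypothetical checklist modules; item names are illustrative only and are
# not taken from any real MIBBI checklist.
MODULES = {
    "core": [ChecklistItem("investigation title"), ChecklistItem("organism")],
    "proteomics": [
        ChecklistItem("organism"),
        ChecklistItem("instrument model"),
        ChecklistItem("search engine", mandatory=False),
    ],
    "flow cytometry": [ChecklistItem("organism"), ChecklistItem("fluorochrome")],
}


def checkout(basket):
    """Merge the selected modules into one checklist.

    Items appearing in several modules are kept once (matched by name);
    if any module marks an item mandatory, the merged item is mandatory.
    """
    merged = {}
    for module in basket:
        for item in MODULES[module]:
            previous = merged.get(item.name)
            if previous is None or (item.mandatory and not previous.mandatory):
                merged[item.name] = item
    return list(merged.values())


def missing_mandatory(checklist, record):
    """Return the names of mandatory items a submitted record leaves empty."""
    return [item.name for item in checklist
            if item.mandatory and not record.get(item.name)]
```

For example, checking out "core" and "proteomics" yields a single "organism" item rather than two, and a record that fills in only the title and organism is reported as missing "instrument model". The mandatory check plays roughly the role that an ISAcreator configuration plays when it marks parameters as required.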

The objections to fuller reporting. Why should I share? Funders and publishers are starting to require a bare minimum of metadata; however, researchers will then do just the bare minimum. Some people think that this is just a ‘make work’ scheme for bioinformaticians, or that bioinformaticians are parasitic. Some people don’t trust what others have done, but then that’s what the reporting guidelines are for in the first place: so you can figure out whether you should trust it. Problems of quality are justified to an extent, but what of people lacking resources for large-scale work, or people who want to refer to proteomics data but don’t do proteomics? How should they follow these guidelines? The perception is that there is no money for this, no mature free tools, and there are worries about vendor support. Vendors will support what researchers say they need.

Credit: data sharing is more or less a given now, and we need central registries of data sets that can record reuse (also OpenIDs, and DOIs for data). Side benefits and challenges include clearing up problems with paper authorship with respect to reporting who’s done which bit. It would also enable other kinds of credit, and may have to be self-policing. Finally, there is the problem of micro data sets and legacy data. An example of the former is EMBL entries: when searching against EMBL, you’re using the data in some way, even if you don’t pull it out for later analysis.

http://www.mibbi.org

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

TT26: BioModels Database, a database of curated and annotated quantitative models with Web Services and analysis tools (ISMB 2009)

Nicolas Le Novère

Lots of things are called models. He’s NOT going to talk about HMMs, Bayesian models, sailboat models, supermodels 🙂 For him, a model is computer-readable, simulatable, and covers biological pathways. Models and their description/metadata need to be accessible. The models in BioModels are from the peer-reviewed literature. They check that each model is OK and simulate it before accepting it into the database. Models can be submitted by the curators themselves (e.g. re-implemented from the literature), directly by authors, or in a few other ways.

Models also have to be encoded in SBML and follow the MIRIAM guidelines, which are reporting guidelines for the encoding and annotation of models and are currently limited to models that can be quantitatively evaluated. There are seven basic requirements for MIRIAM compliance, which are available online. Within the model, MIRIAM annotations are identified by URIs and are stored as RDF. There’s been a steady increase in the number of models in BioModels: there are about 35000 reactions and about 400 models. Standard search functionality is available from their website at the EBI (http://www.ebi.ac.uk/biomodels).
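To make “identified by URIs and stored as RDF” concrete, here is a minimal sketch that pulls (subject, qualifier, URI) triples out of an annotation fragment using only Python’s standard library. The fragment is made up in the style of SBML annotations of the time (a BioModels biology qualifier wrapping an rdf:Bag of urn:miriam resources); the metaid and accession are illustrative, and the parsing function is hypothetical, not part of any BioModels tool.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Hypothetical annotation fragment; the metaid and accession are illustrative.
ANNOTATION = f"""
<annotation>
  <rdf:RDF xmlns:rdf="{RDF}" xmlns:bqbiol="{BQBIOL}">
    <rdf:Description rdf:about="#meta_species_1">
      <bqbiol:is>
        <rdf:Bag>
          <rdf:li rdf:resource="urn:miriam:uniprot:P12345"/>
        </rdf:Bag>
      </bqbiol:is>
    </rdf:Description>
  </rdf:RDF>
</annotation>
""".strip()


def miriam_annotations(xml_text):
    """Return (subject, qualifier, resource URI) triples from an RDF annotation."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for qualifier in desc:
            # Qualifier tags look like '{namespace}is'; keep only the local name.
            local_name = qualifier.tag.split("}", 1)[1]
            for li in qualifier.iter(f"{{{RDF}}}li"):
                triples.append((subject, local_name, li.get(f"{{{RDF}}}resource")))
    return triples
```

Calling miriam_annotations(ANNOTATION) gives a single triple linking #meta_species_1 through the qualifier "is" to urn:miriam:uniprot:P12345. For real work, an SBML library would be the sensible choice over hand-parsing XML.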

Models can be exported in CellML, BioPAX and other formats (though the SBML version is the curated, perhaps more “trusted”, one). There are also two simple simulators available directly from each entry’s webpage, and if you want to change parameters you can click through to JWS Online. You can also extract just portions of the models: these will be valid SBML models in their own right.

FriendFeed Discussion


Modelling biomedical experimental processes with OBI (ISMB Bio-Ont SIG 2009)

Larisa Soldatova et al.

OBI was created to meet the need for a standardised vocabulary for experiments that can be shared across many experiment types. OBI is community driven, with over 19 communities participating. It is a candidate OBO Foundry ontology, is complementary to existing bio-ontologies, and reuses existing ontologies where possible. It uses various upper-level ontologies (ULOs) for interoperability: BFO, RO, and IAO. For instance, the material_entity class was introduced into BFO at the request of the OBI developers.

OBI uses relations from BFO, RO, and IAO as well as creating relations specific to OBI. OBI relations could be merged with other relations ontologies in future. They try to have as few relations as possible. Two use cases were outlined in this paper. The first was an analyte measuring assay, where you draw blood from a mouse and determine the concentration of glucose in it. The second was a vaccine protection study, where you measure how efficiently a vaccine induces protection against virulent pathogen infection in vivo.
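The analyte assay use case can be pictured as a chain of processes connected by their inputs and outputs. In the sketch below, has_specified_input and has_specified_output are OBI relation names, but here they are plain strings rather than OBI IRIs. The individuals, the triple list and the trace_back helper are hypothetical, and OBI itself expresses such use cases in OWL, not like this.

```python
# Illustrative triples using OBI-style relation labels (plain strings, not IRIs).
# The individuals are hypothetical.
ASSAY_TRIPLES = [
    ("blood draw 1", "has_specified_input", "mouse 1"),
    ("blood draw 1", "has_specified_output", "blood specimen 1"),
    ("glucose assay 1", "has_specified_input", "blood specimen 1"),
    ("glucose assay 1", "has_specified_output", "glucose concentration datum 1"),
]


def objects(triples, subject, relation):
    """All objects linked from subject by relation."""
    return [o for s, r, o in triples if s == subject and r == relation]


def trace_back(triples, entity):
    """Walk from an output back through producing processes to their inputs.

    Assumes the process chain is acyclic and follows only the first
    producer and first input at each step.
    """
    chain = [entity]
    current = entity
    while True:
        producers = [s for s, r, o in triples
                     if r == "has_specified_output" and o == current]
        if not producers:
            return chain
        inputs = objects(triples, producers[0], "has_specified_input")
        if not inputs:
            return chain
        current = inputs[0]
        chain.append(current)
```

Starting from the glucose concentration datum, trace_back walks through the assay to the blood specimen and then through the blood draw to the mouse. This is the kind of provenance question that consistent input/output relations make answerable.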

Allyson’s thoughts: Disclosure: I am involved in the development of OBI.

FriendFeed Discussion: http://ff.im/4xoIA


Review of OBO Foundry Principles at the OBO Foundry Workshop 2009

After the recent posts (listed here) in the lead-up to the OBO Foundry workshop, Duncan Hull, Melanie Courtot, and Frank Gibson led a discussion about the current state of the OBO Foundry principles yesterday.

The results of the discussion can be found on the OBO Foundry Wiki page.  It looks like there was a really positive outcome for this section of the workshop, with a lot of good points being raised. I encourage you all to go to this page, and then scroll down to the section entitled “Review of OBO Foundry Principle – Duncan Hull, Frank Gibson, Melanie Courtot”.

Thanks to Susanna-Assunta Sansone for taking the fabulous notes for both days!

Rules or Checklist? Which would you prefer from the OBO Foundry?

[Update: Duncan’s written a call for comments on the OBO Foundry criteria on his blog. Also posting on this are Melanie and Frank. Take a look! Update 2: I should have called the 10 criteria “principles” rather than “rules”. My apologies. I think the title may be a little bit of a misnomer for the post. I’m not sure you need to choose between principles and checklists. It’s nice to have the “short and sweet” and the detailed.]

The OBO Foundry Workshop (OBO Foundry paper) is coming up this weekend, and Duncan Hull and I were talking about the 10 criteria the Foundry has for member ontologies. We had been wondering what sort of questions we would ask the OBO Foundry people if we wanted to see the 10 criteria “upgraded” to a minimal checklist for OBO Foundry ontologies in the style of MIBBI. As a result, here are my thoughts on each criterion. Perhaps some of these have been answered in mailing lists or elsewhere, but the answers are not visible on the OBO Foundry site. Hopefully this post will be useful as a starting point for a discussion on more complete definitions and explanations of the minimal requirements for an OBO Foundry ontology.

Each criterion is reproduced in bold, with my opinions after in italicised text. For any further text present in the criteria list, please see the list page itself.

  1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers.
    This is a license without a name or a strong structure. Is it a first attempt at an OBO-specific license? If so, it is too generic to be of much use. Alternatively, is it a requirements list for choosing an existing license? Or, as another option, are they suggesting that people choose their own licenses along these lines? I believe strongly that already-extant licenses should be used in biological research wherever possible. You can see a summary of a FriendFeed discussion and an email discussion with Science Commons in my blog post on Choosing a License for Your Ontology for my opinion on the subject.  Therefore I would suggest option 2, with the Foundry choosing an appropriate license (or shortlist of compatible licenses) as soon as they could.
  2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL.
    Firstly, I would like clarification of what “extensions of the [OBO] syntax” means. Secondly, just saying “OWL” as a syntax is too vague; there’s OWL-Full, OWL-DL, and OWL-Lite, to name a few. Are all acceptable, or is the most commonly-used (OWL-DL) the one they want people to use?
  3. The ontologies possesses a unique identifier space within the OBO Foundry.
    Aside from the (nitpicky) statement that it should be either “The ontologies possess” or “Each ontology possesses”, this is one of the most useful criteria. However, a little more detail would help here. What should come after the prefix? An underscore or some other dividing character? The rest of the identifier without a dividing character? Should the OBO Foundry assign prefixes to avoid confusion? By the way, an interesting paper has just been published about the *naming* conventions for the OBO Foundry. This isn’t the same thing as this criterion, which is about unique identifiers, but it’s still worth a read.
  4. The ontology provider has procedures for identifying distinct successive versions.
    A little vague, but that probably cannot be helped, as you probably don’t want to legislate the type of versioning that takes place with each ontology. Links out to GO’s procedures or OBI’s procedures might provide some ideas to people who don’t know what versioning to use.
  5. The ontology has a clearly specified and clearly delineated content.
    The “domain” of the ontology, used in the further description of this criterion, is a vague term. Yes, we all want orthogonality, but that is difficult to achieve in practice, and a clearer description of how people can achieve it would be useful. How are two terms expressing the same concept in different ontologies resolved? Via the mailing list? Is there an established procedure? It’s easy to say that no two terms should cover the same concept, but harder to check. There have been some recent papers on finding similar concepts within a single ontology (e.g. doi:10.1093/bioinformatics/btp195), and their methods might be applicable across multiple ontologies.
  6. The ontologies include textual definitions for all terms.
    Good point. It would also be nice to state that formal logical definitions for classes are useful (but not required), as they might help ensure the internal consistency of Foundry ontologies.
  7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
    This says you have to define your relations “following the pattern” from the RO. Does this mean all your relations must be children of relations in RO, or just that you follow their style? Probably the latter, but this is unclear at the moment.
  8. The ontology is well documented.
    Definitely! But how? Where? In the ontology file? On a website? Does the OBO website provide the ability to have lots of documentation, or should it just be links out?
  9. The ontology has a plurality of independent users.
    I’m a bit of a failure here, as I don’t know what this means. I can think of at least 2-3 different ways of interpreting this. What are users in this context? What makes them independent? How can you tell what your users are?
  10. The ontology will be developed collaboratively with other OBO Foundry members.
    Great idea. But what if you can’t find anyone who wants to help? Does that mean you can’t develop the ontology? Again, perhaps this just means regular reviews of the developing ontology by other OBO members, but it could be made clearer.
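One possible answer to the questions raised under criterion 3 can be written down as a small check. The scheme here is an assumption for illustration only: an upper-case prefix, an underscore, then exactly seven digits (as in identifiers such as GO_0008150), with a registry that refuses to give the same prefix to two ontologies. Both helper functions are hypothetical, not part of any OBO Foundry tool.

```python
import re

# Assumed scheme for illustration: PREFIX_NNNNNNN, i.e. an upper-case prefix,
# an underscore, and exactly seven digits.
ID_PATTERN = re.compile(r"^([A-Z][A-Z0-9]*)_(\d{7})$")


def parse_id(identifier):
    """Split an identifier into (prefix, local part), or raise ValueError."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not in PREFIX_NNNNNNN form: {identifier!r}")
    return match.group(1), match.group(2)


def register_prefix(registry, prefix, ontology):
    """Assign a prefix to one ontology, refusing to reuse it for another."""
    owner = registry.setdefault(prefix, ontology)
    if owner != ontology:
        raise ValueError(f"prefix {prefix} already belongs to {owner}")
    return registry
```

Written this way, the criterion becomes something a script can check rather than something each ontology interprets on its own. The underscore and the seven-digit length are the kind of details that the criterion itself leaves open.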

Most of these opinions don’t try to provide an answer, but instead just raise some questions that the attendees at this week’s workshop might like to have in their minds. If the OBO Foundry, which exists to “align ontology development efforts” (a quote from the Nature Biotech paper), doesn’t provide clear guidance, there is a risk that each member ontology will come up with its own answers, negating some of the benefits of membership.

Have a great workshop – wish I had the time to attend this year!

Modeling and Managing Experimental Data Using FuGE


Want to share your umpteen multi-omics data sets and experimental protocols with one common format? Encourage collaboration! Speak a common language! Share your work! How, you might ask? With FuGE, and this latest paper (citation at the end of the post) tells you how.

In 2007, FuGE version 1 was released (website, Nature Biotechnology paper). FuGE allows biologists and bioinformaticians to describe any life science experiment using a single format, making collaboration and repeatability of experiments easier and more efficient. However, if you wanted to start using FuGE, until now it was difficult to know where to start. Do you use FuGE as it stands? Do you create an extension of FuGE that specifically meets your needs? What do the developers of FuGE suggest when taking your first steps using it? This paper focuses on best practices for using FuGE to model and manage your experimental data. Read this paper, and you’ll be taking your first steps with confidence!

[Aside: Please note that I am one of the authors of this paper.]

What is FuGE? I’ll leave it to the authors to define:

The approach of the Functional Genomics Experiment (FuGE) model is different, in that it attempts to generalize the modeling constructs that are shared across many omics techniques. The model is designed for three purposes: (1) to represent basic laboratory workflows, (2) to supplement existing data formats with metadata to give them context within larger workflows, and (3) to facilitate the development of new technology-specific formats. To support (3), FuGE provides extension points where developers wishing to create a data format for a specific technique can add constraints or additional properties.

A number of groups have started using FuGE, including MGED, PSI (for GelML and AnalysisXML), MSI, flow cytometry, RNA interference and e-Neuroscience (full details in the paper). This paper helps you get a handle on how to use FuGE by presenting two running examples of capturing experimental metadata: flow cytometry, and gel electrophoresis in proteomics. Part of Figure 2 from the paper describes one section of the flow cytometry FuGE extension from FICCS; its caption follows.

The flow cytometry equipment created as subclasses of the FuGE equipment class.
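The idea behind the figure, technique-specific equipment defined as subclasses of a generic Equipment class, can be sketched in a few lines. This is only an analogy: FuGE is a UML model with extension points, not a set of Python classes, and the class names and attributes below are hypothetical rather than those used in the FICCS extension.

```python
from dataclasses import dataclass, field


@dataclass
class Equipment:
    """Simplified stand-in for FuGE's generic Equipment class (hypothetical)."""
    name: str
    manufacturer: str = ""
    parameters: dict = field(default_factory=dict)


@dataclass
class Laser(Equipment):
    """Hypothetical technique-specific subclass."""
    wavelength_nm: float = 0.0


@dataclass
class FlowCytometer(Equipment):
    """Hypothetical extension: a cytometer that knows about its lasers."""
    lasers: list = field(default_factory=list)


def equipment_names(items):
    """Generic code written against Equipment also accepts the subclasses."""
    return [e.name for e in items if isinstance(e, Equipment)]
```

The design point is that tools written for the generic class keep working. For instance, equipment_names applied to a FlowCytometer and its Laser returns both names, while the subclasses still carry their extra, technique-specific detail.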

FuGE covers many areas of experimental metadata, including the investigations, the protocols, the materials and the data. The paper starts by describing how protocols are designed in FuGE and how those protocols are applied. In doing so, it describes not just the protocols but also parameterization, materials, data, conceptual molecules, and ontology usage.

Examples of each of these FuGE packages are provided in the form of either the flow cytometry or the GelML extensions. Further, clear scenarios are provided to help the user determine when it is best to extend FuGE and when it is best to re-use existing FuGE classes. For instance, it is best to extend the Protocol class with an application-specific subclass when all of the following are true: when you wish to describe a complex Protocol that references specific sub-protocols, when the Protocol must be linked to specific classes of Equipment or Software, and when specific types of Parameter must be captured. I refer you to the paper for scenarios for each of the other FuGE packages such as Material and Protocol Application.
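The Protocol rule above is a simple conjunction, and writing it as a predicate makes the “all of the following” explicit. This is a minimal sketch: the dataclass and function names are hypothetical, and the paper itself gives the rule as prose guidance, not code.

```python
from dataclasses import dataclass


@dataclass
class ProtocolNeeds:
    """What a hypothetical new Protocol has to describe."""
    references_sub_protocols: bool
    linked_to_specific_equipment_or_software: bool
    captures_specific_parameter_types: bool


def should_subclass_protocol(needs: ProtocolNeeds) -> bool:
    """Extend Protocol only when all three conditions hold;
    otherwise reuse the existing FuGE Protocol class."""
    return (needs.references_sub_protocols
            and needs.linked_to_specific_equipment_or_software
            and needs.captures_specific_parameter_types)
```

A protocol that references sub-protocols and captures specific parameters, but is not tied to particular equipment or software, fails the test. Under this reading, it would therefore reuse the existing class.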

The paper makes liberal use of UML diagrams to help you understand the relationship between the generic FuGE classes and the specific sub-classes generated by extensions. A large part of the paper is concerned expressly with helping the user understand how to model an experiment type using FuGE, and also to understand when FuGE on its own is enough. But it also does more than that: it discusses the current tools that are already available for developers wishing to use FuGE, and it discusses the applicability of other implementations of FuGE that might be useful but do not yet exist. Validation of FuGE-ML and the storage of version information within the format are also described. Implementations of FuGE, including SyMBA and sysFusion for the XML format and ISA-TAB for compatibility with a spreadsheet (tab-delimited) format, are also summarised.

I strongly believe that the best way to solve the challenges in data integration faced by the biological community is to constantly strive to use the same (or compatible) formats for data and for metadata. FuGE succeeds in providing a common format for experimental metadata that can be used in many different ways, and with many different levels of uptake. You don’t have to use one of the provided software toolkits (STKs) in order to make use of FuGE: you can simply offer your data as a FuGE export in addition to any other omics formats you might use. You could also choose to accept FuGE files as input. No changes need to be made to the underlying infrastructure of a project in order to become FuGE compatible. Hopefully this paper will flatten the learning curve for developers new to FuGE, and get them on the road to a common format. Just one thing to remember: formats are not something that the end user should see. We developers do all this hard work, but if it works correctly, the biologist won’t know about all the underpinnings! Don’t sell your biologists on a common format by describing the intricacies of FuGE to them (unless they want to know!); just remind them of the benefits of a common metadata standard: cooperation, collaboration, and sharing.
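The point about offering a FuGE export alongside existing formats, without touching the underlying infrastructure, amounts to an extra exporter registered next to the existing ones. The sketch below assumes a hypothetical in-house record and registry. The XML it writes is a simplified, FuGE-inspired stand-in with made-up element names, not schema-valid FuGE-ML; a real exporter would target the FuGE-ML schema, for example via one of the STKs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical in-house record; the field names are illustrative only.
EXPERIMENT = {"title": "Yeast time course", "protocols": ["cell lysis", "2D gel"]}


def export_tsv(record: dict) -> str:
    """An existing, hypothetical tab-delimited export."""
    lines = [f"title\t{record['title']}"]
    lines += [f"protocol\t{p}" for p in record["protocols"]]
    return "\n".join(lines)


def export_fuge_like_xml(record: dict) -> str:
    """Simplified, FuGE-inspired XML (made-up elements, NOT valid FuGE-ML)."""
    root = ET.Element("Experiment", title=record["title"])
    for name in record["protocols"]:
        ET.SubElement(root, "Protocol", name=name)
    return ET.tostring(root, encoding="unicode")


# Existing exporters stay untouched; the new one is registered alongside them.
EXPORTERS: Dict[str, Callable[[dict], str]] = {"tsv": export_tsv}
EXPORTERS["fuge-like-xml"] = export_fuge_like_xml


def export_all(record: dict) -> Dict[str, str]:
    """Produce every registered export for one record."""
    return {fmt: fn(record) for fmt, fn in EXPORTERS.items()}
```

Because the pipeline only iterates over the registry, adding or dropping the metadata export never changes how the data themselves are stored. That is the low-cost route to compatibility described in the paragraph above.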

Jones, A., Lister, A.L., Hermida, L., Wilkinson, P., Eisenacher, M., Belhajjame, K., Gibson, F., Lord, P., Pocock, M., Rosenfelder, H., Santoyo-Lopez, J., Wipat, A., & Paton, N. (2009). Modeling and Managing Experimental Data Using FuGE. OMICS: A Journal of Integrative Biology. DOI: 10.1089/omi.2008.0080

“Blogging is Hard” Day: Repost of 2006 FuGO Workshop Day 1

According to the rules set down by Greg Laden over at Science Blogs, I have had a trawl through the blasts from the past that are my blog posts from 18 months ago or more, to find one that is “exactly in lie [sic] with the writing or research in which they are currently engaged”. I thought about my Visiting With Enigma post, which has a special place in my heart, but didn’t choose it in the end as it had nothing to do with my current research. Instead, I chose my very first post on WordPress: FuGO Workshop Day 1. It may not sound like much, but there are a number of things recommending this particular post.

  1. FuGO was the original name for the OBI project, of which I’m still a part, so this post meets the requirement that it relate to my current work.
  2. This was my first introduction to ontologies, and happened just as I was leaving one job (at the EBI) and starting a new one (at CISBAN). Such an important change deserves another mention.
  3. I notice an earlier incarnation of my “be sensible” statement in this post, where I say that I learned from Richard Scheuerman that it is always a good idea to use “only those fields which would be of most use to the biologist, rather than those that would make us bioinformaticians most happy”.
  4. FuGO wasn’t the only thing that has since undergone a name change. This post also contained information about the “new” MIcheck registry of minimal checklists: this has continued to gain in popularity, and is now MIBBI.
  5. There was a long discussion at the FuGO workshop about multiple versus single inheritance in ontologies, a topic that came up again just last week at the CBO workshop and in a short discussion on FriendFeed that led to longer real-life conversations (Phillip Lord’s paper deals with this topic). This was also my first introduction to Robert Stevens and Barry Smith, who both took center stage in the MI/SI discussion. Listening to Barry and Robert speak was really informative and interesting and fun!

What a fantastic day that was: a crash course in ontology development and best-practices, as well as introductions to some of the most well-known people in the biological / biomedical ontology world. In many ways, those first few days of my current job / last few days of my old job shaped where I am now.

Read that entire post, and Happy Blogging is Hard Day! Thanks to Greg Laden for the great idea.