Review of OBO Foundry Principles at the OBO Foundry Workshop 2009

After the recent blog posts in the lead-up to the OBO Foundry workshop, Duncan Hull, Melanie Courtot, and Frank Gibson yesterday led a discussion about the current state of the OBO Foundry principles.

The results of the discussion can be found on the OBO Foundry Wiki page. This part of the workshop seems to have had a really positive outcome, with a lot of good points raised. I encourage you all to visit that page and scroll down to the section entitled “Review of OBO Foundry Principle – Duncan Hull, Frank Gibson, Melanie Courtot”.

Thanks to Susanna-Assunta Sansone for taking the fabulous notes for both days!


Rules or Checklist? Which would you prefer from the OBO Foundry?

[Update: Duncan’s written a call for comments on the OBO Foundry criteria on his blog. Also posting on this are Melanie and Frank. Take a look! Update 2: I should have called the 10 criteria “principles” rather than “rules”. My apologies. I think the title may be a little bit of a misnomer for the post. I’m not sure you need to choose between principles and checklists. It’s nice to have the “short and sweet” and the detailed.]

The OBO Foundry Workshop (see also the OBO Foundry paper) is coming up this weekend, and Duncan Hull and I have been talking about the 10 criteria the Foundry sets for member ontologies. We wondered what questions we would ask the OBO Foundry people if we wanted to see the 10 criteria “upgraded” to a minimal checklist for OBO Foundry ontologies in the style of MIBBI. Here, then, are my thoughts on each criterion. Perhaps some of these questions have been answered on mailing lists or elsewhere, but the answers aren’t visible on the OBO Foundry site. Hopefully this post will be a useful starting point for a discussion on more complete definitions and explanations of the minimal requirements for an OBO Foundry ontology.

Each criterion is reproduced below, followed by my comments. For any further text in the criteria list, please see the list page itself.

  1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers.
    This is a license without a name or a strong structure. Is it a first attempt at an OBO-specific license? If so, it is too generic to be of much use. Alternatively, is it a list of requirements for choosing an existing license? Or, as a third option, are developers expected to write their own licenses along these lines? I believe strongly that existing licenses should be used in biological research wherever possible; for my opinion on the subject, see my blog post on Choosing a License for Your Ontology, which summarises a FriendFeed discussion and an email discussion with Science Commons. I would therefore suggest option 2, with the Foundry choosing an appropriate license (or a shortlist of compatible licenses) as soon as possible.
  2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL.
    Firstly, I would like clarification of what “extensions of the [OBO] syntax” means. Secondly, just saying “OWL” as a syntax is too vague: there are OWL-Full, OWL-DL, and OWL-Lite, for a start. Are all acceptable, or is the most commonly used one (OWL-DL) the one they want people to use?
  3. The ontologies possesses a unique identifier space within the OBO Foundry.
    Aside from the nitpick that it should be either “The ontologies possess” or “Each ontology possesses”, this is one of the most useful criteria. However, a little more detail would help. What should come after the prefix? An underscore or some other separator? The rest of the identifier with no separator at all? Should the OBO Foundry assign prefixes to avoid clashes? By the way, an interesting paper has just been published about the *naming* conventions for the OBO Foundry. It covers a different topic from this criterion, which is about unique identifiers, but it’s still worth a read.
  4. The ontology provider has procedures for identifying distinct successive versions.
    A little vague, but that probably cannot be helped, as you probably don’t want to legislate the type of versioning used by each ontology. Links to GO’s or OBI’s versioning procedures might give some ideas to people who don’t know which versioning scheme to use.
  5. The ontology has a clearly specified and clearly delineated content.
    The “domain” of the ontology, a term used in the further description of this criterion, is vague. Yes, we all want orthogonality, but it is difficult to achieve in practice, and a clearer description of how to achieve it would be useful. How are two terms expressing the same concept in different ontologies resolved? Via the mailing list? Is there an established procedure? It’s easy to say that no two terms should cover the same concept, but harder to check. Some recent papers on finding similar concepts within a single ontology (e.g. doi:10.1093/bioinformatics/btp195) might also be applicable across multiple ontologies.
  6. The ontologies include textual definitions for all terms.
    Good point. It would also be nice if the criterion noted that formal logical definitions for classes are useful (though not required), as they might help ensure the internal consistency of Foundry ontologies.
  7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
    This says you have to define your relations “following the pattern” from the RO. Does this mean all your relations must be children of relations in RO, or just that you follow their style? Probably the latter, but this is unclear at the moment.
  8. The ontology is well documented.
    Definitely! But how? Where? In the ontology file? On a website? Does the OBO website provide the ability to have lots of documentation, or should it just be links out?
  9. The ontology has a plurality of independent users.
    I’m a bit of a failure here, as I don’t know what this means. I can think of at least 2-3 different ways of interpreting it. What are users in this context? What makes them independent? How can you tell who your users are?
  10. The ontology will be developed collaboratively with other OBO Foundry members.
    Great idea. But what if you can’t find anyone who wants to help? Does that mean you can’t develop? Again, perhaps this just means regular reviews of the developing ontology by other OBO members, but it could be made clearer.
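
To show how criterion 3 could become an automatable checklist item once its details are pinned down, here is a minimal Python sketch. It assumes a PREFIX:digits form with exactly seven digits, as in GO IDs such as GO:0001555; the regular expression, the digit count and the check_id_space helper are hypothetical choices for illustration, not an OBO Foundry rule.

```python
import re

# Hypothetical convention, assumed for illustration only: an upper-case prefix,
# a colon, then exactly seven digits, as in GO:0001555.
ID_PATTERN = re.compile(r"^([A-Z][A-Z0-9]*):(\d{7})$")


def check_id_space(term_ids, expected_prefix):
    """Return the term IDs that are malformed or fall outside expected_prefix."""
    problems = []
    for term_id in term_ids:
        match = ID_PATTERN.match(term_id)
        if match is None or match.group(1) != expected_prefix:
            problems.append(term_id)
    return problems


# IDs taken from the id: lines of a single (imaginary) GO-like file.
ids = ["GO:0001555", "GO:0016049", "GO_0048601", "SO:0000704"]
print(check_id_space(ids, "GO"))  # ['GO_0048601', 'SO:0000704']
```

Whether the separator should be a colon, an underscore or nothing at all is exactly the detail the criterion leaves open; in this sketch it is just part of the regular expression.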

Most of these comments don’t try to provide an answer; instead, they raise questions that attendees at this week’s workshop might like to keep in mind. If the OBO Foundry, which exists to “align ontology development efforts” (a quote from the Nature Biotech paper), doesn’t provide clear guidance, there is a risk that each member ontology will come up with its own answers, negating some of the benefits of membership.

Have a great workshop – wish I had the time to attend this year!

From OBO to OWL and Back Again: OBO capabilities of the OWL API

ResearchBlogging.org

Golbreich et al. describe a formal method of converting OBO files to OWL 1.1 files, and vice versa. Their code has been integrated into the OWL API, a set of classes that is widely used within the OWL community; Protege 4, for instance, is built on the OWL API. While there have been other efforts to map between the OBO flat-file format and OWL (they specifically mention Chris Mungall’s work on an XSLT, used as a plugin within Protege, that can perform the conversion), none were done in a formal or rigorous manner. By defining an exact relationship between OBO and OWL constructs, using consensus information provided by the OBO community, the authors have provided a more robust mapping than has been available to date.

Consequently, the entire library of tools, reasoners and editors available to the OWL community is now also available to OBO developers, in a way that does not force them to permanently leave the format and environment they are used to.

OBO ontologies are ontologies developed within the biological and biomedical domain that follow a standard, if often non-rigorously-defined, syntax and semantics. The best known of the OBO ontologies is the Gene Ontology (GO). When you choose OBO, you subscribe not only to the format but also to the ideas behind the OBO Foundry, which aims to limit overlap between ontologies in related fields and which provides a communal environment (mailing lists, websites, etc.) in which to develop. OWL (the Web Ontology Language) has three dialects, of which OWL-DL (DL stands for Description Logics) is the most commonly used. OWL-DL is favored by ontologists wishing to perform computational analyses over ontologies because it has not just rigorously-defined formal semantics, but also a wide user base and a suite of reasoning tools developed by multiple groups.

OBO is composed of stanzas describing elements of the ontology. Below is an example term stanza, which describes where the term sits in the larger ontology:

[Term]
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
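
For readers who haven't worked with the flat-file format, here is a minimal Python sketch of reading a stanza like the one above into tag/value pairs. It assumes the simple one-tag-per-line layout shown and deliberately ignores escape characters, quoted strings and trailing modifiers, so it is nowhere near a full OBO parser; parse_stanza is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

STANZA = """\
[Term]
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
"""


def parse_stanza(text):
    """Parse one OBO stanza into (stanza type, {tag: [values]})."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0]
    if not (header.startswith("[") and header.endswith("]")):
        raise ValueError(f"expected a stanza header, got {header!r}")
    tags = defaultdict(list)
    for line in lines[1:]:
        # Split on the first colon only, so "id: GO:0001555" keeps its ID intact.
        tag, _, value = line.partition(":")
        # Drop the trailing "! comment", which only repeats the target's name.
        value = value.split("!", 1)[0].strip()
        tags[tag.strip()].append(value)
    return header[1:-1], dict(tags)


kind, tags = parse_stanza(STANZA)
print(kind)                     # Term
print(tags["intersection_of"])  # ['GO:0040007', 'has_central_participant CL:0000023']
```

Repeated tags such as intersection_of are collected into lists and kept as plain strings; what those values actually mean is a separate, semantic question.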

Before they could start writing the parsing and mapping programs, the authors had to formalize both the syntax and the semantics of OBO. This is something that would normally be done by the developers of a format rather than its users, but both the syntax and semantics of OBO are defined only in natural language. These natural-language definitions often lead to imprecision; in extreme cases, there was no consensus on the meaning of some OBO constructs. However, the authors’ diligence in getting consensus from the OBO community should be rewarded by the community feeling confident in the mapping, and therefore also in using the OWL tools now available to them. An example of the natural-language definitions in the OBO User Guide follows:

This tag describes a typed relationship between this term and
another term. […] The necessary modifier allows a relationship to be
marked as “not necessarily true”. […]

Neither “necessarily true” nor “relationship” is defined. In fact, a relationship can be given a computational meaning in three different ways (taking their stanza example from above):

  • existentially, where each instance of GO:0001555 must have at least one part_of relationship to an instance of GO:0048601;
  • universally, where instances of GO:0001555 can *only* be connected via part_of to instances of GO:0048601;
  • via a constraint interpretation, where the endpoints of the relationship *must* be known; this cannot be expressed in DL at all, so it is not useful for this discussion.
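
To make the first two readings concrete, here is a minimal Python sketch that tests both against a handful of made-up individuals (growth1, growth2, morph1, other1 and the class EX:0000001 are all hypothetical). In OWL terms the two readings correspond to ObjectSomeValuesFrom and ObjectAllValuesFrom restrictions; the code simply checks them by brute force over the toy data.

```python
# Toy instance data (hypothetical individuals, not taken from GO): the class of
# each individual, and the part_of links that hold between individuals.
instance_of = {
    "growth1": "GO:0001555",  # an oocyte growth process
    "growth2": "GO:0001555",
    "morph1": "GO:0048601",   # an oocyte morphogenesis process
    "other1": "EX:0000001",   # an instance of some other (hypothetical) class
}
part_of = {("growth1", "morph1"), ("growth2", "morph1"), ("growth2", "other1")}


def targets(x):
    """Individuals that x is part_of."""
    return [y for (s, y) in part_of if s == x]


def holds_existentially(source_cls, target_cls):
    """Every source instance has at least one part_of link to a target instance."""
    return all(
        any(instance_of[y] == target_cls for y in targets(x))
        for x, cls in instance_of.items() if cls == source_cls
    )


def holds_universally(source_cls, target_cls):
    """Every part_of link from a source instance points at a target instance."""
    return all(
        instance_of[y] == target_cls
        for x, cls in instance_of.items() if cls == source_cls
        for y in targets(x)
    )


print(holds_existentially("GO:0001555", "GO:0048601"))  # True
print(holds_universally("GO:0001555", "GO:0048601"))    # False: growth2 is also part_of other1
```

Note that the universal reading is vacuously true for an instance with no part_of links at all, which is one reason the two readings diverge so sharply.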

OBO-Edit does not always infer what should be inferred if all of the rules in its User Guide are followed; there is a good example of this in the paper.

In their formal representation of the OBO syntax, the authors used a BNF grammar that is backwards-compatible with OBO. Many of the mappings are quite straightforward: OBO terms become OWL classes, OBO relationship types become OWL properties, OBO instances become OWL individuals, OBO IDs become OWL URIs, and OBO names become OWL labels. is_a, disjoint_from, domain and range have direct OWL equivalents. Other constructs needed more complex mappings, such as deciding whether each OBO relationship type maps to an OWL object property or a datatype property.
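
The straightforward part of the mapping can be sketched in a few lines of Python that emit OWL 2 functional-syntax axioms for the example term. This is a hedged illustration, not the OWL API's implementation: the URI base, the ID-to-URI rule (colon replaced by underscore) and the term_to_owl helper are hypothetical, and relationship: is given the existential reading from the list above.

```python
# The GO:0001555 stanza, already split into tags (parsing is not shown here).
term = {
    "id": "GO:0001555",
    "name": "oocyte growth",
    "is_a": ["GO:0016049"],
    "relationship": [("part_of", "GO:0048601")],
}

# Hypothetical URI base; not necessarily the scheme the paper or the OWL API uses.
BASE = "http://example.org/obo/"


def uri(obo_id):
    """Turn an OBO ID such as GO:0001555 into a full URI in angle brackets."""
    return "<" + BASE + obo_id.replace(":", "_") + ">"


def term_to_owl(term):
    """Emit OWL 2 functional-syntax axioms for the straightforward mappings."""
    cls = uri(term["id"])
    name = term["name"]
    axioms = [
        f"Declaration(Class({cls}))",                       # term -> class
        f'AnnotationAssertion(rdfs:label {cls} "{name}")',  # name -> label
    ]
    for parent in term.get("is_a", []):
        axioms.append(f"SubClassOf({cls} {uri(parent)})")   # is_a -> SubClassOf
    for rel, target in term.get("relationship", []):
        # Existential reading: every instance has at least one rel link
        # to an instance of target.
        restriction = f"ObjectSomeValuesFrom({uri(rel)} {uri(target)})"
        axioms.append(f"SubClassOf({cls} {restriction})")
    return axioms


for axiom in term_to_owl(term):
    print(axiom)
```

Names containing double quotes are not escaped here, and the harder cases (intersection_of, or choosing between object and datatype properties) are left out entirely.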

Running OWL reasoners over OBO ontologies does more than just work: in the case of the Sequence Ontology (SO), it found a term with only a single intersection_of statement, which is illegal according to OBO rules but had not been caught by OBO-Edit.
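
A problem of this kind can also be caught without a reasoner. As a minimal sketch, assuming nothing more than the line-based stanza layout shown earlier, the Python below flags [Term] stanzas that have exactly one intersection_of line. It is my own illustration, not how the OWL API or its reasoners found the SO term, and EX:0000002 is a made-up ID.

```python
DOC = """\
format-version: 1.2

[Term]
id: GO:0001555
name: oocyte growth
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte

[Term]
id: EX:0000002
name: hypothetical badly defined term
intersection_of: GO:0040007 ! growth
"""


def stanzas(obo_text):
    """Yield (header, lines) for each stanza; lines before the first stanza are skipped."""
    header, body = None, []
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            if header is not None:
                yield header, body
            header, body = line, []
        elif line and header is not None:
            body.append(line)
    if header is not None:
        yield header, body


def single_intersection_terms(obo_text):
    """Return the IDs of [Term] stanzas with exactly one intersection_of line."""
    flagged = []
    for header, body in stanzas(obo_text):
        if header != "[Term]":
            continue
        count = sum(1 for line in body if line.startswith("intersection_of:"))
        if count == 1:
            term_id = next((line[3:].strip() for line in body if line.startswith("id:")), None)
            flagged.append(term_id)
    return flagged


print(single_intersection_terms(DOC))  # ['EX:0000002']
```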

Until now, I had been unsure how the OWL files are created from files in the OBO format. This paper was clear and to the point. Thanks very much!

Update December 2008: I originally posted this without the BPR3 / ResearchBlogging.org tag, as I was unsure where conference proceedings fitted into the “peer-reviewed research” part of the guidelines. However, as I’m now getting back into the whole research blogging thing, and having read many of the posts of my fellow research bloggers, I feel that it is suitable. If anyone has any opinions, I’d be most interested!

Golbreich, C., Horridge, M., Horrocks, I., Motik, B., Shearer, R. (2008). OBO and OWL: Leveraging Semantic Web Technologies for the Life Sciences. Lecture Notes in Computer Science, 4825, 169-182. DOI: 10.1007/978-3-540-76298-0_13
