From OBO to OWL and Back Again: OBO capabilities of the OWL API

Golbreich et al. describe a formal method of converting OBO files to OWL 1.1 files, and vice versa. Their code has been integrated into the OWL API, a set of classes that is widely used within the OWL community. For instance, Protege 4 is built on the OWL API. While there have been other efforts in the past to map between the OBO flat-file format and OWL (they specifically mention Chris Mungall’s work on an XSLT used as a plugin within Protege that can perform the conversion), none were done in a formal or rigorous manner. By defining an exact relationship between OBO and OWL constructs using consensus information provided by the OBO community, the authors have provided a more robust method of mapping than has been available to date. Consequently, the entire library of tools, reasoners and editors available to the OWL community is now also available to OBO developers in a way that does not force them to permanently leave the format and environment that they are used to.

OBO ontologies are ontologies developed within the biological and biomedical domain that follow a standard, if often non-rigorously-defined, syntax and semantics. The most well-known of the OBO ontologies is the Gene Ontology (GO). Not only do you subscribe to the format when you choose OBO, you are also subscribing to the ideas behind the OBO Foundry, which aims to limit overlap of ontologies in related fields, and which provides a communal environment (mailing lists, websites, etc.) in which to develop. OWL (the Web Ontology Language) has three dialects, of which OWL-DL (DL stands for Description Logics) is the most commonly used. OWL-DL is favored by ontologists wishing to perform computational analyses over ontologies, as it has not just rigorously-defined formal semantics, but also a wide user-base and a suite of reasoning tools developed by multiple groups.

OBO is composed of stanzas describing elements of the ontology. Below is an example of a term in its stanza, which describes its location in the larger ontology:

id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
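
To make the tag-value structure of the stanza concrete, here is a minimal Python sketch that splits it into (tag, value) pairs. It is not the parser from the paper or the OWL API: the function name `parse_stanza` is my own, and it assumes one tag per line with an optional trailing `! comment`, ignoring escapes, trailing modifiers and the other details a real OBO parser has to handle.

```python
def parse_stanza(text):
    """Split the lines of one OBO stanza into (tag, value) pairs."""
    pairs = []
    for raw in text.splitlines():
        # Drop the human-readable "! comment" suffix, if there is one.
        line = raw.split(" ! ", 1)[0].strip()
        # Skip blank lines and stanza headers such as "[Term]".
        if not line or line.startswith("["):
            continue
        # Only the first colon separates tag from value (ids contain colons too).
        tag, _, value = line.partition(":")
        pairs.append((tag.strip(), value.strip()))
    return pairs


STANZA = """\
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
"""

pairs = parse_stanza(STANZA)
for tag, value in pairs:
    print(tag, "->", value)
```

Keeping the result as a list of pairs rather than a dictionary preserves repeated tags such as intersection_of, which matter later on.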

Before they could start writing the parsing and mapping programs, the authors had to formalize both the semantics and the syntax of OBO. This is something that would normally be done by the developers of a format, not its users, but both the syntax and semantics of OBO are only defined in natural language. These natural language definitions often lead to imprecision and, in extreme cases, no consensus was reached for some of the OBO constructs. However, the diligence of the authors in getting consensus from the OBO community should be rewarded in future by the OBO community feeling confident in the mapping, and therefore also in using the OWL tools now available to them. An example of the natural language definitions in the OBO User Guide follows:

This tag describes a typed relationship between this term and another term. [...] The necessary modifier allows a relationship to be marked as “not necessarily true”. [...]

Neither “necessarily true” nor “relationship” has been formally defined. A relationship can, in fact, be interpreted computationally in three different ways (taking their stanza example from above):

  • existentially, where each instance of GO:0001555 must have at least one part_of relationship to an instance of the term GO:0048601;
  • universally, where instances of GO:0001555 can *only* be connected via part_of to instances of GO:0048601;
  • via a constraint interpretation, where the endpoints of the relationship *must* be known, but which cannot in any case be expressed with DL, so is not useful to this discussion.
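
The first two readings can be written down as description logic axioms. This is my own rendering in standard DL notation rather than a formula copied from the paper, and the colons in the ids are replaced with underscores purely for typesetting:

```latex
% Existential reading: every oocyte growth is part of SOME oocyte morphogenesis
\[ \mathrm{GO\_0001555} \sqsubseteq \exists\, \mathrm{part\_of}.\mathrm{GO\_0048601} \]

% Universal reading: anything an oocyte growth is part of is an oocyte morphogenesis
\[ \mathrm{GO\_0001555} \sqsubseteq \forall\, \mathrm{part\_of}.\mathrm{GO\_0048601} \]
```

Note that the universal axiom on its own is satisfied by an instance that has no part_of relationship at all, which is one reason the two readings are not interchangeable.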

OBO-Edit does not always infer what should be inferred if all of the rules of its User Guide are followed. There is a good example of this in the paper.

For their formal representation of the OBO syntax the authors used a BNF grammar, which is backwards-compatible with the existing OBO format. Many of the mappings are quite straightforward: OBO terms become OWL classes, OBO relationship types become OWL properties, OBO instances become OWL individuals, OBO ids become the URIs in OWL, and the OBO names become the OWL labels. is_a, disjoint_from, domain and range have direct OWL equivalents. More complex mapping was needed in other places, such as deciding whether an OBO relationship type should become an OWL object property or a datatype property.
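
As a rough illustration of the straightforward part of the mapping, the Python sketch below turns the tag-value pairs of a term into axiom strings. Several things here are my assumptions and not taken from the paper or the OWL API: the base URI is a placeholder, the output uses OWL 2 functional-style syntax (which postdates the OWL 1.1 targeted in the paper), relationship tags are read existentially, and intersection_of tags are read as an equivalent-class definition.

```python
BASE = "http://example.org/obo#"  # hypothetical base URI, not the real OBO one


def iri(obo_id):
    """Build a full IRI from an OBO id (assumed scheme: ':' becomes '_')."""
    return "<" + BASE + obo_id.replace(":", "_") + ">"


def class_expression(value):
    """'REL TARGET' becomes an existential restriction; a bare id stays a class."""
    parts = value.split()
    if len(parts) == 2:
        return f"ObjectSomeValuesFrom({iri(parts[0])} {iri(parts[1])})"
    return iri(parts[0])


def term_to_axioms(pairs):
    """Map the (tag, value) pairs of one term to OWL functional-syntax strings."""
    tags = {}
    for tag, value in pairs:
        tags.setdefault(tag, []).append(value)
    cls = iri(tags["id"][0])
    axioms = [f"Declaration(Class({cls}))"]            # term -> class
    for name in tags.get("name", []):                  # name -> label
        axioms.append(f'AnnotationAssertion(rdfs:label {cls} "{name}")')
    for parent in tags.get("is_a", []):                # is_a -> subclass
        axioms.append(f"SubClassOf({cls} {iri(parent)})")
    for rel in tags.get("relationship", []):           # assumed existential
        axioms.append(f"SubClassOf({cls} {class_expression(rel)})")
    parts = tags.get("intersection_of", [])
    if len(parts) >= 2:                                # assumed definition
        inner = " ".join(class_expression(p) for p in parts)
        axioms.append(f"EquivalentClasses({cls} ObjectIntersectionOf({inner}))")
    return axioms


pairs = [
    ("id", "GO:0001555"),
    ("name", "oocyte growth"),
    ("is_a", "GO:0016049"),
    ("relationship", "part_of GO:0048601"),
    ("intersection_of", "GO:0040007"),
    ("intersection_of", "has_central_participant CL:0000023"),
]
axioms = term_to_axioms(pairs)
print("\n".join(axioms))
```

The harder cases mentioned above, such as choosing between object and datatype properties, are deliberately left out: every relationship type is treated as an object property here.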

Using OWL reasoners over OBO ontologies not only works but also pays off: in the case of the Sequence Ontology (SO), the authors found a term that had only a single intersection_of statement. This is illegal according to OBO rules, yet had not been caught by OBO-Edit.
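
The rule being violated is simple enough to check syntactically. The Python sketch below is only an illustration of that rule, not the way the error was found in the paper: the function name and the EX: ids are made up, and it assumes stanzas are separated by blank lines.

```python
from collections import Counter


def single_intersection_terms(obo_text):
    """Return the ids of stanzas that carry exactly one intersection_of tag."""
    offenders = []
    for stanza in obo_text.split("\n\n"):  # assumes blank-line separated stanzas
        counts = Counter()
        term_id = None
        for line in stanza.splitlines():
            tag, sep, value = line.partition(":")
            if not sep:                    # e.g. the "[Term]" header line
                continue
            tag = tag.strip()
            counts[tag] += 1
            if tag == "id":
                term_id = value.split("!")[0].strip()
        if counts["intersection_of"] == 1:
            offenders.append(term_id)
    return offenders


# Hypothetical two-term ontology; the second term breaks the rule.
SAMPLE = """\
[Term]
id: EX:0000001
name: well formed
intersection_of: EX:0000010
intersection_of: part_of EX:0000011

[Term]
id: EX:0000002
name: malformed
intersection_of: EX:0000010
"""

offenders = single_intersection_terms(SAMPLE)
print(offenders)
```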

Up until now, I’ve been unsure as to how the OWL files are created from files in the OBO format. This was a paper that was clear and to the point. Thanks very much!

Update December 2008: I originally posted this without the BPR3 / tag, as I was unsure where conference proceedings came in the “peer-reviewed research” part of the guidelines. However, as I’m now getting back into the whole researchblogging thing, I feel (having read many of the posts of my fellow research bloggers) that this would be suitable. If anyone has any opinions, I’d be most interested!

Golbreich, C., Horridge, M., Horrocks, I., Motik, B., Shearer, R. (2008). OBO and OWL: Leveraging Semantic Web Technologies for the Life Sciences. Lecture Notes in Computer Science, 4825/2008, 169-182. DOI: 10.1007/978-3-540-76298-0_13