From OBO to OWL and Back Again: OBO capabilities of the OWL API
February 20, 2008
Golbreich et al describe a formal method of converting OBO to OWL 1.1 files, and vice versa. Their code has been integrated into the OWL API, a set of classes that is well-used within the OWL community. For instance, Protege 4 is built on the OWL API. While there have been other efforts in the past to map between the OBO flat-file format and OWL (they specifically mention Chris Mungall’s work on an XSLT used as a plugin within Protege that can perform the conversion), none were done in a formal or rigorous manner. By defining an exact relationship between OBO and OWL constructs using consensus information provided by the OBO community, the authors have provided a more robust method of mapping than has been available to date. Consequently, the entire library of tools, reasoners and editors available to the OWL community is now also available to OBO developers in a way that does not force them to permanently leave the format and environment that they are used to.
OBO ontologies are ontologies generated within the biological and biomedical domain and which follow a standard, if often non-rigorously-defined, syntax and semantics. The most well-known of the OBO ontologies is the Gene Ontology (GO). Not only do you subscribe to the format when you choose OBO, you are also subscribing to the ideas behind the OBO Foundry, which aims to limit overlap of ontologies in related fields, and which provides a communal environment (mailing lists, websites, etc) in which to develop. OWL (the Web Ontology Language) has three dialects, of which OWL-DL (DL stands for Description Logics) is the most commonly used. OWL-DL is favored by ontologists wishing to perform computational analyses over ontologies as it has not just rigorously-defined formal semantics, but also a wide user-base and a suite of reasoning tools developed by multiple groups.
OBO is composed of stanzas describing elements of the ontology. Below is an example of a term in its stanza, which describes its location in the larger ontology:
[Term]
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
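To make the stanza layout concrete, here is a minimal Python sketch that reads a stanza like the one above into a dictionary of tags. It is an illustration only, not the parser from the paper or the OWL API: the function name parse_stanza is my own, and it assumes every tag line has the form "tag: value" with an optional trailing "! comment", ignoring escapes, trailing modifiers and the other corners of the real format.

```python
STANZA = """[Term]
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
"""


def parse_stanza(text):
    """Return (stanza_type, {tag: [values]}) for a single OBO stanza.

    Hypothetical helper: handles only plain "tag: value ! comment" lines.
    """
    stanza_type = None
    tags = {}
    for raw in text.splitlines():
        # Drop the human-readable comment that follows " ! ".
        line = raw.split(" ! ", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            stanza_type = line[1:-1]
            continue
        # Split on the first colon only, so ids such as GO:0001555 stay whole.
        tag, _, value = line.partition(":")
        tags.setdefault(tag.strip(), []).append(value.strip())
    return stanza_type, tags


stanza_type, tags = parse_stanza(STANZA)
```

Values are kept as lists because a tag such as intersection_of can legitimately appear more than once in the same stanza.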
Before the authors could start writing the parsing and mapping programs, they had to formalize both the semantics and the syntax of OBO. This is something that would normally be done by the developers of a format, not its users, but both the syntax and semantics of OBO are defined only in natural language. These natural-language definitions often lead to imprecision and, in extreme cases, no consensus could be reached for some of the OBO constructs. However, the diligence of the authors in getting consensus from the OBO community should be rewarded in future by that community feeling confident in the mapping, and therefore also in using the OWL tools now available to them. An example of the natural-language definitions in the OBO User Guide follows:
This tag describes a typed relationship between this term and another term. [...] The necessary modifier allows a relationship to be marked as “not necessarily true”. [...]
Neither “necessarily true” nor relationship has been defined. You can, in fact, computationally define a relation in three different ways (taking their stanza example from above):
- existentially, where each instance of GO:0001555 must have at least one part_of relationship to an instance of the term GO:0048601;
- universally, where instances of GO:0001555 can *only* be connected to instances of GO:0048601;
- via a constraint interpretation, where the endpoints of the relationship *must* be known, but which cannot in any case be expressed with DL, so is not useful to this discussion.
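The difference between the first two readings is easy to see on a toy set of instances. The sketch below is my own illustration, not anything from the paper: the instance names (g1, m1, x1) are made up, and it checks the two readings against a small, fully known graph, which sidesteps the open-world reasoning a real OWL reasoner would perform.

```python
PART_OF = "part_of"


def holds_existentially(sources, edges, rel, targets):
    """Every source instance has at least one `rel` edge into `targets`."""
    return all(
        any(src == s and r == rel and dst in targets for (src, r, dst) in edges)
        for s in sources
    )


def holds_universally(sources, edges, rel, targets):
    """Every `rel` edge leaving a source instance ends inside `targets`."""
    return all(
        dst in targets
        for (src, r, dst) in edges
        if src in sources and r == rel
    )


# Hypothetical instances of GO:0001555 (growth) and GO:0048601 (morphogenesis).
growth = {"g1", "g2"}
morphogenesis = {"m1"}
edges = [
    ("g1", PART_OF, "m1"),
    ("g2", PART_OF, "m1"),
    ("g2", PART_OF, "x1"),  # x1 is not an instance of GO:0048601
]

existential_ok = holds_existentially(growth, edges, PART_OF, morphogenesis)
universal_ok = holds_universally(growth, edges, PART_OF, morphogenesis)
```

On this data the existential reading is satisfied and the universal one is not, because g2 is also part of something outside GO:0048601. With no edges at all the situation flips: the universal reading holds vacuously while the existential one fails.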
OBO-Edit does not always infer what should be inferred if all of the rules of its User Guide are followed. There is a good example of this in the text.
In their formal representation of the OBO syntax they used a BNF grammar that is backwards-compatible with existing OBO. Many of the mappings are quite straightforward: OBO terms become OWL classes, OBO relationship types become OWL properties, OBO instances become OWL individuals, OBO ids are the URIs in OWL, and the OBO names become the OWL labels. is_a, disjoint_from, domain and range have direct OWL equivalents. Other places needed more complex mapping, such as deciding whether an OBO relationship type maps to an OWL object property or a datatype property.
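As a rough picture of what the straightforward mappings look like, here is a Python sketch that turns the tags of the example term into axioms written in OWL functional-style syntax. Several things here are my assumptions rather than the authors' choices: the base namespace is a placeholder, relationship tags are read existentially, intersection_of tags are turned into an equivalent-class definition, and the function names are invented. It is not the OWL API code.

```python
BASE = "http://example.org/obo#"  # placeholder namespace, not an official one


def iri(obo_id):
    """Turn an OBO id or relationship name into a bracketed IRI."""
    return "<" + BASE + obo_id.replace(":", "_") + ">"


def restriction(value):
    """Map 'rel TARGET' to an existential restriction (assumed reading)."""
    rel, filler = value.split()
    return f"ObjectSomeValuesFrom({iri(rel)} {iri(filler)})"


def term_to_owl(tags):
    """Return a list of functional-syntax axiom strings for one term."""
    cls = iri(tags["id"][0])
    axioms = [f"Declaration(Class({cls}))"]
    for name in tags.get("name", []):
        axioms.append(f'AnnotationAssertion(rdfs:label {cls} "{name}")')
    for parent in tags.get("is_a", []):
        axioms.append(f"SubClassOf({cls} {iri(parent)})")
    for value in tags.get("relationship", []):
        axioms.append(f"SubClassOf({cls} {restriction(value)})")
    # A value with a space is 'rel TARGET'; otherwise it is a plain class id.
    parts = [
        restriction(v) if " " in v else iri(v)
        for v in tags.get("intersection_of", [])
    ]
    if parts:
        joined = " ".join(parts)
        axioms.append(f"EquivalentClasses({cls} ObjectIntersectionOf({joined}))")
    return axioms


# Tags of the example term, with the "! comment" parts already removed.
TERM = {
    "id": ["GO:0001555"],
    "name": ["oocyte growth"],
    "is_a": ["GO:0016049"],
    "relationship": ["part_of GO:0048601"],
    "intersection_of": ["GO:0040007", "has_central_participant CL:0000023"],
}

AXIOMS = term_to_owl(TERM)
```

Printing AXIOMS one per line gives a declaration, a label annotation, two SubClassOf axioms and one EquivalentClasses axiom for the oocyte growth term.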
Running OWL reasoners over OBO ontologies not only works but also pays off: in the case of the Sequence Ontology (SO), it uncovered a term that had only a single intersection_of statement, which is illegal according to OBO rules, but which OBO-Edit had not caught.
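The rule that was broken here can also be checked directly on parsed stanzas. The sketch below is a plain syntactic check of my own, not the reasoner-based detection described in the paper, and the term ids in it are invented placeholders rather than real SO terms.

```python
def lone_intersections(terms):
    """Return the ids of terms that carry exactly one intersection_of tag."""
    return [
        tags["id"][0]
        for tags in terms
        if len(tags.get("intersection_of", [])) == 1
    ]


# Hypothetical parsed stanzas; EX:... ids are placeholders.
TERMS = [
    {"id": ["EX:0000001"], "intersection_of": ["EX:0000010", "part_of EX:0000011"]},
    {"id": ["EX:0000002"], "intersection_of": ["EX:0000010"]},  # the illegal case
    {"id": ["EX:0000003"]},  # no intersection_of at all is fine
]

SUSPECT = lone_intersections(TERMS)
```

Only the second placeholder term ends up in SUSPECT, since a definition by intersection needs at least two parts to intersect.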
Up until now, I’ve been unsure as to how the OWL files are created from files in the OBO format. This was a paper that was clear and to the point. Thanks very much!
Update December 2008: I originally posted this without the BPR3 / ResearchBlogging.org tag, as I was unsure where conference proceedings came in the “peer-reviewed research” part of the guidelines. However, as I’m now getting back into the whole researchblogging thing, I feel (having read many of the posts of my fellow research bloggers) that this would be suitable. If anyone has any opinions, I’d be most interested!
[Please note that this is not my main blogging site anymore. I still use it as it is researchblogging-friendly, but it is otherwise completely defunct. Vox, the hoster of my main blog, doesn't play nicely with the RB aggregator software. To keep up with my main blog, please see http://lurena.vox.com, to which I will copy all of these posts anyway.]
Golbreich, C., Horridge, M., Horrocks, I., Motik, B., Shearer, R. (2008). OBO and OWL: Leveraging Semantic Web Technologies for the Life Sciences. Lecture Notes in Computer Science, 4825/2008, 169-182. DOI: 10.1007/978-3-540-76298-0_13
3rd OBI Workshop: Day 3
February 1, 2007
Today was a highly informative combination of talks and further improvement of OBI. Hopefully, you'll find these musings on the day's work helpful, either in jogging your own memory of the events or in giving you an idea of what went on in our heads.
Outside OBO Ontologies – How do we integrate and/or make use of them?
- Can we, at the moment or in future, place parent classes for all OBO ontologies in OBI? Definitely not now, as they don't share the same ULO (Upper Level Ontology). Some work is being done by the OBO-UBO group on mapping OBO ontologies to ULOs like BFO. (See the OBO-UBO web page for more information.)
  - In a related question, should all OBO ontologies use BFO? It would make integration a much more straightforward process. In my opinion, this would be a great idea in the long term; however, practicalities may prevent it.
- Should things like BioTop (http://www.ifomis.uni-saarland.de/biotop/) be integrated into OBO, under BFO but before OBI? In my opinion (though today was the first time I have read about BioTop so it isn't the most informed one), in our case probably not, as resolving the three may be problematic. However, some terms or ideas might be useful to share.
Formal OWL, aka making OBI Formally correct
- This should be assigned to someone/some people for later, after more classes have been created. There is simply too much flux in the file at the moment. Get the graphs in place first, perhaps working on some complex relations as you go. Further, the definitions must explicitly hold information on creating these relations, irrespective of whether you make the relationships as you go or at the end.
- BFO and OBI use different metadata tags, and there should be a shared set of tags.
  - The metadata tags used in BFO are part of snap/span, I think. We would need to bring up the idea of metadata resolution (if possible, and if we all agree it should be pursued) with that group too.
- Barry Smith will bring OBI's information object and plan terms to the BFO group.
- A milestone has been added (see the OBI Wiki) to hammer out exact implementation of the metadata list, and to work with other communities as appropriate (e.g. BFO, OBO Foundry = Barry, M Ashburner, Suzie, Chris M.).
Clinical Trial Ontology – Simona Carini & Barry Smith
- Rctbank is a clinical trial db: information on all published clinical trials (from journal articles).
- Its purpose is to provide enough information to allow evaluation of these trials.
- RCT = randomized controlled trials
- Epoch and Clinical Trial Ontology (CTO) are the other two that are being developed.
- Barry Smith is involved in CTO, and it is therefore being built with OBI in mind, but it is still very small.
- RCT and Epoch aren’t close to being OBO/OBI compliant.
  - Developed independently
  - Their choices are in conflict with the choices we’ve made. That does NOT mean that they aren't eminently useful (which they are), just that merging would be problematic.
- There has been agreement between Epoch and RCT that all should work towards a CTO that will work within the OBI framework.
  - This necessary reconciliation is one of the goals of the CTO workshop in May.
- There are people claiming to develop a CTO but it is actually a CT database ontology (I missed the name of the people being referred to here). It isn’t the same beast. Understanding the data is not equivalent to understanding the processes in a trial.
RCT Schema – Barry Smith
- Built independently of OWL or Protégé, and is more correctly a database schema, though it is called an ontology.
- Top-level class: root
  - 2° study
  - Trial-details
  - Trial
  - Concept
    - Subclasses
- Not the right way to do it: it is unbalanced, with no place for a study, though there is a place for a 2° study.
- 2° study seems to be at the wrong level in the hierarchy.
- It is unclear what trial details means.
- When the same term (or portion of a term) is repeated over and over, it is often a sign of a mistake, of redundancy.
- One of the children of population concept is population.
  - An ontology is important for reasoning using the is_a hierarchy, which can be reasoned over: Population is NOT a population concept and is NOT a concept.
  - Reasoning is blocked here “from both directions”.
  - Further, a recruitment flowchart is not a population concept.
- These things, like population concept, are headers/labels/conveniences, but they are not ontological forms. Some options for restructuring could be the following two things:
- Population/protocol/design is_a continuant is_a entity
- Trial is_a occurrent is_a entity
- Not all RCT terms have definitions
Epoch Ontology (Dave Parrish in charge of it) – Barry Smith
- There are parts of this ontology that don’t belong in the CTO, but do belong in OBI.
- Originally developed to support the Immune Tolerance Network (ITN), a big clinical trial resource: they fund, implement, monitor and assess clinical trials, and provide data services.
  - Informatics dept of ITN perform operations (generation and collection) -> data management -> analysis
- They have an ontology of the kind of analytical steps their software needs to perform, and it helps them configure the software application.
- For example, elements are claimed to be nouns, and represent the physical objects of the system. Classes of elements are domain types, containers, relationships. These are not always physical objects – they’re sometimes processes. Also, they are not always nouns.
- Fits in with the community milestones, i.e. we could get many terms from the clinical trials community.
Branches have been assigned. See the OBI Branches Wiki Page for up to date information.
OBO-UBO
- Mapping from current terms in various OBO ontologies to BFO
  - E.g. GO biological process is_a span:process
- Gramene has already developed an environmental ontology in a plant context, which we should remember, and from which we should hopefully incorporate useful terms in the first round of community term dates.
More general discussion
- Have moved all terms that would fall under PATO out of the ontology, e.g. state and anything under quality.
- Do we really need "in vitro state" as well as "in vitro"? Terms such as these are always tied to objects like cells – these are not design as much as the state of the cells.
- Is in vivo a location or a state? You can take in vitro cells and put them into “vivo”, and they are still in vitro cells, which means in vitro is a BFO quality.
- The interior of your gut is the site for your gut bacteria. The interior of gut (IG) is also a type/node in the FMA (as a location). IG has qualities (shape, etc). In addition to these qualities it has others that determine its roles (having certain pressure, pH value). How to distinguish what FMA means from what an environment ontology means?
- If we remove in-vivo_state, we run into problems with multiple inheritance. We needed to separate out the state of a biomaterial from the biomaterial itself, i.e. don’t have in-vivo_material as a child of biomaterial_entity.
- What terms do we need to use to describe diseases?
  - Disease (hook for disease ontology), disease_symptoms, disease_stages, disease_course.
We ended up going through the entire ontology, resolving many problems. There is a new OWL file, but it is not yet ready for public consumption, so it won't be posted here until it is available from the official OBI pages.
There is general consensus among the workshop attendees that a very large amount of work is getting done, and there is a lot of positive feeling that the Milestones developed this week are giving us hard dates for inclusion of many more terms. The addition of terms can only truly start once the high-level structure has been decided, and this workshop has moved in great leaps and bounds towards a final structure of the higher levels of OBI. The "higher levels" have been generally defined at this meeting as the top two levels of OBI below BFO. This is what was completed today: the two levels directly below BFO have been studied by the group and cleaned.