Current Research into Reasoning over BioPAX and SBML

What’s going on these days in the world of reasoning and systems biology modelling? What were people’s experiences when trying to reason over systems biology data in BioPAX and/or SBML format? These were the questions that Andrea Splendiani wanted to answer, and so he collected three of us with some experience in the field to give 10-minute presentations to interested parties at a BioPAX telecon. About 15 people turned up for the call, and there were some very interesting talks. I’ll leave you to decide for yourselves if you’d class my presentation as interesting: it was my first talk since getting back from leave, and so I may have been a little rusty!

The first talk was given by Michel Dumontier, and covered some recent work that he and colleagues performed on converting SBML to OWL and reasoning over the resulting entities.

Essentially, with the SBMLHarvester project, the entities in the resulting OWL file can be divided into two broad categories: in silico entities covering the model elements themselves, and in vivo entities covering the (generally biological) subjects the model elements represent. They converted all of BioModels into the OWL format and performed reasoning and analysis over the resulting information. Inconsistencies were found in the annotation of some of the models, and additionally queries can be performed over the resulting data set.

I gave the second talk about my experiences a few years ago converting SBML to OWL using Model Format OWL (MFO) (paper) and then, more recently, using MFO as part of a larger semantic data integration project whose ultimate aim is to annotate systems biology models as well as create skeleton (sub)models.

I first started working on MFO in 2007, and started applying that work to the wider data integration methodology called rule-based mediation (RBM) (paper) in 2009. As with SBMLHarvester, libSBML and the OWLAPI are used in the creation of the OWL files based on BioModels entries. All MFO entries can be reasoned over, and the constraints present within MFO from the SBML XSD, the SBML Manual, and SBO provide some useful checks on converted SBML entries. The semantics of SBMLHarvester are more advanced than those of MFO; however, MFO is intended to be a conversion of a format only, so that SWRL mappings can be used to move data between MFO and the core of the rule-based mediation in either direction. Slide 8 of the above presentation provides a graphic of how rule-based mediation works. In summary, you start with a core ontology which should be small and tightly scoped to your biological domain of interest. Data is fed to the core from multiple syntactic ontologies using SWRL mappings. These syntactic ontologies can be either direct format conversions from other, non-OWL, formats or pre-existing ontologies in their own right. I use BioPAX in this integration work, and while I have mainly reasoned over MFO (and therefore SBML), I do also work with BioPAX and plan to work more with it in the near future.
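
The flow from syntactic ontologies into the core can be sketched in a few lines of Python. This is only an illustration of the idea, not the RBM implementation: the real mappings are SWRL rules over OWL, whereas here each "rule" is a plain function, and the core class name `Molecule`, the record layout, and all identifiers are assumptions made up for the example.

```python
# Individuals from two hypothetical syntactic ontologies, as plain dictionaries.
mfo_individuals = [
    {"type": "Species", "id": "s1", "name": "glucose"},
]
biopax_individuals = [
    {"type": "SmallMolecule", "id": "bp7", "displayName": "glucose"},
]


def map_mfo_species(individual):
    """Stand-in for a mapping rule: an MFO Species becomes a core Molecule."""
    if individual["type"] == "Species":
        return {"type": "Molecule", "label": individual["name"], "source": "MFO"}
    return None


def map_biopax_small_molecule(individual):
    """Stand-in for a mapping rule: a BioPAX SmallMolecule becomes a core Molecule."""
    if individual["type"] == "SmallMolecule":
        return {"type": "Molecule", "label": individual["displayName"], "source": "BioPAX"}
    return None


def mediate(sources):
    """Apply each source's rules to its individuals and collect core-level facts."""
    core = []
    for individuals, rules in sources:
        for individual in individuals:
            for rule in rules:
                fact = rule(individual)
                if fact is not None:
                    core.append(fact)
    return core


core = mediate([
    (mfo_individuals, [map_mfo_species]),
    (biopax_individuals, [map_biopax_small_molecule]),
])

# Queries now run against the core only, regardless of the original format.
labels = {fact["label"] for fact in core if fact["type"] == "Molecule"}
```

The point of the layering is visible in the last line: the query mentions only the core vocabulary, and nothing about SBML or BioPAX.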

The final presenter was Ismael Navas Delgado, whose presentation is available from Dropbox. His talk covered two topics: reasoning over BioPAX data taken from Reactome, and the use of a database back-end called DBOWL for the OWL data. By simply performing reasoning over a large number of BioPAX entries, Ismael and colleagues were able to discover not just inconsistencies in the data entries themselves, but also in the structure of BioPAX. It was a very interesting summary of their work, and I highly recommend looking over the slides.

And what is the result of this TC? Andrea has suggested that we continue the discussion on the mailing list (contact Andrea Splendiani if you are not on it and want to be added) and then have another TC in a couple of weeks. Andrea has also suggested that it would be nice to “setup a task force within this group to prepare a proof of concept of reasoning on BioPAX, across BioPAX/SBML, or across information resources (BioPAX/OMIM…)”. I think that would be a lot of fun. Join us if you do too!

SBML in OWL: some thoughts on Model Format OWL (MFO)

What is SBML in OWL?

I’ve created a set of OWL axioms that represent the different parts of the Systems Biology Markup Language (SBML) Level 2 XSD combined with information from the SBML Level 2 Version 4 specification document and from the Systems Biology Ontology (SBO). This OWL file is called Model Format OWL (MFO) (follow that link to find out more information about downloading and manipulating the various files associated with the MFO project). The version I’ve just released is Version 2, as it is much improved on the original version first published at the end of 2007. Broadly, SBML elements have become OWL classes, and SBML attributes have become OWL properties (either datatype or object properties, as appropriate). Then, when actual SBML models are loaded, their data is stored as individuals/instances in an OWL file that can be imported into MFO itself.
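
To make the element-to-class and attribute-to-property idea concrete, here is a minimal sketch in Python. It is not the MFO code (which uses libSBML and the OWL API); it reads a tiny hand-written SBML fragment with the standard library and emits simple triples. The `REFERENCE_ATTRIBUTES` set, the `object:`/`datatype:` prefixes, and the class-naming rule are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written SBML Level 2 Version 4 fragment, for illustration only.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfSpecies>
      <species id="glucose" compartment="cytosol" initialAmount="1.0"/>
    </listOfSpecies>
  </model>
</sbml>"""

# Attributes that point at other model elements become object properties;
# everything else is treated as a datatype property. This set is an assumption.
REFERENCE_ATTRIBUTES = {"compartment", "species"}


def local_name(tag):
    """Strip the XML namespace from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def to_triples(xml_text):
    """Turn each element carrying an id into an individual with typed properties."""
    triples = []
    for element in ET.fromstring(xml_text).iter():
        identifier = element.get("id")
        if identifier is None:
            continue
        name = local_name(element.tag)
        class_name = name[0].upper() + name[1:]
        triples.append((identifier, "rdf:type", class_name))
        for attribute, value in element.attrib.items():
            if attribute == "id":
                continue
            kind = "object" if attribute in REFERENCE_ATTRIBUTES else "datatype"
            triples.append((identifier, f"{kind}:{attribute}", value))
    return triples


triples = to_triples(SBML)
```

Here the `species` element becomes an individual of a `Species` class, its `compartment` attribute becomes an object property pointing at another individual, and `initialAmount` becomes a datatype property.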

A partial overview of the classes (and number of individuals) in MFO.

In the past week, I’ve loaded all curated BioModels from the June release into MFO: that’s over 84,000 individuals! [1] It takes a few minutes, but it is possible to view all of those files in Protege 3.4 or higher. However, I’m still trying to work out the fastest way to reason over all those individuals at once. Of the reasoners I have tried so far, Pellet 2.0.0 rc7 is the slowest over MFO and FaCT++ the fastest. I’ve got a few more reasoners to try out, too. Details of reasoning times can be found in the MFO Subversion project.

Why SBML in OWL?

Jupiter and its biggest moons (not shown to scale). Public Domain, NASA.

For my PhD, I’ve been working on semantic data integration. Imagine a planet and its satellites: the planet is your specific domain of biological interest, and the satellites are the data sources you want to pull information from. Then, replace the planet with a core ontology that richly describes your domain of biology in a semantically meaningful way. Finally, replace each of those satellite data sources with an OWL representation, or syntactic ontology, of the format in which that data source is available. By layering your ontologies like this, you can separate out the process of syntactic integration (the conversion of satellite data into a single format) from the semantic integration, which is the exciting part. Then you can reason over, query, and browse that core ontology without needing to think about the format all that data was once stored in. It’s all presented in a nice, logical package for you to explore. It’s actually very fun. And slowly, very slowly, it’s all coming together.

Really, why SBML in OWL?

As one of my data sources, I’m using BioModels. This is a database of simulatable, biological models whose primary format is SBML. I’m especially interested in BioModels, as the ultimate point of this research is to aid the modellers where I work in annotating and creating new models. In BioModels, the “native” format for the models is SBML, though other formats are available. Because of the importance of SBML in my work, MFO is one of the most important of my syntactic “satellite” ontologies for rule-based mediation.

How a single reaction looks in MFO when viewed with Protege 3.4.
How a single species looks in MFO when viewed with Protege 3.4.

Is this all MFO is good for?

No, you don’t need to be interested in data integration to get a kick out of SBML in OWL: just download the MFO software package, pick your favorite BioModels curated model from the src/main/resources/owl/curated-sbml/singletons directory, and have a play with the file in Protege or some other OWL editor. All the details to get you started are available from the MFO website. I’d love to hear what you think about it, and if you have any questions or comments.

MFO is an alternative format for viewing (though not yet simulating) SBML models. It provides logical connections between the various parts of a model. Its purpose is to be a direct translation of SBML, SBO, and the SBML specification document into OWL. Using an editor such as Protege, you can manipulate and create models, and then use the MFO code to export the completed model back to SBML (the import feature is complete; the export feature is not yet finished, but will be shortly).

For even more uses of MFO, see the next section.

Why not BioPAX?

All BioModels are available in it, and it’s OWL!

BioPAX Level 3, which isn’t broadly used yet, has a large number of quite interesting features. However, I’m not forgetting about BioPAX: it plays a large role in rule-based mediation for model annotation (more on that in another post, perhaps). It is a generic description of biological pathways and can handle many different types of interactions and pathway types. It’s already in OWL. BioModels exports its models in BioPAX as well as SBML. So, why don’t I just use the BioPAX export? There are a few reasons:

  1. Most importantly, MFO is more than just SBML, and the BioPAX export isn’t. As far as I can tell, the BioModels BioPAX export is a direct conversion from the SBML format. This means it should capture all of the information in an SBML model. But MFO does more than that – it stores logical restrictions and axioms that are otherwise only stored in either SBO itself or, more importantly, the purely human-readable content of the SBML specification document [2]. MFO therefore adds a set of extra constraints that aren’t present in the BioPAX version of SBML, which is why I need MFO as well as BioPAX.
  2. I’m making all this for modellers, especially those who are still building their models. None of the modellers at CISBAN, where I work, natively use BioPAX. The simulators accept SBML. They develop and test their models in SBML. Therefore I need to be able to fully parse and manipulate SBML models to be able to automatically or semi-automatically add new information to those models.
  3. Export of data from my rule-based mediation project needs to be done in SBML. The end result of my PhD work is a procedure that can create or add annotation to models. Therefore I need to export the newly-integrated data back to SBML. I can use MFO for this, but not BioPAX.
  4. For people familiar with SBML, MFO is a much more accessible view of models than BioPAX. If you wish to start understanding OWL and its benefits, using MFO (if you’re already familiar with SBML) is much easier to get your head around.

What about CellML?

You call MFO “Model” Format OWL, yet it only covers SBML.

Yes, there are other model formats out there. However, as you now know, I have special plans for BioPAX. But there’s also CellML. When I started work on MFO more than a year ago, I did have plans to make a CellML equivalent. However, Sarala Wimalaratne has since done some really nice work on that front. I am currently integrating her work on the CellML Ontology Framework. She’s got a CellML/OWL file that does for CellML what MFO does for SBML. This should allow me to access CellML models in the same way as I can access SBML models, pushing data from both sources into my “planet”-level core ontology.

It’s good times in my small “planet” of semantic data integration for model annotation. I’ll keep you all updated.

Footnotes:

1. Thanks to Michael Hucka for adding the announcement of MFO 2 to the front page of the SBML website!
2. Of course, not all restrictions and rules present in the SBML specification are present in MFO yet. Some are, though. I’m working on it!

Abstracting and Generalising the FMA Ontology (ISMB Bio-Ont SIG 2009)

Eleni Mikroyannidi et al.

The FMA is very large, and using it in full is time-consuming. How can we make it smaller and more manageable without losing information? For instance, symmetric classes (left and right hand, foot, etc.) occur a lot. So, instead of having 3 concepts (hand, left hand, right hand), just keep the hand concept and then use the “selector pattern” to add the information that hands can be left or right. This abstraction is followed by expansion to the original form, which can also fix apparent omissions at the same time.

This abstraction mechanism, which is what makes the ontology smaller, was applied to a subset of the FMA with many symmetries, using the OWL version of the FMA from Noy. The steps are: the user defines the symmetries and the ontology as input; the selector hierarchies are created; the symmetrical entities are detected and abstracted (in this step a number of preconditions are checked, and rejected candidates for the pattern are reported in logs); common restrictions are moved into the parent concept; symmetric concepts are collapsed.

The steps of the expansion algorithm are: detect classes with existential restrictions referring to the Selector (e.g. has laterality some Laterality); create new symmetrical sibling classes; then create the existential restrictions of the symmetrical classes based on the restrictions of the parent class.
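
The two algorithms can be sketched on a toy ontology as follows. This is my own reading of the steps in these notes, not the authors' code: classes are plain dictionaries rather than OWL, the class and restriction names are invented, and the only symmetry handled is a single Left/Right selector.

```python
SELECTOR = ("has_laterality", "Laterality")
SELECTOR_VALUES = ("Left", "Right")

# Toy input: one concept with two symmetric children (names are illustrative).
ontology = {
    "Hand": {"parent": "Limb segment", "restrictions": set()},
    "Left Hand": {"parent": "Hand", "restrictions": {("part_of", "Upper limb")}},
    "Right Hand": {"parent": "Hand", "restrictions": {("part_of", "Upper limb")}},
}


def copy_classes(classes):
    """Copy the class table so the input is never modified."""
    return {name: {"parent": c["parent"], "restrictions": set(c["restrictions"])}
            for name, c in classes.items()}


def abstract(classes):
    """Collapse Left/Right children into their parent and record the selector."""
    result = copy_classes(classes)
    for name in list(result):
        if name not in result:
            continue  # already collapsed as a child of another class
        children = [f"{value} {name}" for value in SELECTOR_VALUES]
        # Precondition: every symmetric child exists and hangs off this class.
        if not all(c in result and result[c]["parent"] == name for c in children):
            continue
        # Move the restrictions shared by all children up to the parent.
        common = set.intersection(*(result[c]["restrictions"] for c in children))
        result[name]["restrictions"] |= common | {SELECTOR}
        for child in children:
            del result[child]
    return result


def expand(classes):
    """Recreate symmetric siblings for every class that carries the selector."""
    result = copy_classes(classes)
    for name, c in classes.items():
        if SELECTOR not in c["restrictions"]:
            continue
        inherited = c["restrictions"] - {SELECTOR}
        for value in SELECTOR_VALUES:
            result[f"{value} {name}"] = {"parent": name, "restrictions": set(inherited)}
    return result


abstracted = abstract(ontology)
expanded = expand(abstracted)
```

Note that this sketch keeps only the restrictions shared by both children, so anything asserted on just one side is dropped during abstraction and cannot be recreated on expansion.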

The FMA shrinks by up to 57% (in the subset they’ve used). In the expansion stage, most concepts are recreated, but there is some loss of restrictions when many symmetries are considered, due to omissions in the FMA. They need to extend the algorithm to reliably track all of the restrictions, especially when a concept refers to more than one symmetry.

FriendFeed discussion: http://ff.im/4wQlC

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

From OBO to OWL and Back Again: OBO capabilities of the OWL API


Golbreich et al describe a formal method of converting OBO to OWL
1.1 files, and vice versa. Their code has been integrated into the OWL
API, a set of classes that is well-used within the OWL community. For
instance, Protege 4 is built on the OWL API. While there have been
other efforts in the past to map between the OBO flat-file format and
OWL (they specifically mention Chris Mungall’s work on an XSLT used as
a plugin within Protege that can perform the conversion), none were
done in a formal or rigorous manner. By defining an exact relationship
between OBO and OWL constructs using consensus information provided by
the OBO community, the authors have provided a more robust method of
mapping than has been available to date.

Consequently, the entire library of tools, reasoners and editors
available to the OWL community is now also available to OBO developers
in a way that does not force them to permanently leave the format and
environment that they are used to.

OBO ontologies are ontologies generated within the biological and
biomedical domain and which follow a standard, if often
non-rigorously-defined, syntax and semantics. The most well-known of
the OBO ontologies is the Gene Ontology (GO). Not only do you subscribe
to the format when you choose OBO, you are also subscribing to the
ideas behind the OBO Foundry, which aims to limit overlap of ontologies
in related fields, and which provides a communal environment (mailing
lists, websites, etc) in which to develop. OWL (the Web Ontology
Language) has three dialects, of which OWL-DL (DL stands for
Description Logics) is the most commonly used. OWL-DL is favored by
ontologists wishing to perform computational analyses over ontologies
as it has not just rigorously-defined formal semantics, but also a wide
user-base and a suite of reasoning tools developed by multiple groups.

OBO is composed of stanzas describing elements of the ontology.
Below is an example of a term in its stanza, which describes its
location in the larger ontology:

[Term]
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
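
A stanza like the one above is simple enough to read with a few lines of Python. The sketch below is a deliberately naive reader, not the OWL API parser described in the paper: it assumes one tag per line, treats everything after " ! " as a comment, and ignores trailing modifiers, escapes, and quoted strings.

```python
from collections import defaultdict

STANZA = """[Term]
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
"""


def parse_stanza(text):
    """Return (stanza_type, tags), where tags maps each tag name to a list of values."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    stanza_type = lines[0].strip("[]")
    tags = defaultdict(list)
    for line in lines[1:]:
        # Split at the first colon only, because ids such as GO:0001555 contain colons.
        tag, _, value = line.partition(":")
        # Everything after " ! " is a human-readable comment, so drop it.
        value = value.split(" ! ", 1)[0].strip()
        tags[tag.strip()].append(value)
    return stanza_type, dict(tags)


stanza_type, tags = parse_stanza(STANZA)
```

Tags are stored as lists because some of them, such as intersection_of here, legitimately appear more than once in a stanza.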

Before they could start writing the parsing and mapping programs,
they had to formalize both the semantics and the syntax of OBO. This is
something that would normally be done by the developers of the
format, not the users of the format, but both the syntax and semantics
of OBO are only defined in natural language. These natural language
definitions often lead to imprecision and, in extreme cases, no
consensus was reached for some of the OBO constructs. However, the
diligence of the authors in getting consensus from the OBO community
should be rewarded in future by the OBO community feeling confident in
the mapping, and therefore also in using the OWL tools now available to
them. An example of natural language definitions in the OBO User Guide
follows:

This tag describes a typed relationship between this term and
another term. […] The necessary modifier allows a relationship to be
marked as “not necessarily true”. […]

Neither “necessarily true” nor “relationship” has been defined. You
can, in fact, computationally define a relation in three different ways
(taking their stanza example from above):

  • existentially, where each instance of GO:0001555 must have at least
    one part_of relationship to an instance of the term GO:0048601;
  • universally, where instances of GO:0001555 can *only* be connected to instances of GO:0048601;
  • via a constraint interpretation, where the endpoints of the
    relationship *must* be known, but which cannot in any case be expressed
    with DL, so is not useful to this discussion.
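
The difference between the first two readings is easiest to see on explicit data. In OWL Manchester syntax they would be written "part_of some GO_0048601" and "part_of only GO_0048601" respectively. The Python sketch below checks both against some made-up instances; note that it is a closed-world check over listed links, which is not how a DL reasoner works, and all instance names are hypothetical.

```python
def satisfies_existential(links, targets):
    """True if at least one link points at an instance of the target class."""
    return any(link in targets for link in links)


def satisfies_universal(links, targets):
    """True if every link points at an instance of the target class.

    An instance with no links at all passes vacuously.
    """
    return all(link in targets for link in links)


# Hypothetical instances of GO:0048601 (oocyte morphogenesis).
morphogenesis_instances = {"m1", "m2"}

# part_of links of three hypothetical instances of GO:0001555 (oocyte growth).
growth_with_extra_link = ["m1", "x9"]  # one valid target plus an unrelated one
growth_without_links = []              # no part_of links at all
growth_only_valid = ["m1", "m2"]       # only valid targets
```

The first instance satisfies the existential reading but not the universal one, the second satisfies the universal reading but not the existential one, and the third satisfies both, which is why the choice of interpretation matters for the mapping.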

OBO-Edit does not always infer what should be inferred if all of the
rules of its User Guide are followed. There is a good example of this
in the text.

In their formal representation of the OBO syntax they used
BNF, which is backwards-compatible with OBO. Many of the mappings are
quite straightforward: OBO terms become OWL classes, OBO relationship
types become OWL properties, OBO instances become OWL individuals, OBO
ids are the URIs in OWL, and the OBO names become the OWL labels. is_a,
disjoint_from, domain and range have direct OWL equivalents. There had
to be some more complex mapping in other places, such as trying to map
OBO relationship types to either OWL object or datatype properties.
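
The straightforward part of the mapping can be sketched as a small Python function that turns one parsed term into OWL functional-style syntax strings. This is an illustration rather than the OWL API implementation: the way ids are turned into names (replacing ":" with "_") and the existential reading of relationship tags are assumptions of the sketch, and intersection_of is left out entirely.

```python
def to_name(obo_id):
    """Turn an OBO id such as GO:0001555 into a prefixed name (assumed convention)."""
    return ":" + obo_id.replace(":", "_")


def term_to_axioms(tags):
    """Map one parsed [Term] stanza to OWL functional-style syntax strings."""
    cls = to_name(tags["id"][0])
    # OBO term -> OWL class
    axioms = [f"Declaration(Class({cls}))"]
    # OBO name -> OWL label
    for name in tags.get("name", []):
        axioms.append(f'AnnotationAssertion(rdfs:label {cls} "{name}")')
    # is_a -> subclass axiom
    for parent in tags.get("is_a", []):
        axioms.append(f"SubClassOf({cls} {to_name(parent)})")
    # relationship -> existential restriction (an assumption of this sketch)
    for value in tags.get("relationship", []):
        relation, target = value.split()
        axioms.append(
            f"SubClassOf({cls} ObjectSomeValuesFrom(:{relation} {to_name(target)}))"
        )
    return axioms


# The example term from earlier in the post, with comments already stripped.
term = {
    "id": ["GO:0001555"],
    "name": ["oocyte growth"],
    "is_a": ["GO:0016049"],
    "relationship": ["part_of GO:0048601"],
}
axioms = term_to_axioms(term)
```

Each tag contributes one kind of axiom, which is what makes this part of the mapping easy; the harder cases are the ones where a single OBO construct could map to more than one OWL construct.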

Using OWL reasoners over OBO ontologies not only works but is also
useful: in the case of the Sequence Ontology (SO), reasoning found a
term that had only a single intersection_of statement, and was thus
illegal according to OBO rules, but which OBO-Edit had not flagged.
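
The rule that was violated is easy to state as a check over parsed terms. The sketch below is a plain syntactic test in Python, not the reasoner-based detection described in the paper, and the term ids and values are made up for the example.

```python
def invalid_intersections(terms):
    """Return the ids of terms that carry exactly one intersection_of tag."""
    return [term["id"] for term in terms
            if len(term.get("intersection_of", [])) == 1]


# Hypothetical terms: the second one has only a single intersection_of tag.
terms = [
    {"id": "EX:0000001", "intersection_of": ["EX:0000010", "part_of EX:0000020"]},
    {"id": "EX:0000002", "intersection_of": ["EX:0000010"]},
    {"id": "EX:0000003"},
]
flagged = invalid_intersections(terms)
```

A term with no intersection_of tags is fine, and so is one with two or more; only the single-tag case is flagged.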

Up until now, I’ve been unsure as to how the OWL files are created
from files in the OBO format. This was a paper that was clear and to
the point. Thanks very much!

Update December 2008: I originally posted this without the BPR3 /
ResearchBlogging.org tag, as I was unsure where conference proceedings
came in the “peer-reviewed research” part of the guidelines. However,
as I’m now getting back into the whole researchblogging thing, I feel
(having read many of the posts of my fellow research bloggers) that
this would be suitable. If anyone has any opinions, I’d be most
interested!

Golbreich, C., Horridge, M., Horrocks, I., Motik, B., Shearer, R. (2008). OBO and OWL: Leveraging Semantic Web Technologies for the Life Sciences Lecture Notes in Computer Science, 4825/2008, 169-182 DOI: 10.1007/978-3-540-76298-0_13
