Categories
CISBAN Semantics and Ontologies Software and Tools

SBML in OWL: some thoughts on Model Format OWL (MFO)

What is SBML in OWL?

I’ve created a set of OWL axioms that represent the different parts of the Systems Biology Markup Language (SBML) Level 2 XSD combined with information from the SBML Level 2 Version 4 specification document and from the Systems Biology Ontology (SBO). This OWL file is called Model Format OWL (MFO) (follow that link to find out more information about downloading and manipulating the various files associated with the MFO project). The version I’ve just released is Version 2, as it is much improved on the original version first published at the end of 2007. Broadly, SBML elements have become OWL classes, and SBML attributes have become OWL properties (either datatype or object properties, as appropriate). Then, when actual SBML models are loaded, their data is stored as individuals/instances in an OWL file that can be imported into MFO itself.
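As a rough illustration of that mapping (this is not MFO's actual code), the Python sketch below reads a tiny hand-written SBML fragment and turns each element into an individual of a class named after the element, with each attribute becoming a property assertion. The fragment, the `to_assertions` helper and the triple layout are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written SBML Level 2 Version 4 fragment (illustrative only).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfSpecies>
      <species id="glucose" compartment="cytosol"/>
      <species id="atp" compartment="cytosol"/>
    </listOfSpecies>
  </model>
</sbml>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def to_assertions(xml_text):
    """Return (individual, property, value) triples for every element.

    Hypothetical helper: element name -> class, attribute -> property.
    """
    triples = []
    for index, element in enumerate(ET.fromstring(xml_text).iter()):
        class_name = local_name(element.tag)
        # Use the SBML id when there is one, otherwise make up a name.
        individual = element.get("id", f"{class_name}_{index}")
        triples.append((individual, "rdf:type", class_name))
        for attribute, value in element.attrib.items():
            triples.append((individual, attribute, value))
    return triples


assertions = to_assertions(SBML)
for triple in assertions:
    print(triple)
```

In a real OWL file, an attribute such as `compartment` would more naturally be an object property pointing at a compartment individual, while something like `level` would be a datatype property. The sketch keeps everything as plain strings for brevity.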

A partial overview of the classes (and number of individuals) in MFO.

In the past week, I’ve loaded all curated BioModels from the June release into MFO: that’s over 84,000 individuals! [1] It takes a few minutes, but it is possible to view all of those files in Protege 3.4 or higher. However, I’m still trying to work out the fastest way to reason over all those individuals at once. Of the reasoners I’ve tried so far, Pellet 2.0.0 rc7 is the slowest over MFO and FaCT++ the fastest. I’ve got a few more reasoners to try out, too. Details of reasoning times can be found in the MFO Subversion project.

Why SBML in OWL?

Jupiter and its biggest moons (not shown to scale). Public Domain, NASA.

For my PhD, I’ve been working on semantic data integration. Imagine a planet and its satellites: the planet is your specific domain of biological interest, and the satellites are the data sources you want to pull information from. Then, replace the planet with a core ontology that richly describes your domain of biology in a semantically meaningful way. Finally, replace each of those satellite data sources with OWL representations, or syntactic ontologies, of the format in which your data sources are available. By layering your ontologies like this, you can separate the process of syntactic integration (the conversion of satellite data into a single format) from the semantic integration, which is the exciting part. Then you can reason over, query, and browse that core ontology without needing to think about the format all that data was once stored in. It’s all presented in a nice, logical package for you to explore. It’s actually very fun. And slowly, very slowly, it’s all coming together.

Really, why SBML in OWL?

As one of my data sources, I’m using BioModels. This is a database of simulatable, biological models whose primary (“native”) format is SBML, though other formats are available. I’m especially interested in BioModels, as the ultimate point of this research is to help the modellers where I work annotate and create new models. Because of the importance of SBML in my work, MFO is one of the most important of my syntactic “satellite” ontologies for rule-based mediation.

How a single reaction looks in MFO when viewed with Protege 3.4.
How a single species looks in MFO when viewed with Protege 3.4.

Is this all MFO is good for?

No, you don’t need to be interested in data integration to get a kick out of SBML in OWL: just download the MFO software package, pick your favorite BioModels curated model from the src/main/resources/owl/curated-sbml/singletons directory, and have a play with the file in Protege or some other OWL editor. All the details to get you started are available from the MFO website. I’d love to hear what you think about it, and if you have any questions or comments.

MFO is an alternative format for viewing (though not yet simulating) SBML models. It provides logical connections between the various parts of a model. Its purpose is to be a direct translation of SBML, SBO, and the SBML Specification document into OWL format. Using an editor such as Protege, you can manipulate and create models, and then use the MFO code to export the completed model back to SBML (the import feature is complete; the export feature is not yet finished, but will be shortly).

For even more uses of MFO, see the next section.

Why not BioPAX?

All BioModels are available in it, and it’s OWL!

BioPAX Level 3, which isn’t broadly used yet, has a large number of quite interesting features. However, I’m not forgetting about BioPAX: it plays a large role in rule-based mediation for model annotation (more on that in another post, perhaps). It is a generic description of biological pathways and can handle many different types of interactions and pathway types. It’s already in OWL. BioModels exports its models in BioPAX as well as SBML. So, why don’t I just use the BioPAX export? There are a few reasons:

  1. Most importantly, MFO is more than just SBML, and the BioPAX export isn’t. As far as I can tell, the BioModels BioPAX export is a direct conversion from the SBML format. This means it should capture all of the information in an SBML model. But MFO does more than that – it stores logical restrictions and axioms that are only otherwise stored in either SBO itself or, more importantly, the purely human-readable content from the SBML specification document [2]. MFO therefore adds a set of extra constraints that aren’t present in the BioPAX version of SBML, which is why I need MFO as well as BioPAX.
  2. I’m making all this for modellers, especially those who are still building their models. None of the modellers at CISBAN, where I work, natively use BioPAX. The simulators accept SBML. They develop and test their models in SBML. Therefore I need to be able to fully parse and manipulate SBML models to be able to automatically or semi-automatically add new information to those models.
  3. Export of data from my rule-based mediation project needs to be done in SBML. The end result of my PhD work is a procedure that can create or add annotation to models. Therefore I need to export the newly-integrated data back to SBML. I can use MFO for this, but not BioPAX.
  4. For people familiar with SBML, MFO is a much more accessible view of models than BioPAX. If you wish to start understanding OWL and its benefits, using MFO (if you’re already familiar with SBML) is much easier to get your head around.

What about CellML?

You call MFO “Model” Format OWL, yet it only covers SBML.

Yes, there are other model formats out there. However, as you now know, I have special plans for BioPAX. But there’s also CellML. When I started work on MFO more than a year ago, I did have plans to make a CellML equivalent. However, Sarala Wimalaratne has since done some really nice work on that front. I am currently integrating her work on the CellML Ontology Framework. She’s got a CellML/OWL file that does for CellML what MFO does for SBML. This should allow me to access CellML models in the same way as I can access SBML models, pushing data from both sources into my “planet”-level core ontology.

It’s good times in my small “planet” of semantic data integration for model annotation. I’ll keep you all updated.

Footnotes:

1. Thanks to Michael Hucka for adding the announcement of MFO 2 to the front page of the SBML website!
2. Of course, not all restrictions and rules present in the SBML specification are present in MFO yet. Some are, though. I’m working on it!

Categories
Meetings & Conferences

BioModels Workshop 2009: Day 2

Today was great fun – lots of presentations and lots of lively discussions, of which we were all a part, but which Nicolas Le Novère ("shown" left, courtesy of Falko Krause 🙂 ) also enjoyed.

Here are the notes!

CellML: Catherine Lloyd

Most of the talk aligned with the talk Catherine gave at BioSysBio 2009 this past week. Some parts were new, however. For instance, she seemed to spend a little more time on versioning. A version is an update of a model entry, usually with a traceable model history. A variant is a slightly different model from the same reference. A variant could be the same model adapted for a different cell type. Alternatively, variants of a model may be created to reproduce the different figures from a publication.

libAnnotationSBML: Neil Swainston

Automatic Linking of MIRIAM Annotation to a model using web services. He was involved with the creation of the SBML metabolic yeast network, which had MIRIAM annotations. And now that this qualitative information has been published, they're doing some experiments to get quantitative data. They developed a simple CellDesigner plugin as proof-of-concept to allow the linking of a model to their quantitative data repository (not finished yet).

MIRIAM annotations are a form of tagging the model. However, they want to do more: use the annotations to "reason" over the model. By "reason", they mean doing more than just seeing whether the model is annotated: they want to see whether the model is annotated well. Do the reactions balance? Such a question cannot be answered by libSBML alone, but ChEBI can help. As a human, you would go to the ChEBI entry for each species and get its formula. Then, you can compare that to your reaction. Can this be done automatically?

libAnnotationSBML connects to ChEBI, KEGG, UniProt, MIRIAM. This information is presented in a single convenience class. This stuff has a "SBML Reaction Balance Analyser". They don't do any automatic corrections, but they can identify where something doesn't match with ChEBI. Would like to do it automatically in the near future. Would also like to suggest corrections to existing models (incorrect annotations, missing reactants / products, stoichiometry). Would like to intelligently generate models.
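To make the reaction-balance idea concrete, here is a minimal Python sketch of the kind of check described above. It is not libAnnotationSBML's implementation and makes no web service calls: the `FORMULAE` table stands in for formulae that would be looked up in ChEBI, and the function names are made up for this example. It only handles simple formulae without brackets or charges.

```python
import re
from collections import Counter

# Formulae a human would look up in ChEBI; typed in by hand here.
FORMULAE = {
    "glucose": "C6H12O6",
    "oxygen": "O2",
    "carbon dioxide": "CO2",
    "water": "H2O",
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_total(side):
    """Sum atom counts over one side, given {species name: stoichiometry}."""
    total = Counter()
    for name, stoichiometry in side.items():
        for symbol, count in parse_formula(FORMULAE[name]).items():
            total[symbol] += stoichiometry * count
    return total


def imbalance(reactants, products):
    """Return {element: products minus reactants} for elements that do not balance."""
    left, right = side_total(reactants), side_total(products)
    return {
        symbol: right[symbol] - left[symbol]
        for symbol in sorted(set(left) | set(right))
        if left[symbol] != right[symbol]
    }


# Respiration: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
balanced = imbalance({"glucose": 1, "oxygen": 6}, {"carbon dioxide": 6, "water": 6})
# Same reaction with the water stoichiometry entered wrongly.
unbalanced = imbalance({"glucose": 1, "oxygen": 6}, {"carbon dioxide": 6, "water": 5})
print(balanced, unbalanced)
```

An empty result means the reaction balances. A non-empty one flags where the model and the formulae disagree, without attempting any automatic correction.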

Future: support more web services, write it in C++, or perhaps ask the MIRIAM people to have a web service method that retrieves the URL for the wsdl as well as the human-readable URL. However, connections to web services tend to be inconsistent, and therefore you can't always get the information you want.

semanticSBML: Falko Krause

You can find more information here: http://sysbio.molgen.mpg.de/semanticsbml/. Here there is a standalone GUI which is capable of offline annotation. There is also a web interface.

This is in fact a much more interesting application than is suggested by the notes – mainly I was preoccupied with making sure my talk was ready to go, as it was almost my turn. I highly recommend that you have a look at the link above and have a play with this software.

Saint

I didn't speak directly about Saint, as I will be speaking about MFO instead this afternoon. However, as model annotation was being talked about today, I thought it might be useful for me to put up some information about Saint. The presentation and video will be up on the IET website (but isn't yet). In the meantime, here's a rundown of the purpose of Saint.

The creation of accurate quantitative Systems Biology Markup Language (SBML) models is a time-intensive manual process. Modellers need to know and understand both the systems they are modelling and the intricacies of SBML. However, the amount of relevant data for even a relatively small and well-scoped model is overwhelming. Saint, an automated SBML annotation integration environment, aims to aid the modeller and reduce development time by providing extra information about any given SBML model in an easy-to-use interface. Saint accepts SBML-formatted files and integrates information from multiple databases automatically. Any new information that the user agrees with is then automatically added to the SBML model.

The initial functionality of Saint allows the annotation of already-extant species and suggests additional interactions. The user uploads their SBML model, and the portions of the model recognized by Saint are then displayed using a tabular structure. The user can then remove any items they are not interested in annotating. For instance, some terms such as "sink" are modelling artefacts and do not correspond to genes or proteins. Therefore, the user would normally wish to delete this from the search space to prevent any possible matches with actual biological species of a similar name. Once the user is satisfied with the list of items to be annotated, the model is submitted using the "Annotate Listed Items" button at the bottom of the table. A summary of the annotation returned by Saint is then added to the main table. The user can then remove any new annotation that is unsuitable for their model. At any stage, the user may click on the "Annotated Model" tab in Saint, which adds all new annotation to the original model and presents the new SBML model for viewing and download.
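The pruning step in that workflow can be pictured with a few lines of Python. This is only a sketch of the idea, not Saint's code: the `ARTEFACTS` set, the `items_to_annotate` function and the example species names are assumptions for illustration, and in Saint itself the user removes items by hand in the table.

```python
# Names treated as modelling artefacts. The list is an assumption for this
# sketch; in Saint the user removes such items by hand.
ARTEFACTS = {"sink", "source"}


def items_to_annotate(species_names, removed_by_user=frozenset()):
    """Keep the species that should be sent off for annotation."""
    skip = {name.lower() for name in ARTEFACTS | set(removed_by_user)}
    return [name for name in species_names if name.lower() not in skip]


# Made-up species list for a toy model.
model_species = ["Cdc13", "sink", "Rad9", "Source"]
to_annotate = items_to_annotate(model_species, removed_by_user={"Rad9"})
print(to_annotate)
```

Removing artefacts before querying keeps a name like "sink" from matching an unrelated biological species in the databases.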

While there are a number of tools available for manipulating and validating SBML (e.g. LibSBML), simulating SBML models (e.g. BASIS and the SBML Toolbox), analysing simulations (e.g. COPASI), and running modelling workflows (e.g. Taverna), Saint is the first to provide basic automatic annotation of SBML models in an easy-to-use GUI. The purpose of Saint is to aid the researcher in the difficult task of information discovery by seamlessly querying multiple databases and providing the results of that query within the SBML model itself. By providing a modelling interface to existing data integration resources, Saint lets modellers add valuable information to models quickly and simply.

Saint already generates reactions and associated new species and species references. This creation of reactions is being extended to also generate skeleton models based around a species or pathway of interest.

SBO: Nick Juty

The SourceForge website has a tracker as well as access to the whole project. You can browse the whole tree from http://www.ebi.ac.uk/sbo. A search retrieves a series of tables, which include obsolete terms so that you can tell what used to be there. The main curation work happens via a web interface that talks directly to the database (this is just for curation). Lots of web services are available.

From SBML to SBGN through SBO: Alice Villeger

Semantic annotations as a bridge between standards. Showed a very nice modification to the SBGN reference card where she colored sections by their SBO branch, which then showed up areas where different branches were used for the same type of notation (and therefore were candidates for modification within SBO). She showed that the SBML info needed is in Species Reference => this can be solved by changing the current SBGN specs. Further, there are some SBO terms that have no direct SBML equivalent (e.g. or, and). She gave a number of other examples, too.

It also seems that the compartment in SBGN and the SBML specification don't match. This is because the SBML compartment is not intended to be the same as the SBGN compartment (a functional versus a physical compartment).

Her analysis of the alignment of SBGN and SBO showed up a number of inconsistencies. This was really useful. There should be some machine-readable expression of SBML x SBO and SBGN x SBO. Further, there aren't many models annotated with SBO yet. And, if they are, they are not always sufficiently precise. One solution could be a MIRIAM to SBO converter program.

http://arcadiapathways.sourceforge.net

http://biomodels.net/meetings/2009/index.html

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!


Categories
Meetings & Conferences Standards

Creating, Curating and Computing with CellML (BioSysBio 2009)

Catherine Lloyd et al.
Auckland Bioengineering Institute

CellML is an XML-based markup language which builds on existing standards (e.g. MathML and RDF). Why is a standard format needed at all? The answer lies in the publishing process. A modeller starts out writing the model in whatever language they want, but then when others want to access the model from a publication, how can they run it or understand it? Also, writing out the model as a series of equations or graphics can introduce errors. Why not just publish in MATLAB? Why bother putting it in CellML? Well, MATLAB isn't used by everyone. And where it is used, it's a procedural language and distinct from the published paper, which has nothing procedural.

Although they have best-practice standards, there are no requirements. This flexible structure can be used to describe a wide range of types of models: electrophysiology, immunology, cell cycle, muscular contraction, synthetic biology and more. There are some limitations: CellML is good at describing models at the molecular and cellular level, but not so good at the tissue scale. However, work is underway on this cross-scale modelling.

CellML has a modular structure, allowing models to be broken into components. CellML has an import feature that allows you to stick bits of models together, like lego bricks. SBML doesn't have this yet, though it is planned for future versions. This import feature is really useful, and saves time. In CellML, models can share entities (e.g. proteins) and processes (e.g. reactions). Imports are also helpful for models with repeating units. For a cell/pacemaker model, a pacemaker unit can be defined once and imported many times.

They have two tools (PCEnv and COR) to help develop CellML models. PCEnv allows development in CellML and then export to other formats such as MATLAB, C, Python etc. PCEnv runs on Windows, Linux and Mac; COR is Windows only. Both tools can also run simulations. PCEnv also shows embedded SVG diagrams of all the models in the repository.

The CellML Model Repository: http://www.cellml.org/models

This repository has over 380 models, all free for download. The majority are from published papers. For each model entry, there is a short description, a curation status and a schematic diagram. Model curation includes model validation and documentation. Of the 380 models, only 4 have been translated straight from the published paper into a working CellML model (i.e. without help from the curation team first). This is because there are often typographical errors in the paper, a lack of unit definitions, missing parameters, missing initial conditions, missing equations etc. At the moment they have a star system. 0 = not curated yet. 1 = maths consistent with published paper. 2 = model is complete and reproduces the results in the published paper. 3 = model satisfies physical constraints, e.g. conservation of mass, momentum, charge etc. Other problems: for some older models there is no access to the original code.
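The star system lends itself to a small enumeration. The Python sketch below is one possible way to encode it, not anything taken from the CellML repository: the class and function names are made up, and it assumes that each star level requires all the levels below it, which the talk did not state explicitly.

```python
from enum import IntEnum


class CurationLevel(IntEnum):
    """Star ratings as described in the talk; the names are illustrative."""
    NOT_CURATED = 0
    MATHS_MATCHES_PAPER = 1
    REPRODUCES_RESULTS = 2
    SATISFIES_PHYSICAL_CONSTRAINTS = 3


def curation_level(maths_consistent, reproduces_results, physically_consistent):
    """Highest star level reached, assuming each level requires the ones below it."""
    level = CurationLevel.NOT_CURATED
    checks = [maths_consistent, reproduces_results, physically_consistent]
    for passed in checks:
        if not passed:
            break
        level = CurationLevel(level + 1)
    return level


two_star = curation_level(True, True, False)
no_star = curation_level(False, True, True)
print(two_star.name, no_star.name)
```

Under the cumulative assumption, a model that fails the first check stays at zero stars even if later checks would pass.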

There's lots of collaboration with SBML. Currently the diagrams are made manually, and there's no reason why it can't be done automatically, and that's being worked on now. If we want to encourage (via journals) modellers to put their models into SBML or CellML, we need to provide really nice tools and help making the models.

Tuesday Session 2
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!
