What is SBML in OWL?

I’ve created a set of OWL axioms that represent the different parts of the Systems Biology Markup Language (SBML) Level 2 XSD, combined with information from the SBML Level 2 Version 4 specification document and from the Systems Biology Ontology (SBO). This OWL file is called Model Format OWL (MFO) (follow that link to find out more about downloading and manipulating the various files associated with the MFO project). The version I’ve just released is Version 2, which is much improved over the original version first published at the end of 2007. Broadly, SBML elements have become OWL classes, and SBML attributes have become OWL properties (either datatype or object properties, as appropriate). Then, when actual SBML models are loaded, their data is stored as individuals/instances in an OWL file that can be imported into MFO itself.
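
To make the element-to-class and attribute-to-property mapping concrete, here is a minimal Python sketch using only the standard library. It is not MFO’s own loader: the toy SBML fragment, the capitalised class-naming scheme, the function name sbml_to_axioms and the REFERENCE_ATTRS set are all assumptions for illustration, and it only prints OWL functional-syntax-style assertions as plain strings rather than building a real ontology.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed SBML Level 2 Version 4 fragment for illustration.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cytosol" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cytosol" initialConcentration="5"/>
    </listOfSpecies>
  </model>
</sbml>"""

# Assumption for this sketch: attributes whose values name another SBML element
# become object properties; every other attribute becomes a datatype property.
REFERENCE_ATTRS = {"compartment"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def sbml_to_axioms(xml_text):
    """Return functional-syntax-style assertions for every element that has an id."""
    root = ET.fromstring(xml_text)
    axioms = []
    for elem in root.iter():
        ident = elem.get("id")
        if ident is None:
            continue  # containers such as listOfSpecies carry no id
        name = local_name(elem.tag)
        cls = name[0].upper() + name[1:]  # SBML element name -> OWL class name
        axioms.append(f"ClassAssertion(:{cls} :{ident})")
        for attr, value in sorted(elem.attrib.items()):
            if attr == "id":
                continue
            if attr in REFERENCE_ATTRS:
                axioms.append(f"ObjectPropertyAssertion(:{attr} :{ident} :{value})")
            else:
                axioms.append(f'DataPropertyAssertion(:{attr} :{ident} "{value}")')
    return axioms


for axiom in sbml_to_axioms(SBML):
    print(axiom)
```

The species’ compartment attribute becomes a link between two individuals, while its initial concentration stays a literal value; that split is roughly the datatype-versus-object-property choice described above.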

A partial overview of the classes (and number of individuals) in MFO.

In the past week, I’ve loaded all curated BioModels from the June release into MFO: that’s over 84,000 individuals!¹ It takes a few minutes, but it is possible to view all of those files in Protege 3.4 or higher. However, I’m still trying to work out the fastest way to reason over all those individuals at once. Of the reasoners I’ve tried so far, Pellet 2.0.0 rc7 is the slowest over MFO and FaCT++ the fastest; I’ve got a few more to try out, too. Details of reasoning times can be found in the MFO Subversion project.

Why SBML in OWL?

Jupiter and its biggest moons (not shown to scale). Public Domain, NASA.

For my PhD, I’ve been working on semantic data integration. Imagine a planet and its satellites: the planet is your specific domain of biological interest, and the satellites are the data sources you want to pull information from. Then, replace the planet with a core ontology that richly describes your domain of biology in a semantically meaningful way. Finally, replace each of those satellite data sources with an OWL representation, or syntactic ontology, of the format in which that data source is available. By layering your ontologies like this, you can separate the process of syntactic integration (the conversion of satellite data into a single format) from semantic integration, which is the exciting part. You can then reason over, query, and browse that core ontology without needing to think about the format all that data was once stored in. It’s all presented in a nice, logical package for you to explore. It’s actually a lot of fun. And slowly, very slowly, it’s all coming together.
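
A toy Python sketch of the two layers might look like the following. Everything in it is hypothetical (the sbml:, db: and core: prefixes, the identifiers, the integrate function and the one-line mapping rules), and real rule-based mediation would use conditional rules and an OWL reasoner rather than a dictionary lookup. The point is only that a query against the core layer never needs to know which format a fact came from.

```python
from collections import defaultdict

# Layer 1 (syntactic integration): each source has already been rewritten as
# facts that mirror its own format. All prefixes and identifiers are hypothetical.
sbml_facts = [
    ("sbml:p53", "rdf:type", "sbml:Species"),
    ("sbml:p53", "sbml:name", "p53"),
]
db_facts = [
    ("db:entry42", "rdf:type", "db:ProteinRecord"),
    ("db:entry42", "db:geneName", "TP53"),
]

# Layer 2 (semantic integration): hypothetical rules linking a source-level class
# to a class in the core ("planet") ontology. Deliberately simplified: every
# sbml:Species is treated as a core:PhysicalEntity.
MAPPING_RULES = {
    "sbml:Species": "core:PhysicalEntity",
    "db:ProteinRecord": "core:PhysicalEntity",
}


def integrate(facts, rules):
    """Lift source-level type facts into core-ontology class memberships."""
    core = defaultdict(set)
    for subject, predicate, obj in facts:
        if predicate == "rdf:type" and obj in rules:
            core[rules[obj]].add(subject)
    return dict(core)


core = integrate(sbml_facts + db_facts, MAPPING_RULES)
# Querying the core layer ignores which format each individual came from.
print(sorted(core["core:PhysicalEntity"]))  # prints ['db:entry42', 'sbml:p53']
```

Keeping the source-specific facts separate from the mapping rules means a new satellite only needs its own syntactic conversion plus a few rules; the core ontology and anything that queries it stay untouched.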

Really, why SBML in OWL?

As one of my data sources, I’m using BioModels, a database of simulatable biological models whose native format is SBML, though other formats are available. I’m especially interested in BioModels, as the ultimate point of this research is to aid the modellers where I work in annotating and creating new models. Because SBML is so central to my work, MFO is one of the most important of my syntactic “satellite” ontologies for rule-based mediation.

How a single reaction looks in MFO when viewed with Protege 3.4.

How a single species looks in MFO when viewed with Protege 3.4.

Is this all MFO is good for?

No, you don’t need to be interested in data integration to get a kick out of SBML in OWL: just download the MFO software package, pick your favorite curated BioModels model from the src/main/resources/owl/curated-sbml/singletons directory, and have a play with the file in Protege or some other OWL editor. All the details to get you started are available from the MFO website. I’d love to hear what you think of it, and whether you have any questions or comments.

MFO is an alternative format for viewing (though not yet simulating) SBML models. It provides logical connections between the various parts of a model. Its purpose is to be a direct translation of SBML, SBO, and the SBML specification document into OWL. Using an editor such as Protege, you can manipulate and create models, and then use the MFO code to export the completed model back to SBML (the import feature is complete, but the export feature is not yet finished; it will be shortly).

For even more uses of MFO, see the next section.

Why not BioPAX?

All BioModels are available in it, and it’s OWL!

I’m not forgetting about BioPAX: it plays a large role in rule-based mediation for model annotation (more on that in another post, perhaps). It is a generic description of biological pathways and can handle many different types of interactions and pathways, and BioPAX Level 3, which isn’t broadly used yet, has a large number of quite interesting features. It’s already in OWL, and BioModels exports its models in BioPAX as well as SBML. So why don’t I just use the BioPAX export? There are a few reasons:

  1. Most importantly, MFO is more than just SBML, and the BioPAX export isn’t. As far as I can tell, the BioModels BioPAX export is a direct conversion from the SBML format. This means it should capture all of the information in an SBML model. But MFO does more than that: it stores logical restrictions and axioms that are otherwise found only in SBO itself or, more importantly, in the purely human-readable content of the SBML specification document². MFO is therefore more than SBML: it adds extra constraints that aren’t present in the BioPAX version of SBML, so I need MFO as well as BioPAX.
  2. I’m making all this for modellers, especially those who are still building their models. None of the modellers at CISBAN, where I work, natively use BioPAX. The simulators accept SBML. They develop and test their models in SBML. Therefore I need to be able to fully parse and manipulate SBML models to be able to automatically or semi-automatically add new information to those models.
  3. Export of data from my rule-based mediation project needs to be done in SBML. The end result of my PhD work is a procedure that can create or add annotation to models. Therefore I need to export the newly-integrated data back to SBML. I can use MFO for this, but not BioPAX.
  4. For people familiar with SBML, MFO is a much more accessible view of models than BioPAX. If you already know SBML and want to start understanding OWL and its benefits, MFO is much easier to get your head around.
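
As an illustration of the first point, here is a minimal closed-world Python check of two constraints that, as far as I understand it, the SBML Level 2 specification states in prose: a species’ compartment attribute must name a compartment defined in the model, and a reaction must have at least one reactant or product. This is a sketch only; the model fragment and the function name check_constraints are hypothetical, and it does not reflect how MFO encodes such rules. In MFO they are OWL axioms, and an OWL reasoner works under the open-world assumption, so a missing reactant would not simply be reported as an error the way this script does.

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.sbml.org/sbml/level2/version4"}

# Hypothetical model that breaks both constraints once.
BAD_MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfCompartments><compartment id="cytosol"/></listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cytosol"/>
      <species id="atp" compartment="nucleus"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
      </reaction>
      <reaction id="r2"/>
    </listOfReactions>
  </model>
</sbml>"""


def check_constraints(xml_text):
    """Return human-readable violations of the two example constraints."""
    model = ET.fromstring(xml_text).find("s:model", NS)
    compartments = {
        c.get("id") for c in model.iterfind("s:listOfCompartments/s:compartment", NS)
    }
    problems = []
    # Constraint 1: every species must sit in a compartment defined in the model.
    for sp in model.iterfind("s:listOfSpecies/s:species", NS):
        if sp.get("compartment") not in compartments:
            problems.append(
                f"species {sp.get('id')}: unknown compartment {sp.get('compartment')!r}"
            )
    # Constraint 2: every reaction needs at least one reactant or product.
    for rxn in model.iterfind("s:listOfReactions/s:reaction", NS):
        refs = rxn.findall("s:listOfReactants/s:speciesReference", NS) + rxn.findall(
            "s:listOfProducts/s:speciesReference", NS
        )
        if not refs:
            problems.append(f"reaction {rxn.get('id')}: no reactants or products")
    return problems


for problem in check_constraints(BAD_MODEL):
    print(problem)
```

Both documents would pass a purely structural XSD check, since the compartment reference is just a string and the reactant list is optional. Catching either problem requires the rules from the specification text.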

What about CellML?

You call MFO “Model” Format OWL, yet it only covers SBML.

Yes, there are other model formats out there. However, as you now know, I have special plans for BioPAX. But there’s also CellML. When I started work on MFO more than a year ago, I did have plans to make a CellML equivalent. However, Sarala Wimalaratne has since done some really nice work on that front, and I am currently integrating her CellML Ontology Framework into my work. She’s got a CellML/OWL file that does for CellML what MFO does for SBML. This should allow me to access CellML models in the same way as SBML models, pushing data from both sources into my “planet”-level core ontology.

It’s good times in my small “planet” of semantic data integration for model annotation. I’ll keep you all updated.

Footnotes:

1. Thanks to Michael Hucka for adding the announcement of MFO 2 to the front page of the SBML website!
2. Of course, not all restrictions and rules present in the SBML specification are present in MFO yet. Some are, though. I’m working on it!

Allyson Lister et al.

I didn’t take any notes on this talk, as it was my own talk and I was giving it. However, I can link you out to the paper on Nature Precedings and the Bio-Ontologies programme on the ISMB website. Let me know if you have questions!

You can download the slides for this presentation from SlideShare.

FriendFeed Discussion: http://ff.im/4xtmz

Existing Standards for DNA Description

Guy Cochrane
EBI

For the EMBL database, they need to provide capabilities for submission and for data exchange with collaborators. For access, they offer text search and retrieval via SRS, simple sequence retrieval (dbfetch), and dumps of the whole set of files. There has been a large amount of growth over the past year or so, as the new technologies allow much faster sequencing.

Personal Comment: I took fewer notes for this section as I used to work on TrEMBL (UniProt as it's called now) and am quite familiar with EMBL, so I didn't feel the need to take as many notes…!

Previous Standards Effort: SBML

Herbert Sauro
University of Washington, Seattle

In 1999 there were 5-6 different simulators, and people wanted to be able to move models from one tool to the next. SBML was originally created to represent homogeneous multi-compartment biochemical systems; they estimate that this format can cover about 80% of the models out there. The initial version was funded by JST. Over 120 software packages now support SBML, including MATLAB and Mathematica. SBML is also accepted by many journals, including Nature, Science, and PLoS. It has since spawned many other initiatives.

Key contributing factors to its take-up: a need from the community; availability of detailed documentation; annual/biannual two-day meetings; portable software libraries that let developers incorporate standard capabilities into their software; and deliberately not trying to do everything, since SBML covered about 80% of the community's needs at the time. Maintaining the libraries centrally ensured that the standard didn't diverge, and that extensions and modifications agreed by the community could be easily incorporated by developers.

SBML has been going for 8 years, and significant changes are planned. But the exciting things are the peripheral results: BioModels (repository), KiSAO (ontology/CV), SBO (ontology/CV), TEDDY (ontology/CV), MIASE (presumptive standard for storage of simulation results), SBRML (presumptive standard), and Antimony (human-readable version of SBML).

With a standard format, you can suddenly do compliance testing: do all applications produce the same results, or even succeed in simulating all models in BioModels? roadRunner, COPASI, BioUML and SBML ODE Solver performed the best.
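
The kind of compliance check described above can be sketched in Python: each tool’s time course for a model is compared point by point against a reference result within a tolerance, and a tool that returns nothing counts as a failure. The simulator names, the tolerances, the functions results_agree and score, and the exponential-decay test model are hypothetical stand-ins, not the actual test procedure or results from the talk.

```python
import math


def results_agree(reference, candidate, rel_tol=1e-4, abs_tol=1e-9):
    """True if two time courses (lists of (time, value) pairs) match point by point."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(t_ref, t_cand, rel_tol=rel_tol, abs_tol=abs_tol)
        and math.isclose(v_ref, v_cand, rel_tol=rel_tol, abs_tol=abs_tol)
        for (t_ref, v_ref), (t_cand, v_cand) in zip(reference, candidate)
    )


def score(reference, outputs):
    """Classify each simulator's output as 'agrees', 'differs' or 'failed'."""
    verdicts = {}
    for name, result in outputs.items():
        if result is None:  # the tool could not simulate the model at all
            verdicts[name] = "failed"
        elif results_agree(reference, result):
            verdicts[name] = "agrees"
        else:
            verdicts[name] = "differs"
    return verdicts


# Hypothetical test model: first-order decay S(t) = 10 * exp(-0.5 t).
times = [0.0, 1.0, 2.0]
reference = [(t, 10.0 * math.exp(-0.5 * t)) for t in times]
outputs = {
    "simulator_a": [(t, v * (1 + 1e-6)) for t, v in reference],  # tiny numerical error
    "simulator_b": [(t, v * 1.01) for t, v in reference],  # 1% off
    "simulator_c": None,  # crashed or rejected the model
}
verdicts = score(reference, outputs)
print(verdicts)
# prints {'simulator_a': 'agrees', 'simulator_b': 'differs', 'simulator_c': 'failed'}
```

A relative tolerance is used because different ODE solvers legitimately differ in the last few digits; the threshold here is an arbitrary choice for the sketch.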

Physical Standards and the BioBrick Registry

Randy Rettberg

The idea of the registry came from the TTL Data Book for design engineers. The current registry contains a wiki and more; it looks like a website, not a data book. Each BioBrick part is listed and has its own page. In 2003 there were fewer than 10 teams; in 2008 there were 84, with 1180 people.

The quality of the parts is really important. Starting last year, they ran a specific set of quality control tests, making sure that the top 800 bricks grew, had good sequence, were reported by users to work, etc.

They also worked on the overall structure of the registry. He'd like to go in the direction of a more distributed system. Future work includes: extension to DAS interface; uploading parts; external tool hooks for sequence analysis and sequence and feature editors.

This session is a preface session for tomorrow's end-of-meeting standards workshop. Beer and pizza!

Tuesday Standards Session
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!