What is SBML in OWL?
I’ve created a set of OWL axioms that represent the different parts of the Systems Biology Markup Language (SBML) Level 2 XSD combined with information from the SBML Level 2 Version 4 specification document and from the Systems Biology Ontology (SBO). This OWL file is called Model Format OWL (MFO) (follow that link to find out more information about downloading and manipulating the various files associated with the MFO project). The version I’ve just released is Version 2, as it is much improved on the original version first published at the end of 2007. Broadly, SBML elements have become OWL classes, and SBML attributes have become OWL properties (either datatype or object properties, as appropriate). Then, when actual SBML models are loaded, their data is stored as individuals/instances in an OWL file that can be imported into MFO itself.
In the past week, I’ve loaded all curated BioModels from the June release into MFO: that’s over 84,000 individuals!1 It takes a few minutes, but it is possible to view all of those files in Protege 3.4 or higher. However, I’m still trying to work out the fastest way to reason over all those individuals at once. Pellet 2.0.0 rc7 performs the slowest over MFO, and FaCT++ the fastest. I’ve got a few more reasoners to try out, too. Details of reasoning times can be found in the MFO subverison project.
Why SBML in OWL?
For my PhD, I’ve been working on a semantic data integration. Imagine a planet and its satellites: the planet is your specific domain of biological interest, and the satellites are the data sources you want to pull information from. Then, replace the planet with a core ontology that richly describes your domain of biology in a semantically-meaningful way. Finally, replace each of those satellite data sources with OWL representations, or syntactic ontologies of the format in which your data sources are available. By layering your ontologies like this, you can separate out the process of syntactic integration (the conversion of satellite data into a single format) from the semantic integration, which is the exciting part. Then you can reason over, query, and browse that core ontology without needing to think about the format all that data was once stored in. It’s all presented in a nice, logical package for you to explore. It’s actually very fun. And slowly, very slowly, it’s all coming together.
Really, why SBML in OWL?
As one of my data sources, I’m using BioModels. This is a database of simulatable, biological models whose primary format is SBML. I’m especially interested in BioModels, as the ultimate point of this research is to aid the modellers where I work in annotating and creating new models. In BioModels, the “native” format for the models is SBML, though other formats are available. Because of the importance of SBML in my work, MFO is one of the most important of my syntactic “satellite” ontologies for rule-based mediation.
Is this all MFO is good for?
No, you don’t need to be interested in data integration to get a kick out of SBML in OWL: just download the MFO software package, pick your favorite BioModels curated model from the
src/main/resources/owl/curated-sbml/singletons directory, and have a play with the file in Protege or some other OWL editor. All the details to get you started are available from the MFO website. I’d love to hear what you think about it, and if you have any questions or comments.
MFO is an alternative format for viewing (though not yet simulating) SBML models. It provides logical connections between the various parts of a model. It’s purpose is to be a direct translation of SBML, SBO, and the SBML Specification document in OWL format. Using an editor such as Protege, you can manipulate and create models, and then using the MFO code you can export the completed model back to SBML (while the import feature is complete, the export feature is not yet finished, but will be shortly).
For even more uses of MFO, see the next section.
Why not BioPAX?
All BioModels are available in it, and it’s OWL!
BioPAX Level 3, which isn’t broadly used yet, has a large number of quite interesting features. However, I’m not forgetting about BioPAX: it plays a large role in rule-based mediation for model annotation (more on that in another post, perhaps). It is a generic description of biological pathways and can handle many different types of interactions and pathway types. It’s already in OWL. BioModels exports its models in BioPAX as well as SBML. So, why don’t I just use the BioPAX export? There are a few reasons:
- Most importantly, MFO is more than just SBML, and the BioPAX export isn’t. As far as I can tell, the BioModels BioPAX export is a direct conversion from the SBML format. This means it should capture all of the information in an SBML model. But MFO does more than that – it stores logical restrictions and axioms that are only otherwise stored in either SBO itself or, more importantly, the purely human-readable content from the SBML specification document2. Therefore MFO is more than SBML, it is a bunch of extra constraints that aren’t present in the BioPAX version of SBML, and therefore, I need MFO as well as BioPAX.
- I’m making all this for modellers, especially those who are still building their models. None of the modellers at CISBAN, where I work, natively use BioPAX. The simulators accept SBML. They develop and test their models in SBML. Therefore I need to be able to fully parse and manipulate SBML models to be able to automatically or semi-automatically add new information to those models.
- Export of data from my rule-based mediation project needs to be done in SBML. The end result of my PhD work is a procedure that can create or add annotation to models. Therefore I need to export the newly-integrated data back to SBML. I can use MFO for this, but not BioPAX.
- For people familiar with SBML, MFO is a much more accessible view of models than BioPAX. If you wish to start understanding OWL and its benefits, using MFO (if you’re already familiar with SBML) is much easier to get your head around.
What about CellML?
You call MFO “Model” Format OWL, yet it only covers SBML.
Yes, there are other model formats out there. However, as you now know, I have special plans for BioPAX. But there’s also CellML. When I started work on MFO more than a year ago, I did have plans to make a CellML equivalent. However, Sarala Wimalaratne has since done some really nice work on that front. I am currently integrating her work on the CellML Ontology Framework. She’s got a CellML/OWL file that does for CellML what MFO does for SBML. This should allow me to access CellML models in the same way as I can access SBML models, pushing data from both sources into my “planet”-level core ontology.
It’s good times in my small “planet” of semantic data integration for model annotation. I’ll keep you all updated.
1. Thanks to Michael Hucka for adding the announcement of MFO 2 to the front page of the SBML website!.
2. Of course, not all restrictions and rules present in the SBML specification are present in MFO yet. Some are, though. I’m working on it!